How to retain UTF-8 encoding in target HTML files in Studio 2015?

Dear colleagues,

A colleague of mine who uses Studio 2015 has this question, and I am curious to know the answer too, so here goes:

My colleague was working on a project into Simplified Chinese that contains dozens of HTML files. When I open one of the translated Studio files she sent me in Notepad++, the first line says "UTF-8", which is the encoding the client wants the target files to be in. After the file was exported using Generate Target Translations, the encoding of the target Chinese file had changed to GB2312. As a result, my colleague had to open each target file in Notepad++, convert it to UTF-8 and save it again.
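For reference, the manual Notepad++ step described above could be scripted. The following is only a minimal sketch in Python, under several assumptions: the generated target files really are GB2312 (decoded here with `gb18030`, a superset), any charset declaration in the HTML has the form `charset=gb2312`, and the function names `reencode_to_utf8` and `convert_folder` are made up for illustration. It overwrites files in place, so it should be tried on a copy first.

```python
import re
from pathlib import Path

# Assumed form of the charset declaration inside the HTML, quoted or unquoted.
CHARSET_RE = re.compile(r'(charset\s*=\s*["\']?)gb2312', re.IGNORECASE)


def reencode_to_utf8(data: bytes, source_encoding: str = "gb18030") -> bytes:
    """Return the content as UTF-8 bytes, updating a gb2312 charset declaration."""
    try:
        # Heuristic: Chinese text stored as GB2312 is very unlikely to be
        # valid UTF-8, so a successful decode means it is already converted.
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        text = data.decode(source_encoding)
    # Keep the declared charset consistent with the new byte encoding.
    text = CHARSET_RE.sub(r"\g<1>utf-8", text)
    return text.encode("utf-8")


def convert_folder(folder: str, pattern: str = "*.html") -> int:
    """Re-encode every matching file in the folder in place; return the count."""
    count = 0
    for path in sorted(Path(folder).glob(pattern)):
        path.write_bytes(reencode_to_utf8(path.read_bytes()))
        count += 1
    return count
```

Calling `convert_folder("path/to/target/files")` would then process the whole folder in one go. This only repairs the files after the fact; it does not change how Studio chooses the target encoding.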

Is there a way to set up these HTML files in Studio 2015 so that the UTF-8 encoding is retained in the target files? UTF-8 encoding works just as well for Chinese files.

Thank you in advance for your advice!

 

Chunyi

  •   Hi Evzen,

    As I mentioned in my post, this is an issue my colleague has. I rarely handle HTML files in my work, so please forgive me if I didn't phrase my question correctly :)

    Here are two screenshots that might explain the encoding better. I took them in Notepad++. The bilingual Studio file shows utf-8 in the first line and in the bottom-right corner. The target file (created with Generate Target Translations under Batch Tasks) shows GB2312 (Simplified).

    I am basically looking for ways to tell Studio not to change the encoding to GB2312 when it generates the target files.  Mindy's method is one possible solution, but if there is another way to prevent the issue, I would love to know it!

     

    Chunyi

  • Well, what to say...
    As I presumed, judging by your screenshots and the (basically incorrect) conclusions you are drawing from them, you are simply mixing apples and oranges...

    The "utf-8" string in the XML declaration means absolutely nothing in this context: it only describes how the bilingual SDLXLIFF file itself is stored, not the encoding of the generated target file. So checking the internal encoding of the SDLXLIFF is simply complete nonsense here.

    Unknown said:
    I am basically looking for ways to tell Studio not to change the encoding to GB2312 when it generates the target files.  Mindy's method is one possible solution, but if there is another way to prevent the issue, I would love to know it!

    There is a pretty well hidden Active Document Settings dialog in the Editor (I actually believe the developers themselves have probably forgotten about it): go to the Advanced tab and you will find it in the File Actions ribbon group.
    If you virtually merge several files, you can set the encoding for each of them. Unfortunately, you cannot set it for all of them at once, only one by one, in a terribly small, non-resizable control (I wish the person who designed this had to do it for 100 files at least twice a day... to personally taste the design's stupidity :-\).

    So, the only sensible solution right now is to prepare the source files appropriately before adding them to Studio, i.e. to make sure that they have the UTF-8 BOM.

    And we can only hope that future Studio versions will finally implement smarter encoding processing, e.g. as was discussed here (pages 2-3): community.sdl.com/.../14067
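The preparation step suggested above (making sure every source file has the UTF-8 BOM before it goes into Studio) could be done in bulk rather than by hand. Below is a minimal Python sketch, assuming the source files are already valid UTF-8 and merely lack the BOM; the names `ensure_utf8_bom` and `add_bom_to_folder` are hypothetical. Whether Studio then keeps UTF-8 for the target files is the claim made in the reply above, not something this script checks.

```python
import codecs
from pathlib import Path


def ensure_utf8_bom(data: bytes) -> bytes:
    """Return the bytes with a UTF-8 BOM prepended unless one is already there."""
    if data.startswith(codecs.BOM_UTF8):
        return data
    # Refuse to tag non-UTF-8 content: this raises UnicodeDecodeError
    # if the file is in some other encoding.
    data.decode("utf-8")
    return codecs.BOM_UTF8 + data


def add_bom_to_folder(folder: str, pattern: str = "*.html") -> list:
    """Add the BOM to matching files in place; return the names of changed files."""
    changed = []
    for path in sorted(Path(folder).glob(pattern)):
        original = path.read_bytes()
        updated = ensure_utf8_bom(original)
        if updated != original:
            path.write_bytes(updated)
            changed.append(path.name)
    return changed
```

Running `add_bom_to_folder("path/to/source/files")` on a copy of the source folder first makes it easy to confirm that only the three BOM bytes were added to each file.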