Unicode character in UTF-8 (with BOM) encoded TXT disappears in Trados 2014

Hi everyone, 

I am new to the forum, and I hope you are able to help me with a technical issue.

I just loaded a .txt file in Trados like the one in the screenshot below (as displayed in Notepad++):

[Screenshot: the .txt file as displayed in Notepad++]

Unfortunately, Trados doesn't read the US character (shown as a black box in Notepad++), and the text appears as follows:

~GANTT_COLOR_DELIMITER_DAYdescription\~
~GANTT_COLOR_DELIMITER_DAYdisplayname\~
~GANTT_COLOR_DELIMITER_HOURdescription\~
~GANTT_COLOR_DELIMITER_HOURdisplayname\~

This creates problems when I save the target file (as .txt with UTF-8 encoding): the US character is gone, including when I open the file in Notepad++.

Is there something I should do to display the character correctly in Trados and avoid losing it in the target file?

I would very much appreciate any hint or help.

Best,

Annalisa

  • Hi,

    I have looked at this file with the development team as I had the same problem as you and could not make this work with Studio 2017 either. It's an interesting problem which was explained to me as follows.

    The FileTypeSupport.Framework removes any character which is NOT in one of the ranges listed in the following negated character class: (Regex) "[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\u10000-\u10FFFF]"

    The reason for this is that any character which is not in the above ranges is INVALID in XML. Such characters are therefore filtered out by the Framework, as we use XML for the SDLXLIFF and the Translation Memory content. Investigations we have carried out in the past have shown that the Translation Memory breaks in Studio when certain UTF-16 Unicode surrogate pairs are allowed (such as the ones you have in your file).
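
To make the effect of that filter concrete, here is a minimal Python sketch. It is not Studio's actual code, only an illustration that applies the same ranges as the regex quoted above. The function name `strip_invalid_xml_chars` is made up for this example, and the position of the US character (U+001F) in the sample string is an assumption, since the original file is not shown here.

```python
import re

# Negated character class with the same ranges as the regex quoted above:
# anything outside tab, LF, CR, U+0020-U+D7FF, U+E000-U+FFFD and
# U+10000-U+10FFFF is treated as invalid in XML and gets removed.
INVALID_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_invalid_xml_chars(text: str) -> str:
    """Remove every character that falls outside the allowed ranges."""
    return INVALID_XML_CHARS.sub("", text)


# Assumption: the US character sits between the key and the field name.
sample = "~GANTT_COLOR_DELIMITER_DAY\x1fdescription\\~"
print(repr(strip_invalid_xml_chars(sample)))
```

U+001F is below \x20 and is not tab, LF or CR, so it is matched by the negated class and dropped, which is consistent with what appears in the editor.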

    So for the time being my suggestion would be to search and replace these characters in the source file with something recognisable that you can convert back later. You could use a tab character, for example, or invent a unique tag like this:

    <US>

    If they are brought into the editor and not converted to structure anyway (as your example does not look like translatable text to me) then you could convert the <US> into a Studio tag to make them easier to handle, and then they'll be easy to find in the target file later on.
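
The search and replace itself can be done in any editor, but if there are many files, a small script may be easier. The following is only a sketch under some assumptions: the placeholder is the literal text `<US>` suggested above, the files are UTF-8 with BOM, and the helper names `protect`, `restore` and `convert_file` are made up for this example.

```python
US = "\x1f"           # Unit Separator, shown as "US" in Notepad++
PLACEHOLDER = "<US>"  # assumed placeholder; pick anything unique in your files


def protect(text: str) -> str:
    """Replace each US character with the placeholder before import."""
    if PLACEHOLDER in text:
        # Restoring would be ambiguous if the placeholder already occurs.
        raise ValueError("placeholder already present in the text")
    return text.replace(US, PLACEHOLDER)


def restore(text: str) -> str:
    """Turn the placeholder back into the US character after export."""
    return text.replace(PLACEHOLDER, US)


def convert_file(src: str, dst: str, transform) -> None:
    """Read a UTF-8 (with BOM) file, apply transform, write it with a BOM."""
    # newline="" keeps the original line endings untouched.
    with open(src, encoding="utf-8-sig", newline="") as f:
        text = f.read()
    with open(dst, "w", encoding="utf-8-sig", newline="") as f:
        f.write(transform(text))
```

You would run `convert_file("source.txt", "source_protected.txt", protect)` before loading the file in Studio, and `convert_file("target.txt", "target_final.txt", restore)` on the saved target file (the file names are just examples). This only works if the placeholder comes through translation unchanged.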

    Maybe someone has a better idea to resolve this but for now the important thing is to note that you can only deal with this using a workaround.

    Regards

    Paul

    Paul Filkin | RWS Group

