UTF-8 without BOM

Hello everybody! 

Has anyone found a way to save generated xliff target files as UTF-8 without BOM?

Is there any way to solve this encoding problem other than manually opening the final xliff 2.0 file in Notepad and changing it? 

I have read that this issue has been known to SDL since at least 2005. I am currently using Studio 2021, and it never really caused me problems. Until now. :)

I will work on 200 files in the next few days. I am so disappointed with Studio regarding this, and regarding the length issue, which generates an error too. 

Trados Studio screenshot showing Active Document Settings with Source Language set to English (United States) and Target Language set to Portuguese (Brazil). An encoding error is visible for file LOC-1923_HJ_PB_RU_ZH_toTrans.xlsx with source encoding na and target encoding options listed.

I will be very happy to learn if anyone has found a solution to this issue. ;) 

Warm regards, Neila



Generated Image Alt-Text
[edited by: Trados AI at 7:17 AM (GMT 0) on 29 Feb 2024]
  • I have read that this issue has been known to SDL since at least 2005. 

    That would put it before the development of Trados Studio, so a long-standing problem indeed. I think the most common solution to these problems is to add a BOM to the files before preparing them for translation. This is fairly trivial to do and normally avoids the issue of there being an incorrect encoding in the target file. XLIFF files are normally assumed to be UTF-8 if the encoding is missing, but I guess you don't really mean XLIFF files, as the file in question in your screenshot was an XLSX. So you are probably referring to changing the encoding in an SDLXLIFF, which I have never had to do... as far as I can recall.
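
As a rough illustration of the "add a BOM before preparing the files" idea, a minimal Python sketch could look like the one below. It assumes the file is already valid UTF-8, and the function name `add_utf8_bom` is made up for this example; it is not part of Studio or any RWS tool.

```python
import codecs
from pathlib import Path


def add_utf8_bom(path: Path) -> bool:
    """Prepend a UTF-8 BOM to the file if it does not already start with one.

    Assumption: the file content is already UTF-8 encoded.
    Returns True if the file was modified.
    """
    data = path.read_bytes()
    if data.startswith(codecs.BOM_UTF8):
        return False  # already has a BOM, leave the file untouched
    path.write_bytes(codecs.BOM_UTF8 + data)
    return True
```

It works on raw bytes on purpose, so nothing else in the file (line endings, characters) gets re-encoded along the way.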

    I can't recall having this problem myself with Excel files either, and your language pair doesn't look problematic. Importing into Excel can be tricky as Excel isn't that great at handling some languages, but perhaps you can share a source file, or at least a sample of one, so we can test?

    Paul Filkin | RWS Group

    ________________________
    Design your own training!

    You've done the courses and still need to go a little further, or still not clear? 
    Tell us what you need in our Community Solutions Hub

  • Dear Paul,

    Many thanks for your prompt reply!

    Maybe the 2005 post on Proz.com meant Trados only, before it became Trados Studio. Back then, it was Trados Workbench, I think. And SDLX. I worked with both. 

    Sorry I didn't explain it better. I read so many things about this today that I can't really pinpoint the thread. I am sure it was a thread from 2005 on Proz about encoding and Trados. 

    That being said, the problem is:

    I will deal with over 125 xliff files that are UTF-8 (without BOM), and Studio adds the BOM to the target files.

    Here is an example of the source xliff file: 

    Notepad screenshot showing an xliff file with version 1.2, source language English (US), no target language specified, and UTF-8 encoding without BOM.

    And a screenshot with the xliff files in the package... 

    Trados Studio screenshot displaying document languages and encodings, with source language English (US), target language Portuguese (Brazil), and various files with different source and target encodings.

    Target: 

    Notepad screenshot showing an xliff file with version 1.2, source language English (US), target language Portuguese (Brazil), and UTF-8 encoding with BOM.

    I had to manually change 28 files, since I could not find a way to do it in batches. 

  • OK, so it is XLIFF files. I guess part of your problem is that there is no declaration in the XLIFF, and the correct behaviour according to the XLIFF specification is this:

    http://docs.oasis-open.org/xliff/v1.2/os/xliff-core.html#Specs_XMLDecl

    So without a declaration it's assumed UTF-8 and this probably means adding a BOM since this is the only way to tell a parser to treat the file as UTF-8.  Why is the BOM a problem?
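
To check what a given file actually contains before and after Studio touches it, a small Python sketch along these lines might help. It reports whether the bytes start with a UTF-8 BOM and whether an XML declaration names an encoding. The function name `inspect_xml_encoding` is hypothetical, the regular expression only covers ASCII-compatible encodings (it will not recognise UTF-16 files), and it does not claim to reproduce what Studio does internally.

```python
import codecs
import re
from typing import Optional, Tuple

# Matches an XML declaration at the very start of the content and captures
# the value of its encoding pseudo-attribute, if there is one.
_DECL = re.compile(rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']')


def inspect_xml_encoding(data: bytes) -> Tuple[bool, Optional[str]]:
    """Return (has_utf8_bom, declared_encoding or None) for raw file bytes."""
    has_bom = data.startswith(codecs.BOM_UTF8)
    body = data[len(codecs.BOM_UTF8):] if has_bom else data
    match = _DECL.match(body)
    declared = match.group(1).decode("ascii") if match else None
    return has_bom, declared
```

For a file that starts directly with `<xliff version="1.2" ...>`, like the Pendo export shown in this thread, the result would be `(False, None)`: no BOM and no declared encoding.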

    I had to manually change 28 files, since I could not find a way to do it in batches. 

    Many tools can do this for you, but a very easy way would be to use the File Encoding Converter, which we have on the AppStore. Unfortunately there is no access to the AppStore at the moment due to a problem, but we hope to have a solution in place by the end of the week and will make sure this app is available.
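
In the meantime, if a script is an option, the batch change could be sketched in Python roughly as follows. This is only an illustration and not the File Encoding Converter app: the function names and the `*.xliff` pattern are assumptions for the example, and it is worth running it on a copy of the target files first.

```python
import codecs
from pathlib import Path
from typing import List


def strip_utf8_bom(path: Path) -> bool:
    """Remove a leading UTF-8 BOM from the file in place.

    Returns True if a BOM was found and removed.
    """
    data = path.read_bytes()
    if data.startswith(codecs.BOM_UTF8):
        path.write_bytes(data[len(codecs.BOM_UTF8):])
        return True
    return False


def strip_bom_in_folder(folder: str, pattern: str = "*.xliff") -> List[str]:
    """Strip the BOM from every matching file under `folder` (recursively).

    Returns the names of the files that were changed.
    """
    changed = []
    for path in sorted(Path(folder).rglob(pattern)):
        if strip_utf8_bom(path):
            changed.append(path.name)
    return changed
```

Calling `strip_bom_in_folder` on the folder that holds the generated targets would return the list of files it rewrote; files that had no BOM are left untouched.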

    Paul Filkin | RWS Group

  • Many thanks again for your explanation, article, and app suggestion.

    All I know is that these xliff files were exported from Pendo. 

    I have asked the team to check if the BOM will create problems or not for this project.  

    You gave me a better answer than Google today, and that is rare. :-)

  • Dear Paul, 

    Many thanks in advance again. 

    Pardon my ignorance, ¯\_(ツ)_/¯ but... 

    1 - The source file shows UTF-8 without BOM in Notepad, but there is no option to generate UTF-8 without BOM in SDL Studio?

    Close-up of Trados Studio interface showing UTF-8 encoding option with Windows (CRLF) line break style.

    2 - If the file type is text, I can choose to keep BOM or not... 

    Trados Studio project settings window with options for line breaks and Unicode UTF-8 byte order mark (BOM) settings.

    I read the part about the declaration. Is this what should count as the file declaration, or not?

    <xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2"><file original="0K8TlYX0gAfQRYULHMY6Qly2Yas" datatype="pendoguide" source-language="en-US" target-language=""><body><group id="Vu_KttQblCOOPe6qbyG6_IKUwjI"><note></note><trans-unit id="xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx|ariaLabel">

    So, if Pendo exports the file without adding a declaration (or with something else missing; pendoguide is not a valid datatype, I imagine), Studio will add a BOM to the UTF-8 file?

    I really appreciate any tip you can give me.

    Warm regards, Neila  :-) 
