UTF-8 without BOM

Hello everybody! 

Has anyone found a way to save generated XLIFF target files as UTF-8 without BOM?

Is there any way to solve this encoding problem other than manually opening the final XLIFF 2.0 file in Notepad and changing it?

I have read that this issue has been known to SDL since at least 2005. At the moment I am using Studio 2021, and it never really caused problems for me. Until now. :)

I will be working on 200 files in the next few days, and I am so disappointed with Studio over this and over the length issue, which generates an error too.

I will be very happy to learn if anyone has found a solution to this issue. ;)

Warm regards, Neila



:)
[edited by: neila carneiro at 4:31 PM (GMT 1) on 29 Jun 2022]
  • I have read that this issue has been known to SDL since at least 2005.

    That would put it before the development of Trados Studio, so a long-standing problem indeed. I think the most common solution to these problems is to add a BOM to the files before preparing them for translation. This is fairly trivial to do and normally avoids the issue of there being an incorrect encoding in the target file. XLIFF files are normally assumed to be UTF-8 if the encoding is missing, but I guess you don't really mean XLIFF files, as the file in question in your screenshot was an XLSX. So you are probably referring to changing the encoding in an SDLXLIFF, which I have never had to do... as far as I can recall.

    I can't recall having this problem myself with Excel files either, and your language pair doesn't look problematic. Importing into Excel can be tricky as Excel isn't that great at handling some languages, but perhaps you can share a source file, or at least a sample of one, so we can test?

  • Dear Paul,

    Many thanks for your prompt reply!

    Maybe the post from 2005 on Proz.com meant Trados only, before it was Trados Studio. Back then, it was Trados Workbench, I think. And SDLX. I worked with both.

    Sorry I didn't explain it better. I read so many things about this today that I can't really pinpoint the thread. I am sure it was a thread from 2005 on Proz about encoding and Trados.

    That being said, the problem is:

    I will be dealing with over 125 XLIFF files encoded as UTF-8 (without BOM), and Studio adds the BOM.

    Here is an example of the source xliff file: 

    And a screenshot with  xliff files in package... 

    Target: 

    I had to manually change 28 files, since I could not find a way to do it in batches. 

  • OK - so it is XLIFF files. I guess part of your problem is that there is no declaration in the XLIFF, and the correct behaviour according to the XLIFF specification is this:

    http://docs.oasis-open.org/xliff/v1.2/os/xliff-core.html#Specs_XMLDecl

    So without a declaration it's assumed UTF-8 and this probably means adding a BOM since this is the only way to tell a parser to treat the file as UTF-8.  Why is the BOM a problem?

    I had to manually change 28 files, since I could not find a way to do it in batches. 

    Many tools can do this for you, but a very easy way would be to use the File Encoding Converter which we have on the AppStore. Unfortunately there is no access to the AppStore at the moment due to a problem, but we hope to have a solution in place by the end of the week and will make sure this app is available.

  • Many thanks again for your explanation, article, and app suggestion.

    All I know is that these xliff files were exported from Pendo. 

    I have asked the team to check if the BOM will create problems or not for this project.  

    You gave me a better answer than Google today, and that is rare. :-)

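For anyone who lands here with the same batch problem and cannot reach the AppStore, a small script is one of the "many tools" that can remove the BOM from a whole folder at once. What follows is only a minimal sketch in Python, not something Studio or the File Encoding Converter provides. The function names, the folder path, and the `*.xlf` file pattern are assumptions made for illustration, so adjust them to your own files and run it on a copy of the target folder first.

```python
import codecs
from pathlib import Path

# The three bytes that make up a UTF-8 byte order mark: EF BB BF.
BOM = codecs.BOM_UTF8


def strip_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (unchanged if there is none)."""
    return data[len(BOM):] if data.startswith(BOM) else data


def strip_bom_in_folder(folder: str, pattern: str = "*.xlf") -> int:
    """Rewrite each matching file in place without a BOM.

    The "*.xlf" pattern is an assumption; use "*.xliff" or similar if
    your files are named differently. Returns the number of files changed.
    """
    changed = 0
    for path in Path(folder).glob(pattern):
        original = path.read_bytes()
        cleaned = strip_bom(original)
        if cleaned != original:
            # Only touch files that actually started with a BOM.
            path.write_bytes(cleaned)
            changed += 1
    return changed
```

Calling `strip_bom_in_folder("path/to/target/files")` (a placeholder path) goes through the folder once and leaves files that have no BOM untouched. It works on raw bytes, so the rest of each file is not re-encoded or otherwise modified.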