Problems reconverting XML files with embedded HTML code

Hi everybody,

After searching for a solution on my own and then in the community without finding what I need, I am asking whether somebody can help.

I have an XML file with embedded HTML code. I managed to create a file type for it; that was the easy part. The file looks great in Studio. There are some links starting with http... and some segments beginning with { that I can easily lock in Studio:

[Screenshot: comparison of the XML file before and after finalization in Trados Studio, highlighting discrepancies in HTML entity conversion.]

The problem happens when I finalize the file. Some HTML entities are converted correctly, others are not. I tried different options in the file type (both in the XML and in the embedded HTML configuration), but could not find a solution. The only thing I can do is open the file in an editor and find and replace the characters by hand (with a macro). I also get a bunch of tabs appearing from time to time...

In this screenshot I compared the file before and after (original file is on the right, the file on the left is created by Studio after finalization):

[Screenshot: side-by-side view of the XML code with the differences highlighted, showing incorrect HTML entity conversion after processing in Trados Studio.]
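
To see exactly which entities changed between the two files without comparing them by eye, a small script can count the entity references in each file and report the differences. This is only a minimal sketch in Python: the function names are made up for illustration, and it treats both files as plain text rather than parsing the XML.

```python
import re
from collections import Counter

# Matches named (&lt;), decimal (&#60;) and hexadecimal (&#x3C;) entity references.
ENTITY = re.compile(r"&(#x[0-9A-Fa-f]+|#[0-9]+|[A-Za-z][A-Za-z0-9]*);")


def entity_counts(text: str) -> Counter:
    """Count how often each entity reference occurs in the text."""
    return Counter(ENTITY.findall(text))


def entity_diff(original: str, finalized: str) -> dict:
    """Return {entity: (count_in_original, count_in_finalized)} for
    every entity whose count differs between the two texts."""
    before = entity_counts(original)
    after = entity_counts(finalized)
    return {
        name: (before[name], after[name])
        for name in sorted(set(before) | set(after))
        if before[name] != after[name]
    }
```

Reading both files as text and passing them to `entity_diff` gives a short list of the entities that were lost or added during finalization, which is easier to check than a visual comparison.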

I already tried to use a converter in my editor, but the problem is that it converts all entities, and some of them must remain as they are.

Example:

This part created by Studio

<en><script>window.product_id = 'VC-WPI'; window.dataLayer = window.dataLayer || []</script></en>

should be

<en>&lt;script&gt;window.product_id = 'VC-WPI'; window.dataLayer = window.dataLayer || []&lt;/script&gt;</en>

If I convert the file in the editor I get

<en><script>window.product_id = 'VC-WPI'; window.dataLayer = window.dataLayer || []</script></en>

I sent the last version to my customer, but he will not accept it; it must match the yellow-marked line (the "should be" version above).
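
One possible way to avoid converting every entity is to re-escape only the angle brackets inside the affected elements and leave everything else untouched. The following is a minimal Python sketch under assumptions that the files may not satisfy: that the embedded HTML sits inside elements such as `<en>` written without attributes, that these elements never contain real XML child elements, and that only `<` and `>` need re-escaping. The function names and the tag list are hypothetical.

```python
import re


def escape_markup(text: str) -> str:
    """Escape only angle brackets. The ampersand is deliberately not
    touched, so existing entities such as &amp; or &lt; stay as they are."""
    return text.replace("<", "&lt;").replace(">", "&gt;")


def reescape_embedded_html(xml: str, tags=("en",)) -> str:
    """Re-escape embedded HTML inside the given container elements.

    Assumption: the containers are plain <tag>...</tag> pairs without
    attributes and hold text only, never real XML child elements.
    """
    names = "|".join(re.escape(tag) for tag in tags)
    pattern = re.compile(r"(<(" + names + r")>)(.*?)(</\2>)", re.DOTALL)
    return pattern.sub(
        lambda m: m.group(1) + escape_markup(m.group(3)) + m.group(4),
        xml,
    )
```

Applied to the line created by Studio in the example, this returns the "should be" line, and running it a second time changes nothing because already escaped text contains no angle brackets. Other language elements could be added through the `tags` parameter.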

I attached my filetype settings.

Could somebody help me?

Thank you very much.

Angelo

 

Test setting XML embedded HTML.rar



Generated Image Alt-Text
[edited by: Trados AI at 3:32 PM (GMT 0) on 28 Feb 2024]
  • Unknown said:
    Entity conversion is turned off in the settings file Angelo posted.

    Yeah, I started writing my answer before the new settings were posted, but I was interrupted by other work and only finished it later, when the new files had already been posted without my knowing ;-)

    Unknown said:
    Also, entity conversion would never convert the element tags.

    OK, maybe not in this case. It depends on how exactly the HTML is embedded, and on what gets escaped, and how, on the way into the parser and on the way out of it... I have had to play with these "extra settings" of both the embedded HTML parser and the "parent" XML parser many times, in various ways, to get the desired result, depending on how weirdly formatted the original file was and what exactly was to be translated.