XML 2 custom filetype: everything in bold, characters with diacritical marks not correctly processed

Hello, community,

at my firm, we get XML files like the one attached for translation. For these files, I want to create a custom XML 2 filetype so we can see the RecordID attribute value in the DSI section in Studio. I managed to do this, but the filetype still has two problems:

  1. it shows everything in bold
  2. it does not correctly process characters with diacritical marks (even though entity conversion is enabled)

This behaviour changes when content processors are enabled:

  • Preview screenshot when NO content processor is enabled, OR when one is enabled with the parser rule "//entry", OR when one is enabled and defined by the document structure information "sdl:cdata" (the DSI is shown, but everything is bold and characters with diacritical marks are not displayed correctly):

Trados Studio preview showing XML content with RecordID attribute value in DSI section, text in bold and diacritical marks not displayed correctly.

  • Preview screenshot when the plain text processor is enabled inside the CDATA element (the text is displayed correctly, i.e. not bold and with correct diacritical marks, but there is no DSI whatsoever):

Trados Studio preview with plain text processor enabled, showing XML content correctly without bold text and proper diacritical marks, but missing DSI information.

What I'd like to have is the file being displayed as in the second screenshot, but with the structure information as in the first screenshot. Any help would be appreciated.

Thanks in advance.

Best regards,
Katharina
Studio version: Trados Studio 2022 - 17.0.6.14902

0245.Test.zip



  •  

    I did provide a sample in the .zip folder I attached to my post.

    So you did... couldn't see it for looking!

    Quite an interesting file, and I went around the houses before we found the problem! I'll start with the problem: the file you are using has no XML declaration and no BOM (a byte order mark, which would tell Studio that the file is UTF-8). So when you open it in Studio, the file is detected as Baltic (Windows) and the wrong encoding is used. The result: nothing you do will fix the entity problem unless you fix it in the file itself or change the encoding when you open it.
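
    To check which files in a batch are affected, a small script can look for the two hints mentioned above. This is only a sketch: the function names and the sample bytes are made up for illustration, and using Python's "cp1257" codec for Baltic (Windows) is my assumption about what Studio falls back to, not a description of how Studio detects encodings.

```python
import codecs
import re

# An XML declaration is "<?xml" followed by whitespace at the very start,
# so "<?xml-stylesheet ...?>" is not mistaken for one.
_XML_DECL = re.compile(rb"\A<\?xml\s")


def has_utf8_bom(data: bytes) -> bool:
    """True if the data starts with the UTF-8 byte order mark (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def has_xml_declaration(data: bytes) -> bool:
    """True if an XML declaration is the first thing after an optional BOM."""
    body = data[len(codecs.BOM_UTF8):] if has_utf8_bom(data) else data
    return _XML_DECL.match(body) is not None


def lacks_encoding_hint(data: bytes) -> bool:
    """True if the file has neither a BOM nor an XML declaration."""
    return not (has_utf8_bom(data) or has_xml_declaration(data))


def misread_as_baltic(text: str) -> str:
    """Show what UTF-8 text looks like when decoded as Windows-1257 (assumed)."""
    return text.encode("utf-8").decode("cp1257", errors="replace")
```

    For example, lacks_encoding_hint(Path("Test.xml").read_bytes()) (with Path from pathlib and a hypothetical file name) would flag a file like the one in this thread, and misread_as_baltic("Größe") shows the kind of garbling you get when each two-byte UTF-8 character is read as two single-byte characters.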

    The latter option is only available for single-file projects and works by selecting UTF-8 when you open the file here:

    Screenshot showing the selection of the encoding from a drop-down when opening the file.

    You will then see this message:

    Screenshot warning of the change from Baltic (Windows) encoding.

    Click yes and all is well :-)

    Screenshot of Trados Studio showing a translation error message in German, indicating a problem with the quantity selection for an item in the shopping cart.

    We cannot change the bold I'm afraid as this is due to the text being in a CDATA section.  It's only bold in the editor.

    The next way to fix the encoding, seeing as you may have many files and working with single-file projects is probably not optimal, is to pre-process the files and add an XML declaration to the start:

    <?xml version="1.0" encoding="UTF-8"?>

    Easy enough to do, and you could create a script to handle all the files in a project and then use another script to remove the declaration afterwards if your client cannot manage the files with it.
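
    A minimal sketch of such a pair of scripts, assuming the files really are UTF-8 without a BOM. The function names, the "*.xml" pattern and the folder argument are hypothetical. It works on raw bytes so that nothing else in the file is re-encoded.

```python
import re
from pathlib import Path

DECLARATION = b'<?xml version="1.0" encoding="UTF-8"?>\n'

# An XML declaration at the very start of the file, plus one trailing newline.
_DECL_RE = re.compile(rb"\A<\?xml\s[^>]*\?>\r?\n?")


def add_declaration(data: bytes) -> bytes:
    """Prepend the UTF-8 declaration unless the data already starts with one."""
    if _DECL_RE.match(data):
        return data
    return DECLARATION + data


def remove_declaration(data: bytes) -> bytes:
    """Strip a leading XML declaration, e.g. from translated target files."""
    return _DECL_RE.sub(b"", data, count=1)


def process_folder(folder: str, transform) -> int:
    """Apply transform to every *.xml file in folder; return the number changed."""
    changed = 0
    for path in Path(folder).glob("*.xml"):
        original = path.read_bytes()
        updated = transform(original)
        if updated != original:
            path.write_bytes(updated)
            changed += 1
    return changed
```

    You would run process_folder(folder, add_declaration) on the source files before creating the project, and process_folder(folder, remove_declaration) on the target files before delivery. Note that remove_declaration strips any leading declaration, so only use it on files that had none to begin with, and check that the saved target files still match what your client expects.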

    Another way is to add a BOM to your files. This may be preferable, since you can handle multiple files in one go with this app and the change shouldn't be an issue for your customer:

    https://appstore.rws.com/Plugin/58

    You can find a short video to see how this works here: https://youtu.be/mJY212049WE
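
    If you would rather script this step than use the app, the same idea can be sketched in a few lines. This is not how the app works internally, just an illustration of what adding a UTF-8 BOM means at byte level; the function names and the "*.xml" pattern are hypothetical, and it assumes the files are already valid UTF-8.

```python
import codecs
from pathlib import Path


def add_bom(data: bytes) -> bytes:
    """Prepend the UTF-8 BOM (EF BB BF) unless it is already present."""
    if data.startswith(codecs.BOM_UTF8):
        return data
    return codecs.BOM_UTF8 + data


def strip_bom(data: bytes) -> bytes:
    """Remove a leading UTF-8 BOM, in case the customer wants files without it."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def add_bom_to_folder(folder: str, pattern: str = "*.xml") -> int:
    """Add a BOM to every matching file in folder; return the number changed."""
    changed = 0
    for path in Path(folder).glob(pattern):
        original = path.read_bytes()
        updated = add_bom(original)
        if updated != original:
            path.write_bytes(updated)
            changed += 1
    return changed
```

    Because the BOM is only three bytes at the start of the file and the rest is left untouched, strip_bom gives you back exactly the original bytes if you ever need to undo the change.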

    In the process of all of this I also tried to use the preview to work around the issue and can do this:

    Screenshot showing the XSLT custom preview

    It didn't help me with the workaround until I fixed the problem with the encoding, but I quite like it so here's the stylesheet in case you find it helpful too: recordID.zip

    So I think this is solved apart from the bold bit, but I'm afraid there is no way around this since you cannot use the Document Structure for your rule, nor for the preview, if you use embedded content.  But at least you seem to have a way forward.

    Paul Filkin | RWS Group
