XML Parsing - ␍ Character (Carriage Return)

Hi, 

I've got an XML file containing the carriage return character "␍" in it, and my Trados Studio 2024 parses the character as translatable text. Is there a way to parse it as an actual line break, carriage return, etc.?

This is what I usually see in the editor in Trados Studio:

CarriageReturn character in xml file

I've tried adding a segmentation rule to the translation memory. Now the text is segmented after the character (so far, so good), but it still parses the character as translatable text:

CarriageReturn in XML file plus segmentation

I've had a look at some threads in the forum but couldn't solve my issue. Any suggestions would be greatly appreciated!

sample.xml.zip



Added sample file
[edited by: 211127 at 1:20 PM (GMT 1) on 7 May 2025]
  •  

    I've got an XML file containing the carriage return character "␍" in it, and my Trados Studio 2024 parses the character as translatable text.

    The problem is clear when you look at your file with a hexadecimal editor:

    Hexadecimal editor showing bytes E2 90 8D highlighted, representing the Unicode character U+240D, with text 'Lorem ipsum dolor sit amet' and other characters.

    The bytes E2 90 8D represent the Unicode character U+240D, which is:

    Symbol for Carriage Return

    This is not the actual carriage return character \r (which is 0D in hex).  Rather:

    • 0D = real carriage return, an ASCII control character.

    • E2 90 8D = Unicode graphic symbol that represents a carriage return in visual form.
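    If you don't have a hex editor to hand, a few lines of Python show the same difference. This is only a minimal sketch; the helper name `describe` is my own, not part of any tool:

    ```python
    import unicodedata

    def describe(ch):
        """Return (code point, Unicode name, UTF-8 bytes in hex) for one character."""
        code_point = f"U+{ord(ch):04X}"
        # Control characters such as \r have no Unicode name, so fall back to a label.
        name = unicodedata.name(ch, "<control>")
        utf8_hex = ch.encode("utf-8").hex(" ").upper()
        return code_point, name, utf8_hex

    symbol = describe("\u240d")  # the visible symbol found in the file
    real_cr = describe("\r")     # the actual carriage return control character

    print(symbol)
    print(real_cr)
    ```

    The first tuple carries the three bytes E2 90 8D, the second the single byte 0D, which is why Studio treats the symbol as ordinary text.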

    So the easiest solution here would be to create an embedded content rule that treats the symbol as a placeable, set to exclude, which gives this result:

    Preview of sample.xml file with two columns of text, each containing Lorem Ipsum placeholder text, displayed in a structured format.

    To achieve it I did these things:

    1. Added some context to the VALUE parser rule:
      Options window in Trados Studio showing parser rules for XML filetype, with a highlighted rule for 'VALUE' and context set to 'Paragraph'.

    2. Activated embedded content processing, defined by document structure information:
      Embedded content processing settings in Trados Studio, showing options for defining parser rules and document structure information with arrows pointing to key settings.

    3. Added a placeholder rule with the ␍ character (U+240D):
      Create regex rules window for embedded content in Trados Studio, showing a placeholder tag type with a regular expression and translation set to 'Not translatable'.
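
    The exact expression from the screenshot isn't reproduced in the text, so as an assumption, a pattern matching the single code point U+240D should do for step 3. A quick Python check of such a pattern (the sample string and the name `PLACEHOLDER` are made up for illustration):

    ```python
    import re

    # Assumed pattern: the single code point U+240D (SYMBOL FOR CARRIAGE RETURN).
    # The \uXXXX escape is understood by Python's re module and by .NET regex.
    PLACEHOLDER = re.compile(r"\u240D")

    # Hypothetical sample: symbol in the middle, a real carriage return at the end.
    sample = "Lorem ipsum dolor sit amet\u240dconsectetur adipiscing elit\r"

    matches = PLACEHOLDER.findall(sample)  # only the visible symbol is matched
    parts = PLACEHOLDER.split(sample)      # text on either side of the symbol

    print(matches)
    print(parts)
    ```

    The real carriage return at the end is left alone, so a rule like this only catches the visible symbol.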

    This way I didn't need any segmentation rules on the TM; everything is handled in the custom XML file type I created for this.

    Paul Filkin | RWS Group
