Character entities in CDATA elements

Hey everyone,

I'm struggling to convert character entities in the CDATA section of an XML file. The content is in German, so there are umlauts all over the place...

The source file is inconsistent in the way it writes the entities: sometimes it uses numeric character references (e.g. &#228;), sometimes named entities (e.g. &auml;).

I know character entities are not needed in CDATA elements, but they're there nonetheless...

Here's a sample of the XML file:

<ARTICLE>
   <MAIN>
      <TI>Ursachen erkennen und beheben: Tr&#228;nkenh&ouml;he nicht angepasst</TI>
      <TE>
         <![CDATA[<p><strong>T&#228;gliche Anpassung der Tr&auml;nkenh&ouml;he ist n&ouml;tig!</strong></p>&#10;]]>
      </TE>
   </MAIN>
</ARTICLE>

The entities in the TI element are converted correctly. The ones in the CDATA element are not.
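
For what it's worth, this looks like standard XML behaviour rather than something specific to Studio: inside a CDATA section, character references are treated as literal text, so a conforming parser never resolves them. Here is a minimal sketch with Python's standard library that reproduces the difference. The shortened sample string is made up for illustration, and it only uses a numeric reference, because a named entity such as &ouml; would need a DTD to parse outside CDATA.

```python
import xml.etree.ElementTree as ET

# Shortened, hypothetical sample: the same numeric reference appears
# once in ordinary element content and once inside a CDATA section.
sample = (
    "<MAIN>"
    "<TI>Tr&#228;nke</TI>"
    "<TE><![CDATA[<p>Tr&#228;nke</p>]]></TE>"
    "</MAIN>"
)

root = ET.fromstring(sample)
ti_text = root.find("TI").text  # reference is resolved by the parser
te_text = root.find("TE").text  # CDATA content is passed through as-is

print(ti_text)
print(te_text)
```

The first value comes back with a real ä, the second still contains the raw &#228; sequence.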

[Screenshot: Trados Studio showing the German text, with the character entities in the CDATA section not converted.]

Is there any way Studio can convert them?



[edited by: Trados AI at 10:22 PM (GMT 0) on 28 Feb 2024]
  • The file was created in a custom CMS. I'm guessing the system is a bit flawed...

    Converting them into tags is not going to work, because the entities represent characters (mostly ä, ü, and ö). I could do that with regexes in the tag definition rules of the Embedded content processor, but the characters are part of words that need to be translated. If I convert them into tags, I'll end up with error messages about missing tags.

    I'll get in touch with the client and try to convince him to clean it up at the source...

    Thanks for the info! Knowing that it's not possible is already a great help! (c:
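
In case the client can't fix the export, one fallback I'm considering is to pre-process the file before it goes into Studio. Below is a rough sketch in Python, not a tested production script: the function name `decode_cdata_entities` is my own, and it assumes the file is read and written as UTF-8 and that the XML declaration (if any) says so. It decodes numeric and named entities inside CDATA sections only, and deliberately leaves the markup-significant ones (&lt;, &gt;, &amp;, quotes) escaped so the embedded HTML stays intact.

```python
import html
import re

# One CDATA section; the body is captured non-greedily across lines.
CDATA_RE = re.compile(r"<!\[CDATA\[(.*?)\]\]>", re.DOTALL)

# Decimal references, hex references, and named entities (with semicolon).
ENTITY_RE = re.compile(r"&(?:#\d+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")

# Characters that must stay escaped inside the embedded HTML.
MARKUP_CHARS = {"<", ">", "&", '"', "'"}


def decode_entity(match):
    """Decode one entity unless it stands for a markup character."""
    entity = match.group(0)
    char = html.unescape(entity)  # unknown names come back unchanged
    if char in MARKUP_CHARS:
        return entity
    return char


def decode_cdata_entities(xml_text):
    """Return xml_text with entities decoded inside CDATA sections only."""
    def fix_section(match):
        body = ENTITY_RE.sub(decode_entity, match.group(1))
        return "<![CDATA[" + body + "]]>"

    return CDATA_RE.sub(fix_section, xml_text)


if __name__ == "__main__":
    sample = (
        "<TE><![CDATA[<p>T&#228;gliche Anpassung der "
        "Tr&auml;nkenh&ouml;he ist n&ouml;tig!</p>]]></TE>"
    )
    print(decode_cdata_entities(sample))
```

Text outside CDATA is left untouched on purpose, since Studio already handles the entities there. The translated file would then contain real umlauts in the CDATA sections, so it's worth checking with the client that the CMS accepts that on import.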
