Character entities in CDATA sections

Hey everyone,

I'm struggling to convert character entities in the CDATA section of an XML file. The content is in German, so there are umlauts all over the place...

The source file is inconsistent in how it writes the entities: sometimes it uses numeric character references (&#228;), sometimes named entities (&auml;).

I know character entities are not needed in CDATA sections, but they're there nonetheless...

Here's a sample of the XML file:

<ARTICLE>
   <MAIN>
      <TI>Ursachen erkennen und beheben: Tr&#228;nkenh&ouml;he nicht angepasst</TI>
      <TE>
         <![CDATA[<p><strong>T&#228;gliche Anpassung der Tr&auml;nkenh&ouml;he ist n&ouml;tig!</strong></p>&#10;]]>
      </TE>
   </MAIN>
</ARTICLE>

The entities in the TI element are converted correctly. The ones in the CDATA element are not.

Screenshot of Trados Studio showing German text with character entities in the CDATA section not converted correctly.

Is there any way Studio can convert them?



[edited by: Trados AI at 10:22 PM (GMT 0) on 28 Feb 2024]
  • Hi Ken, 

    This works fine for me using both XML v1 and XML v2: Studio recognizes both the HTML named entity (&auml;) and the entity in decimal format (&#228;).

    Try using an HTML 5.2 embedded content processor for that element (or elements), or for the CDATA sections.

    Trados Studio project file type settings showing CDATA section rules highlighted in red.

    Preview of embedded content processing in Trados Studio with German text showing no visible errors.

    And in the HTML content processor, set the necessary entities:

    Entity conversion settings in Trados Studio with HTML 5 selected and entity mapping list visible.

    If you need the entities to be written back in decimal format (&#228;), you may have a problem. I don't think Studio can do it. I need that format for some of my projects, so I convert the entities with a Python script after exporting the files from Studio.
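
For illustration, a post-export conversion of that kind could look roughly like the sketch below. It is not the actual script mentioned above: the function names are made up for this example, and it assumes the exported file contains either HTML named entities or literal non-ASCII characters that should end up as decimal references. The XML predefined entities (&amp;, &lt; and so on) are left alone so the markup stays intact.

```python
import re
from html.entities import name2codepoint

# Entities predefined by XML itself; converting these could break the markup.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

# Matches named references such as &auml; (numeric ones like &#228; do not match).
_NAMED_ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def named_to_decimal(text):
    """Rewrite HTML named entities (&auml;) as decimal references (&#228;)."""
    def replace(match):
        name = match.group(1)
        # Leave XML's own entities and unknown names exactly as they are.
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)
        return "&#%d;" % name2codepoint[name]
    return _NAMED_ENTITY.sub(replace, text)


def non_ascii_to_decimal(text):
    """Rewrite literal non-ASCII characters as decimal references."""
    return "".join(ch if ord(ch) < 128 else "&#%d;" % ord(ch) for ch in text)


if __name__ == "__main__":
    sample = "<p>T&#228;gliche Anpassung der Tr&auml;nkenh\u00f6he</p>"
    print(non_ascii_to_decimal(named_to_decimal(sample)))
```

Reading and writing the file with an explicit encoding (for example `open(path, encoding="utf-8")`) would be the usual way to apply these two functions to an exported file; the right encoding depends on the project.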

    I hope it helps, 

    Felipe
