Processing CDATA HTML content in TYPO3L10N xml files

Dear Colleagues,

After processing a TYPO3L10N XML file (TYPO3 localisation/content management system document) in Trados Studio 2022 (latest update), I noticed that the HTML content in the CDATA sections was not parsed, so the codes remained in the text as plain text: br, p, strong, li, ul. The non-breaking space also appears as the literal code &nbsp;. This causes segmentation problems as well: some segments contain list items as a sequence of sentences with a list symbol in front of each. I am afraid the output XML files may not keep the structure and formatting of the source file, even if I work around the additional codes during translation. HTML codes occur only in the CDATA sections.

Studio automatically assigns the current XML 2: Any XML file type to this project, but that file type does not parse the content of the CDATA sections.
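To illustrate the problem outside of Studio, here is a minimal Python sketch. The sample XML is hypothetical: only the root element name TYPO3L10N comes from this post, while the `pageGrp` and `data` element names, the `key` attribute and the text itself are assumptions made for illustration, not taken from the actual file. It shows that an XML parser hands back the CDATA content as one plain string, and that a second HTML parsing pass is needed to separate the tags from the translatable text.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical sample: element and attribute names below the root are assumptions.
SAMPLE = """<TYPO3L10N>
  <pageGrp id="1">
    <data key="example"><![CDATA[<p>First&nbsp;sentence.</p><ul><li>Item <strong>one</strong></li><li>Item two</li></ul>]]></data>
  </pageGrp>
</TYPO3L10N>"""


class TagAndTextCollector(HTMLParser):
    """Collects start tags and non-empty text runs from an HTML fragment."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &nbsp; into characters.
        super().__init__(convert_charrefs=True)
        self.tags = []
        self.texts = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_data(self, data):
        if data.strip():
            self.texts.append(data)


def inspect_cdata(xml_text):
    """Return a (tags, texts) pair for every <data> element in the XML."""
    root = ET.fromstring(xml_text)
    results = []
    for elem in root.iter("data"):
        # The XML parser exposes the CDATA content as plain text, tags included.
        collector = TagAndTextCollector()
        collector.feed(elem.text or "")
        collector.close()
        results.append((collector.tags, collector.texts))
    return results


if __name__ == "__main__":
    for tags, texts in inspect_cdata(SAMPLE):
        print("HTML tags found in CDATA:", tags)
        print("Translatable text runs:", texts)
```

As far as I understand it, this second pass is what an embedded content processor in a file type is meant to do for the CDATA sections.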

What I have tried so far, without any success:

  1. Creating a custom XML 2 file type with separate parsing rules for the HTML codes found in the XML file.
  2. Installing the Multilingual XML file type in Studio, but I could not set the languages root and the languages correctly (I think).
  3. Using the XML 2: MadCap file type with the processing of embedded content enabled (inside CDATA elements, with the HTML embedded content 5 2.0.0.0 processor). Under Detection / Root element name I entered the root element name TYPO3L10N as the rule, which had worked well as a detection rule with the custom XML 2 file type.

Now I am at a loss as to how to proceed, as I have tried every solution I could configure.

Could you please advise me on how to handle such a TYPO3L10N XML file?

I am sharing the initial section of the XML file.

Regards

Attila
