Processing CDATA HTML content in TYPO3L10N xml files

Dear Colleagues,

After processing a TYPO3L10N XML file (TYPO3 localisation/content management system document) in Trados Studio 2022 (latest update), I noticed that the HTML content in the CDATA sections was not parsed, so the codes remained in the text parts: br, p, strong, li, ul. The non-breaking space also appears as the literal code &nbsp;. This causes segmentation problems as well: some segments contain list items as a sequence of sentences with a list symbol in front of each. I'm afraid the output XML files may not keep the structure and format of the source file, even if I struggle through translating around the additional codes. The HTML codes occur only in the CDATA sections.

Studio automatically assigns the current XML 2 : Any XML file type to this project, but it does not parse the CDATA sections.
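To illustrate the symptom outside Studio, here is a minimal Python sketch. The sample XML is a hypothetical fragment (only the root element name TYPO3L10N comes from my file; the pageGrp/data layout and the key attribute are assumptions for illustration). It shows that a plain XML parser hands back the CDATA content as one flat string, with the HTML tags and &nbsp; left in as literal text:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: only the root element name is taken from the real file.
SAMPLE = """<TYPO3L10N>
  <pageGrp id="1">
    <data key="example"><![CDATA[<p>First&nbsp;item</p><ul><li>One.</li><li>Two.</li></ul>]]></data>
  </pageGrp>
</TYPO3L10N>"""


def cdata_as_plain_text(xml_source):
    """Return the text of every <data> element exactly as an XML parser sees it."""
    root = ET.fromstring(xml_source)
    # CDATA content is ordinary character data to the XML parser,
    # so the embedded HTML markup is not interpreted at this stage.
    return [element.text or "" for element in root.iter("data")]


texts = cdata_as_plain_text(SAMPLE)
for text in texts:
    print(text)
```

This is the behaviour I see in Studio when no embedded content processing is applied to the CDATA sections: the markup ends up inside the translatable text.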

What I have also tried so far - without any success:

  1. Creating a custom XML 2 file type with separate parsing rules for the HTML codes found in the XML file.
  2. Installing the Multilingual XML file type in Studio, but I don't think I managed to set the languages root and the languages correctly.
  3. Using the XML 2 : MadCap file type, enabling the Processing of embedded content Inside CDATA element with HTML embedded content 5 2.0.0.0. Under Detection / Root element name I entered the correct root element name, TYPO3L10N, which had worked well as a detection rule with the custom XML 2 file type.

Now I am stuck on how to proceed, as I have tried every solution I could configure.

Could you please advise me on how to handle such a TYPO3L10N XML file?

I am sharing the initial section of the XML file.

Regards

Attila

  • Thank you for the info, Brian,

    SOLUTION FOUND!!!

    While I was creating the custom XML 2 : Any XML file type in Studio, the option "Embedded content processing", which contains the CDATA section handling, somehow did not appear; I am not sure why. BUT: when I looked at the newly created custom XML 2 file type in the tree structure, I was happy to see that the option "Embedded content processing" was there. So I only had to activate the CDATA section parsing and, in the Entities option, enable the handling of HTML special symbols.

    The preview at the bottom of the options page was a great help: after browsing to my working source XML file, I could immediately see that the updated CDATA parsing options were working well. I then created a new project, in which the source XML file was processed with the new, correctly configured custom XML 2 file type. And of course the opened sdlxliff document contained every document section as it should, so everything looks fine now.
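For anyone curious what the two settings do conceptually, here is a rough Python sketch. It is not Studio's implementation, only an illustration using the standard library; the class and function names are made up for this example. The embedded HTML is parsed as a second stage, so that tags are separated from the text and entities such as &nbsp; are decoded:

```python
from html.parser import HTMLParser


class EmbeddedHtmlSplitter(HTMLParser):
    """Hypothetical helper: separates HTML tags from translatable text."""

    def __init__(self):
        # convert_charrefs=True decodes entities such as &nbsp; in text.
        super().__init__(convert_charrefs=True)
        self.tags = []
        self.texts = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_data(self, data):
        # Keep only non-empty text runs; each one is a candidate segment.
        if data.strip():
            self.texts.append(data)


def split_embedded_html(fragment):
    """Parse an HTML fragment taken from a CDATA section."""
    parser = EmbeddedHtmlSplitter()
    parser.feed(fragment)
    parser.close()
    return parser.tags, parser.texts


tags, texts = split_embedded_html(
    "<p>First&nbsp;item</p><ul><li>One.</li><li>Two.</li></ul>"
)
print(tags)
print(texts)
```

With the markup split off like this, each list item becomes its own piece of text instead of one long segment full of codes, which matches what I now see in the sdlxliff file.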
