XML file processing in Trados Studio 2024 - including the link text between <a> tags (after the href= part)

Dear Colleagues,

Could somebody urgently help me solve the following issue?

I have an XML file, and its <![CDATA[ ... ]]> sections contain lines like these:

<ul><li><a href="t3://page?uid=5758">Thread</a></li><li><a href="t3://page?uid=1364">Riveting</a></li><li><a

and Trados Studio does not extract the text between the <a> tags, that is, the link text that follows the href= attribute.

In the project settings, for my custom XML file type, I have set Embedded content processing to: Process embedded content - Inside CDATA element - with HTML embedded content 5 2.0.0.0.

In the HTML 5 file type I set the parser rule for the "a" tag as follows: attributes: href = True, Always translatable, Segmentation hint: Exclude. I applied the same settings under File type - Embedded content processor - HTML 5.

Even so, the texts mentioned above are still not offered for translation in the XML documents.

Which settings should I change so that these texts are extracted for translation?

Thanks a lot in advance.

Attila

  • "Probability score": 0.7

    "Question improvement suggestion": Please provide more information about the exact version of Trados Studio you are using and the operating system. Also, it would be helpful if you could share any error messages you might be encountering, and any recent changes you made to the system or software.

    In the meantime, based on the information you have provided, here is a suggested answer to your question:

    Hello Attila,

    Based on your description, it seems like you're having trouble getting certain parts of your XML file to be recognized for translation in Trados Studio. Here are some steps you can try:

    Step 1: Open your project in Trados Studio.

    Step 2: Go to 'Project Settings' and select 'File Types'.

    Step 3: Find and select 'XML' from the list.

    Step 4: In the 'Parser Rules' section, ensure that the 'a' tag is set to 'Always translatable'.

    Step 5: In the 'Attributes' section, ensure that 'href' is set to 'True'.

    Step 6: In the 'Embedded Content Processing' section, ensure that 'Process embedded content' is selected and that 'Inside CDATA element' with 'HTML embedded content 5 2.0.0.0' is specified.

    Step 7: Save your changes and try to process the XML file again.

    If these settings are already in place and you're still having trouble, it might be an issue with the specific XML file you're working with. You might need to manually adjust the XML file to ensure that the 'a' tags and 'href' attributes are properly formatted for translation.

    I hope this answer addressed your question. If not, please reject the answer, add the question improvement suggestion information, and click the RWS Community AI checkbox on the right to try again.

    RWS Community AI

  •  

    I rejected the AI answer because it's half right and half wrong ;-)

    Presumably you have created a custom file type for this and enabled embedded content processing to use the HTML processor? That should get you this:

    Trados Studio preview window showing a side-by-side comparison of the source and target text for a file named forgotten_links.xml. The text includes a story with links labeled Thread and Riveting.

    For a dummy file like this:

    <?xml version="1.0" encoding="UTF-8"?>
    <story>
      <title>The Forgotten Links</title>
      <author>ChatGPT</author>
      <content>
        <![CDATA[
        <p>Late one evening, as the digital winds howled through the archives of the ancient content management system, two forgotten links stirred from their slumber.</p>
        <ul>
          <li><a href="t3://page?uid=5758">Thread</a></li>
          <li><a href="t3://page?uid=1364">Riveting</a></li>
        </ul>
        <p>Thread, a link once woven into countless forum posts, yearned to reconnect conversations long severed. Riveting, a gateway to stories that once held readers in thrall, glowed faintly with the promise of rediscovery.</p>
        <p>Together, they embarked on a journey through deprecated plugins, malformed tags, and layers of nested divs, hoping to find the legendary page that bound all fragments of the old web together.</p>
        ]]>
      </content>
    </story>
    

    And settings like this:

    Trados Studio options window displaying the Embedded Content Processing settings. The option 'Process embedded content' is checked, and 'Inside CDATA element with: Html Embedded Content 5 2.0.0.0' is selected.

    Paul Filkin | RWS Group

    ________________________
    Design your own training!

    You've done the courses and still need to go a little further, or still not clear? 
    Tell us what you need in our Community Solutions Hub



    Generated Image Alt-Text
    [edited by: RWS Community AI at 1:56 PM (GMT 1) on 30 Apr 2025]
  • Hi Paul, thank you for the answer. I have applied the settings you described, but unfortunately I do not get the same result: I do not see the blue underlined link texts. Could the reason be that no rule is defined for "t3://page..." under File type / Parser rules?

    Moreover I have made the following settings for the tag <a> in the Embedded processors / HTML 5 configuration / Parser rules:

    Attributes: href=true

    Properties: Translate: Always translatable

    Tag type: Inline

    Segmentation hint: Include with text (it does not work with Exclude either)

    The t3 URLs appear among the segments, for example "t3://page?uid=1364", but the link texts that follow them do not.

  •  

    I doubt you have done what I showed you. You do not need parser rules for the HTML at all. Please show us what your file type settings look like (parser rules, embedded content rules...), and perhaps extract a complete XML sample that contains one or two examples of what these parts look like in your actual file. We don't need the full file.

    Paul Filkin | RWS Group

  • Hi Paul,

    Here I am showing my project settings: the HTML 5 embedded content processor settings and the specific XML file settings.

    Trados Studio project settings showing HTML 5 embedded content processor rules, including tags like 'a', 'abbr', and 'area' with their translation and structure settings.

    Trados Studio project settings for XML file type detection, displaying a rule with the root element name 'TYPO3L10N'.

    Trados Studio project settings for embedded content processing, showing the option to process embedded content with 'Html Embedded Content 5 2.0.0.0'.

    Trados Studio project settings showing XML parser rules, including elements like 'data', 't3_wordCount', and 't3_targetLang' with their translation and structure settings.

    My other embedded content processing settings are the defaults. When I switch to plain-text embedded content processing inside CDATA elements, Studio includes all parts of the XML file, but the HTML markup appears as text rather than as tags, which of course is not what I need.

    And here are parts of my XML file: the beginning of the file and the problematic parts, with customer-specific texts overwritten.

    <!DOCTYPE TYPO3L10N [ <!ENTITY nbsp " "> ]>
    <TYPO3L10N>
    <head>
    <t3_l10ncfg translate="no">150</t3_l10ncfg>
    <t3_sysLang translate="no">10</t3_sysLang>
    <t3_sourceLang translate="no">en-GB</t3_sourceLang>
    <t3_targetLang translate="no">hu-HU</t3_targetLang>
    <t3_baseURL translate="no">/</t3_baseURL>
    <t3_workspaceId translate="no">0</t3_workspaceId>
    <t3_count translate="no">57</t3_count>
    <t3_wordCount translate="no">1134</t3_wordCount>
    <t3_formatVersion translate="no">2.0</t3_formatVersion>
    <t3_l10nmgrVersion translate="no">12.0.0</t3_l10nmgrVersion>
    </head>

    .....

    <data table="tx_company_hotspot_items" elementUid="1357" key="tx_company_hotspot_items:NEW/10/1357:text"><![CDATA[<p>Why is that and how to make Notepad++ converting the value into the same character visual as it shows it when I simply copy/paste it already rendered form a webpage? Am I doing something wrong? Can anyone show me the proper way how would I simply write the actual Unicode value into the notepad++ and then it would converts it to the correct character?</p>
    <p>Discover our wide range of fine solutions:</p>
    <ul><li><a href="t3://page?uid=1364">Rolling technology</a></li><li><a href="t3://page?uid=1374">High-speed drilling</a></li><li><a href="t3://page?uid=1375">Resistance proofing</a></li><li><a href="t3://page?uid=15">ASA and standard parts</a></li><li><a href="t3://page?uid=1360">Direct turning</a></li><li><a href="t3://page?uid=1369">Tolerance measurement</a></li><li><a href="t3://page?uid=1365">Screw sharpening</a></li><li><a href="t3://page?uid=5758">Thin technology</a></li></ul>]]></data>

    It is interesting that Trados Studio processes all other parts of this XML file well; only these parts, with <a> tags and href attributes, are missing.
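
    As a cross-check outside Trados Studio, a small script can list which link texts the CDATA blocks actually contain, so the result can be compared with the segments Studio extracts. The following is only a minimal sketch using the Python standard library. The sample string is a shortened, hypothetical version of the export above, and the names LinkCollector and collect_links are illustrative, not part of Trados Studio or TYPO3.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical, shortened sample modelled on the TYPO3 export shown above.
SAMPLE = """<!DOCTYPE TYPO3L10N [ <!ENTITY nbsp " "> ]>
<TYPO3L10N>
<head>
<t3_sourceLang translate="no">en-GB</t3_sourceLang>
</head>
<data table="tx_company_hotspot_items" elementUid="1357" key="tx_company_hotspot_items:NEW/10/1357:text"><![CDATA[<p>Discover our wide range of fine solutions:</p>
<ul><li><a href="t3://page?uid=1364">Rolling technology</a></li><li><a href="t3://page?uid=1374">High-speed drilling</a></li></ul>]]></data>
</TYPO3L10N>"""


class LinkCollector(HTMLParser):
    """Collects (href, link text) pairs from an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text pieces seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that sits inside an <a> element with an href.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def collect_links(xml_text):
    """Return all (href, link text) pairs found in the <data> elements."""
    root = ET.fromstring(xml_text)
    links = []
    for data in root.iter("data"):
        parser = LinkCollector()
        # The CDATA content arrives as plain text; parse it as HTML.
        parser.feed(data.text or "")
        parser.close()
        links.extend(parser.links)
    return links


if __name__ == "__main__":
    for href, text in collect_links(SAMPLE):
        print(f"{href}\t{text}")
```

    If the script lists link texts such as "Rolling technology" that never appear as segments in Studio, that would suggest the texts are present in the file and the cause lies in the file type configuration rather than in the XML itself.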



    Generated Image Alt-Text
    [edited by: RWS Community AI at 8:51 AM (GMT 1) on 1 May 2025]