Multilingual XML addon not segmenting after full stops.

Hello, 

I am new to Trados Studio and not yet familiar with all of its intricacies, so chances are I am missing something obvious. I am trying to solve an issue where an XML file with embedded content in CDATA sections is not segmented after each full stop. 

I am having to use the Multilingual XML addon due to how the translation targets are set up in the XML file. So the file type is determined by that. 

Is there a way I can add segmentation rules to this filetype or in general, which would allow for the sentences to be segmented after each complete sentence, or full stop? 

Here is what the file looks like in Trados and a section of the XML.

 

<MODULEMAPPING><![CDATA[]]></MODULEMAPPING>
<PAGEMAPPING><![CDATA[]]></PAGEMAPPING>
<TEXT><![CDATA[<span class="dki-text-style-bold" id="yui-gen1289">What is sustainability?</span>]]></TEXT>
<TRANSLATION><![CDATA[]]></TRANSLATION>
<TYPE><![CDATA[elementMeta]]></TYPE>
</CONTENT>
<CONTENT CF_TYPE='query'>
<COURSEMAPPING><![CDATA[]]></COURSEMAPPING>
<ELEMENTMAPPING><![CDATA[Content]]></ELEMENTMAPPING>
<ID><![CDATA[72310]]></ID>
<LASTMODIFIED><![CDATA[]]></LASTMODIFIED>
<LEARNINGOBJECTMAPPING><![CDATA[]]></LEARNINGOBJECTMAPPING>
<MODULEMAPPING><![CDATA[]]></MODULEMAPPING>
<PAGEMAPPING><![CDATA[]]></PAGEMAPPING>
<TEXT><![CDATA[Sustainability focuses on meeting the needs of the present without compromising the ability of future generations to meet their own needs (Brundtland Report, 1987).<br /><br />To be sustainable, businesses must consider environmental, social and economic impacts in the long-term when making business decisions, rather than only focusing on short-term gains. This involves thinking more holistically beyond immediate considerations of profit and loss.<br /><br />Sustainability requires us to manage environmental, social and economic risks and opportunities for long-term value creation, securing business longevity and building ongoing business success. It is at the core of our business operations, our purpose and our strategy for growth.<br /><br />A focus on sustainability enhances the reputation of our brand with our customers, investors and in our local communities and ensures our business continues to be resilient in the face of global megatrends, such as climate change and resource scarcity.]]></TEXT>
<TRANSLATION><![CDATA[]]></TRANSLATION>
<TYPE><![CDATA[elementMeta]]></TYPE>
</CONTENT>
<CONTENT CF_TYPE='query'>
<COURSEMAPPING><![CDATA[]]></COURSEMAPPING>
<ELEMENTMAPPING><![CDATA[Content]]></ELEMENTMAPPING>
<ID><![CDATA[72302]]></ID>
<LASTMODIFIED><![CDATA[]]></LASTMODIFIED>
<LEARNINGOBJECTMAPPING><![CDATA[]]></LEARNINGOBJECTMAPPING>

If more information is needed to help with this, please let me know which and I'll try to provide that. 

Thank you.



Moved the sample code into a code box.
[edited by: Paul at 1:25 PM (GMT 0) on 24 Jan 2022]
  • I am still unable to determine how to resolve this issue. I've checked the translation memory and the default full stop rule is in place. I've even tried adding an additional rule to segment on anything before and after a full stop, but with no luck. Any help on this would be appreciated. 

  • I'm afraid this is not possible at the moment. Applying normal segmentation rules inside the CDATA section, using the APIs, is a problem we are still trying to solve for this app. We'll update you as soon as we have something sensible to share.

    Perhaps it's worth using the out of the box XML filetype for your file.  I don't see anything in the sample you provided that would make me not use it.

    "I am having to use the Multilingual XML addon due to how the translation targets are set up in the XML file."

    Can you elaborate on this?

    fyi.

    Paul Filkin | RWS Group

    ________________________
    Design your own training!

    You've done the courses and still need to go a little further, or still not clear? 
    Tell us what you need in our Community Solutions Hub

  • The support team recommended the Multilingual XML file type as a way to translate the XML files we are working with, which do not specify a language but instead use fields named TEXT and TRANSLATION. When I tried the default XML file type, I could not get the translation fields to populate properly: Trados would translate the TEXT field and leave the TRANSLATION field empty. When we then attempted to re-import the XML into the system, it would understandably state that the fields were empty. 

    This is what the language mapping looks like with the addin: 


    And here is a snippet from the XML. 

    <?xml version="1.0" encoding="utf-8"?>
    <XML_ELEMENT CF_TYPE='struct'>
    <VARIABLES CF_TYPE='array'>
    </VARIABLES>
    <COURSEID><![CDATA[100183]]></COURSEID>
    <LINKS CF_TYPE='array'>
    <LINK CF_TYPE='struct'>
    <DESCRIPTION CF_TYPE='struct'>
    <TEXT><![CDATA[]]></TEXT>
    <TRANSLATION><![CDATA[]]></TRANSLATION>
    </DESCRIPTION>
    <TITLE CF_TYPE='struct'>
    <TEXT><![CDATA[Group, Steam Specialties and Watson-Marlow colleagues]]></TEXT>
    <TRANSLATION><![CDATA[]]></TRANSLATION>
    </TITLE>
    <ID><![CDATA[11946]]></ID>
    <URL><![CDATA[https://spiraxsarco.sharepoint.com/sites/SSE_Sustainability]]></URL>
    </LINK>
    <LINK CF_TYPE='struct'>
    <DESCRIPTION CF_TYPE='struct'>
    <TEXT><![CDATA[]]></TEXT>
    <TRANSLATION><![CDATA[]]></TRANSLATION>
    </DESCRIPTION>
    <TITLE CF_TYPE='struct'>
    <TEXT><![CDATA[Chromalox colleagues]]></TEXT>
    <TRANSLATION><![CDATA[]]></TRANSLATION>
    </TITLE>
    <ID><![CDATA[11947]]></ID>
    <URL><![CDATA[https://chromalox.sharepoint.com/sites/SSE_Sustainability]]></URL>
    </LINK>
    <LINK CF_TYPE='struct'>
    <DESCRIPTION CF_TYPE='struct'>
    <TEXT><![CDATA[]]></TEXT>
    <TRANSLATION><![CDATA[]]></TRANSLATION>
    </DESCRIPTION>
    <TITLE CF_TYPE='struct'>
    <TEXT><![CDATA[Thermocoax colleagues]]></TEXT>
    <TRANSLATION><![CDATA[]]></TRANSLATION>
    </TITLE>
    <ID><![CDATA[11948]]></ID>
    <URL><![CDATA[https://31520thermocoaxsas.sharepoint.com/sites/SSE_Sustainability]]></URL>
    </LINK>
    <LINK CF_TYPE='struct'>
    <DESCRIPTION CF_TYPE='struct'>
    <TEXT><![CDATA[]]></TEXT>
    <TRANSLATION><![CDATA[]]></TRANSLATION>
    </DESCRIPTION>
    <TITLE CF_TYPE='struct'>
    <TEXT><![CDATA[External]]></TEXT>
    <TRANSLATION><![CDATA[]]></TRANSLATION>
    </TITLE>
    <ID><![CDATA[11949]]></ID>
    <URL><![CDATA[https://www.spiraxsarcoengineering.com/sustainability/one-planet]]></URL>
    </LINK>
    <LINK CF_TYPE='struct'>
    <DESCRIPTION CF_TYPE='struct'>
    <TEXT><![CDATA[]]></TEXT>
    <TRANSLATION><![CDATA[]]></TRANSLATION>
    </DESCRIPTION>
    <TITLE CF_TYPE='struct'>
    <TEXT><![CDATA[Goals]]></TEXT>
    <TRANSLATION><![CDATA[]]></TRANSLATION>
    </TITLE>
    <ID><![CDATA[11913]]></ID>
    <URL><![CDATA[https://sdgs.un.org/goals]]></URL>
    </LINK>
    </LINKS>
    <CONTENTS>
    <CONTENT CF_TYPE='query'>
    <COURSEMAPPING><![CDATA[Title]]></COURSEMAPPING>
    <ELEMENTMAPPING><![CDATA[]]></ELEMENTMAPPING>
    <ID><![CDATA[xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx]]></ID>
    <LASTMODIFIED><![CDATA[2022-01-12 06:56:04.927]]></LASTMODIFIED>
    <LEARNINGOBJECTMAPPING><![CDATA[]]></LEARNINGOBJECTMAPPING>
    <MODULEMAPPING><![CDATA[]]></MODULEMAPPING>
    <PAGEMAPPING><![CDATA[]]></PAGEMAPPING>
    <TEXT><![CDATA[Introducing Sustainability (de-DE)]]></TEXT>
    <TRANSLATION><![CDATA[]]></TRANSLATION>
    <TYPE><![CDATA[metadataTitle]]></TYPE>
    </CONTENT>
    <CONTENT CF_TYPE='query'>
    <COURSEMAPPING><![CDATA[Description]]></COURSEMAPPING>
    <ELEMENTMAPPING><![CDATA[]]></ELEMENTMAPPING>
    <ID><![CDATA[xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx]]></ID>
    <LASTMODIFIED><![CDATA[2022-01-12 06:56:04.927]]></LASTMODIFIED>
    <LEARNINGOBJECTMAPPING><![CDATA[]]></LEARNINGOBJECTMAPPING>
    <MODULEMAPPING><![CDATA[]]></MODULEMAPPING>
    <PAGEMAPPING><![CDATA[]]></PAGEMAPPING>
    <TEXT><![CDATA[This course will introduce colleagues to sustainability and some related concepts, explain why sustainability is important to our business, and provide an overview of our Group One Planet strategy so that everyone can understand their role in implementing the strategy, communicate it to their teams and other stakeholders, and make a contribution to delivering our One Planet goals and objectives. You must correctly answer 80% of the questions in order to successfully complete the course.]]></TEXT>
    <TRANSLATION><![CDATA[]]></TRANSLATION>
    <TYPE><![CDATA[metadataDescription]]></TYPE>
    </CONTENT>
    <CONTENT CF_TYPE='query'>
    <COURSEMAPPING><![CDATA[]]></COURSEMAPPING>
    <ELEMENTMAPPING><![CDATA[]]></ELEMENTMAPPING>
    <ID><![CDATA[200195]]></ID>
    <LASTMODIFIED><![CDATA[]]></LASTMODIFIED>
    <LEARNINGOBJECTMAPPING><![CDATA[]]></LEARNINGOBJECTMAPPING>

  • OK, so for now a suitable workaround might be to simply copy the translatable text from between the TEXT tags into the TRANSLATION element. For example, in a decent text editor that supports regular expressions I can search for this:

    (<TEXT>)(<!\[CDATA\[[^]]*\]\]>)(</TEXT>\r\n)(<TRANSLATION>)<[^>]+>(</TRANSLATION>)

    And replace with this:

    $1$2$3$4$2$5

    That will create a file that looks like this:

    <?xml version="1.0" encoding="utf-8"?>
    <XML_ELEMENT CF_TYPE='struct'>
    <VARIABLES CF_TYPE='array'>
    </VARIABLES>
    <COURSEID><![CDATA[100183]]></COURSEID>
    <LINKS CF_TYPE='array'>
    <LINK CF_TYPE='struct'>
    <DESCRIPTION CF_TYPE='struct'>
    <TEXT><![CDATA[]]></TEXT>
    <TRANSLATION><![CDATA[]]></TRANSLATION>
    </DESCRIPTION>
    <TITLE CF_TYPE='struct'>
    <TEXT><![CDATA[Group, Steam Specialties and Watson-Marlow colleagues]]></TEXT>
    <TRANSLATION><![CDATA[Group, Steam Specialties and Watson-Marlow colleagues]]></TRANSLATION>
    </TITLE>
    <ID><![CDATA[11946]]></ID>
    <URL><![CDATA[https://spiraxsarco.sharepoint.com/sites/SSE_Sustainability]]></URL>
    </LINK>
    <LINK CF_TYPE='struct'>
    <DESCRIPTION CF_TYPE='struct'>
    <TEXT><![CDATA[]]></TEXT>
    <TRANSLATION><![CDATA[]]></TRANSLATION>
    </DESCRIPTION>
    <TITLE CF_TYPE='struct'>
    <TEXT><![CDATA[Chromalox colleagues]]></TEXT>
    <TRANSLATION><![CDATA[Chromalox colleagues]]></TRANSLATION>
    </TITLE>
    <ID><![CDATA[11947]]></ID>
    <URL><![CDATA[https://chromalox.sharepoint.com/sites/SSE_Sustainability]]></URL>
    </LINK>
    <LINK CF_TYPE='struct'>
    <DESCRIPTION CF_TYPE='struct'>
    <TEXT><![CDATA[]]></TEXT>
    <TRANSLATION><![CDATA[]]></TRANSLATION>
    </DESCRIPTION>
    <TITLE CF_TYPE='struct'>
    <TEXT><![CDATA[Thermocoax colleagues]]></TEXT>
    <TRANSLATION><![CDATA[Thermocoax colleagues]]></TRANSLATION>
    </TITLE>
    <ID><![CDATA[11948]]></ID>
    <URL><![CDATA[https://31520thermocoaxsas.sharepoint.com/sites/SSE_Sustainability]]></URL>
    </LINK>
    <LINK CF_TYPE='struct'>
    <DESCRIPTION CF_TYPE='struct'>
    <TEXT><![CDATA[]]></TEXT>
    <TRANSLATION><![CDATA[]]></TRANSLATION>
    </DESCRIPTION>
    <TITLE CF_TYPE='struct'>
    <TEXT><![CDATA[External]]></TEXT>
    <TRANSLATION><![CDATA[External]]></TRANSLATION>
    </TITLE>
    <ID><![CDATA[11949]]></ID>
    <URL><![CDATA[https://www.spiraxsarcoengineering.com/sustainability/one-planet]]></URL>
    </LINK>
    <LINK CF_TYPE='struct'>
    <DESCRIPTION CF_TYPE='struct'>
    <TEXT><![CDATA[]]></TEXT>
    <TRANSLATION><![CDATA[]]></TRANSLATION>
    </DESCRIPTION>
    <TITLE CF_TYPE='struct'>
    <TEXT><![CDATA[Goals]]></TEXT>
    <TRANSLATION><![CDATA[Goals]]></TRANSLATION>
    </TITLE>
    <ID><![CDATA[11913]]></ID>
    <URL><![CDATA[https://sdgs.un.org/goals]]></URL>
    </LINK>
    </LINKS>
    <CONTENTS>
    <CONTENT CF_TYPE='query'>
    <COURSEMAPPING><![CDATA[Title]]></COURSEMAPPING>
    <ELEMENTMAPPING><![CDATA[]]></ELEMENTMAPPING>
    <ID><![CDATA[xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx]]></ID>
    <LASTMODIFIED><![CDATA[2022-01-12 06:56:04.927]]></LASTMODIFIED>
    <LEARNINGOBJECTMAPPING><![CDATA[]]></LEARNINGOBJECTMAPPING>
    <MODULEMAPPING><![CDATA[]]></MODULEMAPPING>
    <PAGEMAPPING><![CDATA[]]></PAGEMAPPING>
    <TEXT><![CDATA[Introducing Sustainability (de-DE)]]></TEXT>
    <TRANSLATION><![CDATA[Introducing Sustainability (de-DE)]]></TRANSLATION>
    <TYPE><![CDATA[metadataTitle]]></TYPE>
    </CONTENT>
    <CONTENT CF_TYPE='query'>
    <COURSEMAPPING><![CDATA[Description]]></COURSEMAPPING>
    <ELEMENTMAPPING><![CDATA[]]></ELEMENTMAPPING>
    <ID><![CDATA[xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx]]></ID>
    <LASTMODIFIED><![CDATA[2022-01-12 06:56:04.927]]></LASTMODIFIED>
    <LEARNINGOBJECTMAPPING><![CDATA[]]></LEARNINGOBJECTMAPPING>
    <MODULEMAPPING><![CDATA[]]></MODULEMAPPING>
    <PAGEMAPPING><![CDATA[]]></PAGEMAPPING>
    <TEXT><![CDATA[This course will introduce colleagues to sustainability and some related concepts, explain why sustainability is important to our business, and provide an overview of our Group One Planet strategy so that everyone can understand their role in implementing the strategy, communicate it to their teams and other stakeholders, and make a contribution to delivering our One Planet goals and objectives. You must correctly answer 80% of the questions in order to successfully complete the course.]]></TEXT>
    <TRANSLATION><![CDATA[This course will introduce colleagues to sustainability and some related concepts, explain why sustainability is important to our business, and provide an overview of our Group One Planet strategy so that everyone can understand their role in implementing the strategy, communicate it to their teams and other stakeholders, and make a contribution to delivering our One Planet goals and objectives. You must correctly answer 80% of the questions in order to successfully complete the course.]]></TRANSLATION>
    <TYPE><![CDATA[metadataDescription]]></TYPE>
    </CONTENT>
    <CONTENT CF_TYPE='query'>
    <COURSEMAPPING><![CDATA[]]></COURSEMAPPING>
    <ELEMENTMAPPING><![CDATA[]]></ELEMENTMAPPING>
    <ID><![CDATA[200195]]></ID>
    <LASTMODIFIED><![CDATA[]]></LASTMODIFIED>
    <LEARNINGOBJECTMAPPING><![CDATA[]]></LEARNINGOBJECTMAPPING>

    Now all I have to do is create a custom XML filetype based on the out of the box XML filetype and translate the TRANSLATION element, as it now contains all the text that needs to be translated. It's not the perfect solution, as the multilingual XML would have been ideal for this, especially if you had any content in the TRANSLATION element to start with, but it should allow you to segment the files properly, and once you have this working it doesn't take long.
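
Before building the custom filetype, it may be worth checking that every TEXT value really did get copied into its TRANSLATION sibling. The following is only a rough sketch in Python, not part of Trados: the sample XML is a shortened, made-up fragment in the same shape as the file above, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened sample in the shape of the file after the copy step.
# The third CONTENT block deliberately still has an empty TRANSLATION.
sample = """<XML_ELEMENT CF_TYPE='struct'>
<CONTENTS>
<CONTENT CF_TYPE='query'>
<TEXT><![CDATA[Goals]]></TEXT>
<TRANSLATION><![CDATA[Goals]]></TRANSLATION>
</CONTENT>
<CONTENT CF_TYPE='query'>
<TEXT><![CDATA[]]></TEXT>
<TRANSLATION><![CDATA[]]></TRANSLATION>
</CONTENT>
<CONTENT CF_TYPE='query'>
<TEXT><![CDATA[External]]></TEXT>
<TRANSLATION><![CDATA[]]></TRANSLATION>
</CONTENT>
</CONTENTS>
</XML_ELEMENT>"""


def translatable_texts(root):
    """Return the non-empty TRANSLATION values, i.e. what would be translated."""
    return [el.text for el in root.iter("TRANSLATION") if el.text]


def uncopied_texts(root):
    """Return TEXT values whose sibling TRANSLATION does not hold the same text."""
    missing = []
    for parent in root.iter():
        text = parent.find("TEXT")
        translation = parent.find("TRANSLATION")
        if text is None or translation is None:
            continue
        # An empty CDATA section parses as None, so normalise to "".
        if (text.text or "") != (translation.text or ""):
            missing.append(text.text)
    return missing


root = ET.fromstring(sample)
ready = translatable_texts(root)
missed = uncopied_texts(root)
print("Ready for translation:", ready)
print("Not copied:", missed)
```

Anything listed as not copied points at a TEXT/TRANSLATION pair the search and replace did not match.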

    I hope this helps to get you started with a solution until we improve the multilingual XML.

    One thing to note is that I used \r\n for the line break. I did this because when I copy/pasted the example you provided into my editor, the line endings showed up as CR and LF. The actual file may not be like that, so if the expression doesn't work for you, that would be the place to start looking for a resolution!
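
If a scripted version is more convenient than a text editor, the same search and replace could be sketched in Python roughly as follows. This is an illustration under assumptions, not a tested tool for this file format: the function name and sample string are made up, `\s*` stands in for `\r\n` so that either line-ending style matches, and `]` is written as `\]` inside the character class.

```python
import re

# Same idea as the editor expression, with \s* instead of \r\n so that
# CRLF and LF files (or indented ones) are both matched.
PATTERN = re.compile(
    r"(<TEXT>)(<!\[CDATA\[[^\]]*\]\]>)(</TEXT>\s*)(<TRANSLATION>)<[^>]+>(</TRANSLATION>)"
)


def copy_text_to_translation(xml: str) -> str:
    """Copy each TEXT CDATA section into the TRANSLATION element that follows it."""
    # \2 is reused so the TRANSLATION element receives the TEXT CDATA section.
    return PATTERN.sub(r"\1\2\3\4\2\5", xml)


# Hypothetical sample taken from the shape of the snippet in this thread.
sample = (
    "<TITLE CF_TYPE='struct'>\r\n"
    "<TEXT><![CDATA[Chromalox colleagues]]></TEXT>\r\n"
    "<TRANSLATION><![CDATA[]]></TRANSLATION>\r\n"
    "</TITLE>\r\n"
)

result = copy_text_to_translation(sample)
print(result)
```

As with the editor expression, a TEXT value that itself contains a `]` character would not be matched, and existing TRANSLATION content is overwritten. When reading a real file, opening it with `newline=""` keeps the original line endings intact.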

    Paul Filkin | RWS Group


  • Hi Paul, 

    Thank you, that's worked to the point where I now have the file looking like this in Trados: 


    Again, I am brand new to Trados, so at this point I am not sure how I can make it ignore what's in the TEXT fields and only show the TRANSLATION fields as translatable, so that the text doesn't show up duplicated as it is now. 

    The regex settings ended up being: 

    Find: (<TEXT>)(<!\[CDATA\[[^]]*\]\]>)(<\/TEXT>\s+)(<TRANSLATION>)<[^>]+>(<\/TRANSLATION>)

     

    Replace: $1$2$3$4$2$5

     

    The Find & Replace settings need to be set up as below:

  • Please ignore that last question; I managed to figure it out. 
    All in all I've now got a working file. Thank you so much for the help!

  • "All in all I've now got a working file. Thank you so much for the help!"

    Excellent!  Thanks for letting us know.

    Paul Filkin | RWS Group

