Excluding any text that isn't English from the source section in a bilingual XML

I have a bilingual XML file. Is there a way of automatically excluding any text that isn't English from the source section when processing the file in Trados? The source is currently a mix of Arabic and English, and I only want to translate the English into Arabic. I could go through the file manually and lock all of the Arabic sections, but this is a really large file, so that would take too long. Is there a way of doing this automatically, or of creating a setting that excludes the Arabic in the first place? Thank you!

I have SDL Trados Studio 2021 - 16.0.2.3343

  •  

    I have a bilingual XML file, is there a way of automatically excluding any text that isn't English from the source section when processing the file in Trados?

    Probably.

    Can you provide a sample of the XML? This is the only way you'll be able to get a concrete answer to this question.

    Paul Filkin | RWS Group

    ________________________
    Design your own training!

    You've done the courses and still need to go a little further, or still not clear? 
    Tell us what you need in our Community Solutions Hub

  • Thank you Paul! Any chance I could send you an email with the file? Not sure I can share it publicly unfortunately as we have a few NDAs in place with the client. Thank you! 

  •  

    Sure... pfilkin at rws dotcom

    Paul Filkin | RWS Group

  •  

    Thanks for sending me the file.  I have a couple of questions.

    • the file contains language code attributes like this for example:
      countries="BR,CN,DE,RO,PL,FR,ES,PT,NL,GB,HU,IT,RU,TR,ZA,JP,KR,ME,MX,IL"
    • also this attribute:
      translate="no"

    I cannot see any language code for Arabic, nor can I see any Arabic text in the file... only English and Chinese.  Please can you confirm which country code should be used for Arabic (I expected AR but cannot find a single reference to this) or at least how these country code attributes should be used?

    Can you also confirm whether the translate="no" attribute should be acted upon? This will also affect the XML file type I create once you confirm how I'm supposed to decide which segments go into Arabic.

    Paul Filkin | RWS Group

  • Thanks Paul! Let me take a look and get back to you! 

  • Hi Paul, I'm so sorry for this delay! I was checking with the client exactly what was required, I think I've got my head around it now!

    So, the client uses country codes rather than ISO language codes, so "ME" in the file refers to Arabic, which is why "AR" doesn't appear. They also said that translate="no" can be ignored: the tag was set up but never finished, so it's not a function that they use.

    The setup for the rest of the file is as follows:

    - If the country code "ME" is in the list of country codes, the text needs to be translated into Arabic (text specific to the ME market)

    - If there are no country codes mentioned before the text, then it needs to be translated into Arabic (text that is global, and applies to all markets)

    - If there is a list of country codes and "ME" does not appear, it does not need to be translated into Arabic (text that isn't for the ME market)
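
    The three rules above can be sketched as a small decision function. This is only an illustration in Python, not something Trados runs; the name `needs_translation` is hypothetical, and the sketch assumes the countries attribute is always a comma-separated list of two-letter codes, as in the examples quoted in this thread.

```python
def needs_translation(countries, code="ME"):
    """Decide whether text governed by a `countries` attribute should be translated.

    `countries` is the raw attribute value (e.g. "TR,ME,IL,CN"), or None when
    no countries attribute applies to the text at all.
    """
    if countries is None:
        # Rule 2: no country codes mentioned, so the text is global.
        return True
    # Rules 1 and 3: translate only if the target code is in the list.
    # Assumption: codes are separated by commas, optionally with spaces.
    codes = [c.strip() for c in countries.split(",")]
    return code in codes


print(needs_translation("TR,ME,IL,CN"))  # ME is listed
print(needs_translation(None))           # global text
print(needs_translation("BR,DE,FR"))     # list without ME
```

    Swapping the `code` argument for "FR" or "DE" gives the same decision for the other markets, which is the behaviour the settings file would need to reproduce.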

    Is there a way we can set up a settings file to cover the above?

    To explain, we translate these files into many languages. So could we amend this settings file so that we can use it on French/German etc files (if we change "ME" to "FR" and "DE" in the settings file)? Or is it not that simple?

    Thank you for your continued help Paul! I hope the above helps explain the file a little better!

  •  

    My turn to apologise... I cannot find your email or the file you sent me.  Possibly because the name in here is different to the one you used to send it to me?  Can you send it again please?

    Thanks.

    Paul Filkin | RWS Group

  • It is yes! I'll resend it now!

  •  

    Thanks for the email... one more question.  When you have something like this:

    Screenshot of Trados Studio showing a poorly formatted XML file with a list entry tag containing multiple country codes and a paragraph tag for translation text.

    Where do you want the translation to go? Do you want to overwrite the English?  Do you want to create new elements, one for each language so you end up with a file containing all the languages mentioned under countries?

    This is really a brilliant example of how NOT to prepare files for localisation, but nonetheless I would be asking this question to know what the client needs to receive back.

    Paul Filkin | RWS Group

  • Thanks Paul, so the client generates one file per language, so we'd be looking to overwrite the "some text for translation" with the translated text in just one language. These aren't multilingual files, so we'd get one source file per language and overwrite the English with the translation. I hope that made sense?! Sorry I know these are pretty confusing! 

  •   

    OK, this actually makes it easier. Your original question referred to a bilingual XML, but this isn't a bilingual or multilingual XML.  It's just a monolingual XML where we need to extract the text for translation based on the rules you provided and overwrite it with the target translation.

    So... I now have another question... hopefully the last one.  In this example where I have highlighted all the possible text to be extracted for translation:

    Screenshot of Trados Studio XML code with highlighted text showing parent elements containing country attributes and child elements for extraction.

    You can see that only the parent elements contain the country.  In the first para it's in this:

    <Description medium="all" countries="TR,ME,IL,CN">

    In the next it's in this:

    <P medium="all" countries="TR,ME,IL,CN">

    Should I be extracting nothing at all since none of these child elements contain that attribute themselves, or should I simply be extracting them all because they don't have the country attribute as per your second rule above?

    Or should I be extracting only the child elements like this, so the last one doesn't get extracted?

    Close-up of Trados Studio XML code with highlighted sections indicating text for translation extraction, including country attributes and emphasis tags.

    With this information I probably have enough to give you an idea and hopefully you'll be able to manage this on your own thereafter.

    Paul Filkin | RWS Group

  •  

    You can see why this needs clearing up...

    Screenshot of Trados Studio showing XML code with a 'Description' element containing a 'Note' parent element with country rule applied, and child 'P' elements with individual country attributes overriding the parent rule.

    The translatable text is all in child elements of Note, and there is a country rule applied to Note.  But then the individual child elements of Note seem to override it.  So this suggests I should pay attention to child elements and apply the attribute from the parent... unless they are overridden.
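
    To make the "inherit from the parent unless overridden" idea concrete, here is a rough Python sketch using the standard library's xml.etree.ElementTree. The sample XML, the element names other than Note and P, and the function name `translatable_elements` are all made up for illustration; this is one reading of the rules, not the client's specification.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample, loosely modelled on the structure described above.
SAMPLE = """
<Doc>
  <Title>Global heading</Title>
  <Note countries="TR,ME,IL,CN">
    <P>Inherits the Note rule</P>
    <P countries="BR,DE">Overrides the Note rule</P>
  </Note>
  <P countries="FR,DE">Not for ME</P>
</Doc>
"""


def translatable_elements(root, code):
    """Yield elements holding real text whose effective countries list allows `code`."""

    def walk(elem, inherited):
        # An element's own countries attribute overrides the inherited one.
        effective = elem.get("countries", inherited)
        # "Real text" means non-whitespace text directly inside this element.
        has_text = bool((elem.text or "").strip()) or any(
            (child.tail or "").strip() for child in elem
        )
        allowed = effective is None or code in [
            c.strip() for c in effective.split(",")
        ]
        if has_text and allowed:
            yield elem
        for child in elem:
            yield from walk(child, effective)

    yield from walk(root, None)


root = ET.fromstring(SAMPLE)
selected = [(e.text or "").strip() for e in translatable_elements(root, "ME")]
print(selected)  # ['Global heading', 'Inherits the Note rule']
```

    The second P is skipped because its own attribute replaces the one on Note, and the last P is skipped because its list does not include ME.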

    Paul Filkin | RWS Group

  • Thanks for this Paul, let me have a look and get back to you!

  •  

    In the meantime I did try to solve this and couldn't!  But it was an interesting one for my own learning journey, so I created a small example and asked for help on Stack Overflow:

    https://stackoverflow.com/questions/74386847/only-xpath-for-extracting-text-for-multiple-conditions-in-xml-no-code-possible

    Got the answer just now, which I would not have been able to do myself:

    //*[text()[normalize-space()]][not(ancestor-or-self::*/@countries) or contains(ancestor-or-self::*[@countries][1]/@countries, 'ME')]

    Like this in Studio:

    Trados Studio parser options window showing an XML XPath rule input for extracting text with multiple conditions.

    This seems to do the trick.  It selects every element that contains real text and either has no countries attribute on itself or any ancestor, or whose nearest countries attribute (its own, or else the closest ancestor's) contains ME.  This is possibly the most complicated XML XPath I've tried to solve, mainly required because the source file itself isn't well thought out for localization purposes.  The simplest and least error-prone way would have been to add the supported countries to every element containing translatable text, and then you could always and easily pull out the languages you need.

    However, this is pretty clever and you can easily adapt it by changing the country code for the languages you need.
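
    If you need the same rule for several markets, one way to avoid hand-editing mistakes is to generate the expression from a template and paste the result into Studio. A minimal sketch in Python; the helper name `xpath_for` is hypothetical, and the check on the code assumes the client only ever uses two-letter uppercase country codes.

```python
# The XPath from the Stack Overflow answer, with the country code as a placeholder.
XPATH_TEMPLATE = (
    "//*[text()[normalize-space()]]"
    "[not(ancestor-or-self::*/@countries) or "
    "contains(ancestor-or-self::*[@countries][1]/@countries, '{code}')]"
)


def xpath_for(code):
    """Return the extraction XPath for one two-letter country code."""
    # contains() is a substring test, so it is only safe while every entry in
    # the list is exactly two letters; reject anything else up front.
    if not (len(code) == 2 and code.isascii() and code.isalpha() and code.isupper()):
        raise ValueError(f"expected a two-letter uppercase country code, got {code!r}")
    return XPATH_TEMPLATE.format(code=code)


for market in ("ME", "FR", "DE"):
    print(market, xpath_for(market))
```

    Each printed line is the rule for one market, so a French or German copy of the settings file differs from the Arabic one only in that quoted code.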

    Paul Filkin | RWS Group
