Custom XML filetype: filter by attribute value (specific IDs) of parent element

Hi,

I have a specific, urgent question. I received a survey XML from my client and created a custom XML filetype for it to extract all translatable strings, but I also received a file listing questions (IDs) to exclude. Is it possible to filter these out via the filetype?

The inline elements to be translated are <p> and <computed>, and they are nested within several parent elements, of which the main one is <question number="x" type="xy">.

So my question is: is it possible to set up a filter that only shows <p> and <computed> for translation if the number attribute of <question …> is not one of the listed ones (e.g. (regex) (258|259|260|1130|2255))?

Or maybe add a new element <exclude> around them, so they won't be listed for translation (I would then have to remove these before returning the translation)?

Best regards,

Pascal



better formulation of question while using correct terminology
[edited by: Pascal Zotto at 1:49 PM (GMT 1) on 15 May 2023]
  •  

    Always helps to have a sample to play with.  But perhaps this will help.  I created this sample and the expressions with the help of ChatGPT:

    <quiz>
        <question number="258" type="MCQ">
            <content>
                <p>What is the name of the fire-breathing creature in Greek mythology?</p>
                <computed>Hint: It has the body of a lion and the tail of a serpent.</computed>
            </content>
        </question>
        <question number="359" type="TF">
            <content>
                <p>Is the Phoenix a creature that is reborn from its own ashes?</p>
                <computed>True or False?</computed>
            </content>
        </question>
        <question number="460" type="MCQ">
            <content>
                <p>Which creature in Norse mythology is known as the world serpent?</p>
                <computed>Hint: Its name starts with a 'J'.</computed>
            </content>
        </question>
        <question number="561" type="MCQ">
            <content>
                <p>What is the name of the one-eyed giants in Greek mythology?</p>
                <computed>Hint: It starts with 'C'.</computed>
            </content>
        </question>
        <question number="662" type="TF">
            <content>
                <p>Are Unicorns considered mythical creatures in every culture?</p>
                <computed>True or False?</computed>
            </content>
        </question>
        <question number="763" type="MCQ">
            <content>
                <p>What is the name of the multi-headed dog guarding the underworld in Greek mythology?</p>
                <computed>Hint: It starts with 'C'.</computed>
            </content>
        </question>
        <question number="864" type="MCQ">
            <content>
                <p>Which creature in Chinese mythology is known for its power over water?</p>
                <computed>Hint: It's a dragon.</computed>
            </content>
        </question>
        <question number="965" type="TF">
            <content>
                <p>Is the Kraken a legendary sea monster of gigantic size in Scandinavian folklore?</p>
                <computed>True or False?</computed>
            </content>
        </question>
        <question number="1066" type="MCQ">
            <content>
                <p>What is the name of the bird in Egyptian mythology that symbolizes the sun, creation, and rebirth?</p>
                <computed>Hint: It starts with 'B'.</computed>
            </content>
        </question>
        <question number="1167" type="TF">
            <content>
                <p>Is Bigfoot considered a mythical creature?</p>
                <computed>True or False?</computed>
            </content>
        </question>
        <question number="1268" type="MCQ">
            <content>
                <p>Which creature in Japanese mythology is a turtle-like creature often depicted with a tail and long neck?</p>
                <computed>Hint: It starts with 'K'.</computed>
            </content>
        </question>
        <question number="1369" type="MCQ">
            <content>
                <p>What is the name of the half-man, half-horse creatures in Greek mythology?</p>
                <computed>Hint: It starts with 'C'.</computed>
            </content>
        </question>
        <question number="1470" type="TF">
            <content>
                <p>Is Medusa a mythical creature with snakes for hair in Greek mythology?</p>
                <computed>True or False?</computed>
            </content>
        </question>
        <question number="1571" type="MCQ">
            <content>
                <p>What is the name of the legendary creature in Irish folklore known for its shape-shifting abilities?</p>
                <computed>Hint: It starts with 'C'.</computed>
            </content>
        </question>
        <question number="1672" type="MCQ">
            <content>
                <p>Which creature in Hindu mythology is depicted as a large serpent that surrounds the world?</p>
                <computed>Hint: It starts with 'S'.</computed>
            </content>
        </question>
        <question number="1773" type="TF">
            <content>
                <p>Are Goblins considered mythical creatures that are mischievous and troublemakers?</p>
                <computed>True or False?</computed>
            </content>
        </question>
        <question number="1874" type="MCQ">
            <content>
                <p>What is the name of the half-man, half-goat creatures in Greek mythology?</p>
                <computed>Hint: It starts with 'S'.</computed>
            </content>
        </question>
        <question number="1975" type="TF">
            <content>
                <p>Is the Loch Ness Monster a mythical creature believed to inhabit the waters of Loch Ness in Scotland?</p>
                <computed>True or False?</computed>
            </content>
        </question>
        <question number="2076" type="MCQ">
            <content>
                <p>Which mythical creature is known for luring sailors to their doom with their enchanting voices in Greek mythology?</p>
                <computed>Hint: It starts with 'S'.</computed>
            </content>
        </question>
        <question number="2177" type="TF">
            <content>
                <p>Is the Griffin a mythical creature with the body of a lion and the head of an eagle?</p>
                <computed>True or False?</computed>
            </content>
        </question>
        <question number="2278" type="MCQ">
            <content>
                <p>What is the name of the mythical creature in Slavic folklore known for its ability to control the weather and bring storms?</p>
                <computed>Hint: It starts with 'B'.</computed>
            </content>
        </question>
        <question number="2255" type="TF">
            <content>
                <p>Is the Manticore a mythical creature with the body of a lion, a human head, and a scorpion tail?</p>
                <computed>True or False?</computed>
            </content>
        </question>
    </quiz>

    The XPath expression to select the <p> and <computed> elements for translation, except for the ones whose parent <question> has an attribute number equal to 561, 763, 1066, or 1268 would be this for example:

    //question[not(@number=561 or @number=763 or @number=1066 or @number=1268)]/content/*[self::p or self::computed]

    1. //question: It selects all <question> elements in the XML document.
    2. [not(@number=561 or @number=763 or @number=1066 or @number=1268)]: Keeps only the <question> elements whose number attribute is not equal to 561, 763, 1066, or 1268.
    3. /content: Selects the <content> child element of the filtered <question> elements.
    4. /*[self::p or self::computed]: Selects the child elements of the <content> elements that are either <p> or <computed>.
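
    For anyone who wants to check outside Trados Studio which strings such a rule should pick up, the four steps above can be mirrored in a few lines of Python. This is only a rough sketch: it uses the standard library's xml.etree.ElementTree, which does not support not() in its XPath subset, so the filter is written as plain Python instead. The function name translatable_texts and the trimmed three-question sample are illustrative assumptions, and the number is compared as a string rather than numerically as XPath would do.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample quiz (three questions only, for illustration).
SAMPLE = """
<quiz>
    <question number="258" type="MCQ">
        <content>
            <p>What is the name of the fire-breathing creature in Greek mythology?</p>
            <computed>Hint: It has the body of a lion and the tail of a serpent.</computed>
        </content>
    </question>
    <question number="359" type="TF">
        <content>
            <p>Is the Phoenix a creature that is reborn from its own ashes?</p>
            <computed>True or False?</computed>
        </content>
    </question>
    <question number="561" type="MCQ">
        <content>
            <p>What is the name of the one-eyed giants in Greek mythology?</p>
            <computed>Hint: It starts with 'C'.</computed>
        </content>
    </question>
</quiz>
"""

# IDs to exclude, kept as strings (assumption: attribute values have no padding).
EXCLUDED = {"561", "763", "1066", "1268"}


def translatable_texts(xml_text, excluded):
    """Return the text of each <p>/<computed> under non-excluded questions."""
    root = ET.fromstring(xml_text)
    texts = []
    for question in root.iter("question"):            # step 1: //question
        if question.get("number") in excluded:        # step 2: [not(@number=...)]
            continue
        for content in question.findall("content"):   # step 3: /content
            for child in content:                     # step 4: /*[self::p or self::computed]
                if child.tag in ("p", "computed"):
                    texts.append(child.text)
    return texts


if __name__ == "__main__":
    for text in translatable_texts(SAMPLE, EXCLUDED):
        print(text)
```

    Running it on the trimmed sample lists the strings from questions 258 and 359 and skips question 561, which is the behaviour to expect from the parser rule.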

    I don't think regex will work for your use case.  But to try and make it easier to add lists of exclusions I pressed ChatGPT for a better answer:

    //question[not(contains('|561|763|1066|1268|', concat('|', @number, '|')))]/content/*[self::p or self::computed]

    1. //question: It selects all <question> elements in the XML document.
    2. not(contains('|561|763|1066|1268|', concat('|', @number, '|'))):
      • concat('|', @number, '|'): Concatenates the current number attribute value with pipe characters | on both sides.
      • contains('|561|763|1066|1268|', ...): Checks if the concatenated string is present in the list of exclusions (the pipe-separated string).
      • not(...): Keeps only the <question> elements whose number is not in the exclusion list.
    3. /content: Selects the <content> child element of the filtered <question> elements.
    4. /*[self::p or self::computed]: Selects the child elements of the <content> elements that are either <p> or <computed>.

    To add more exclusions, simply append them to the pipe-separated list, like |561|763|1066|1268|NewExclusion|.
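
    If the IDs to exclude arrive as a separate list, the expression can also be generated rather than typed by hand. The following is a minimal sketch under that assumption; build_exclusion_xpath and is_excluded are hypothetical helper names, not part of Trados Studio. The second helper reproduces the contains()/concat() check in Python to show why the surrounding pipes matter: without them, an ID such as 56 would match inside 561.

```python
def build_exclusion_xpath(excluded_ids):
    """Build the pipe-delimited XPath rule from a list of question numbers."""
    ids = [str(i).strip() for i in excluded_ids]
    if not ids or any(not i for i in ids):
        raise ValueError("need at least one non-empty ID")
    if any("|" in i or "'" in i for i in ids):
        raise ValueError("IDs must not contain '|' or a single quote")
    haystack = "|" + "|".join(ids) + "|"
    return (
        "//question[not(contains('" + haystack + "', concat('|', @number, '|')))]"
        "/content/*[self::p or self::computed]"
    )


def is_excluded(number, excluded_ids):
    """Python equivalent of contains('|a|b|', concat('|', @number, '|'))."""
    haystack = "|" + "|".join(str(i) for i in excluded_ids) + "|"
    return "|" + str(number) + "|" in haystack


if __name__ == "__main__":
    ids = [561, 763, 1066, 1268]
    print(build_exclusion_xpath(ids))
    print(is_excluded(561, ids))  # exact match on a listed ID
    print(is_excluded(56, ids))   # partial match only, so not excluded
```

    The generated string can then be pasted into the parser rule as before.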

    That may not be exactly what you were after as you didn't provide a sample file, but perhaps it'll help you to handle the file you have?  But both expressions worked well for me in Trados Studio with just two rules... for example;

    [Screenshot: Trados Studio parser rule configuration window with an XPath expression input to filter specific question elements for translation.]

    [Screenshot: Preview of an XML file in Trados Studio showing questions about mythical creatures with corresponding hints.]

    Paul Filkin | RWS Group

    ________________________
    Design your own training!

    You've done the courses and still need to go a little further, or still not clear? 
    Tell us what you need in our Community Solutions Hub



  • Hi,

    yes, sorry, I was in a complete rush yesterday and forgot to add a sample and the filetype settings I have so far. I only thought of these 2 hours ago.

    This is my first try to build such a filetype setting from scratch.

    Apparently we used different approaches to create the filetype settings, as I don't get any results at all when I add the rules to the parser. :( I have Trados 2022. I tried 2 filetype settings but don't get the correct result...

    Not sure what I did wrong when using the wizard to create the settings.

    Attachment: 2 xml file type settings.zip

  • Thanks,

    I'll have to check that one after solving one more issue I just noticed: apparently Trados ignores my XML survey 2 filetype setting and uses the built-in XML 2: Any XML (last in the list of filetype settings) instead. If the XML survey FT setting is enabled, then Trados uses that one instead of the XML 2: Any XML FT setting.

    How can that happen? My own FT setting is enabled and looks for XML. That might also explain why my embedded content rules did not match and why the parser rules don't work: the FT settings are not considered at all.

  • Ok, this part is solved. There was a trailing space in the Detection string. The tags now also work. Now I only need to check the parser rules. I’ll keep you updated.

  • Ok, the parser still does not work correctly. If I try to change it (paste your parser rule) and click OK, I get the error: "the object reference was not set to an object instance".

    Furthermore, I have HTML tags in the strings to translate, and for some unknown reason the text gets segmented before and after these tags. How can I avoid this segmentation? (I chose your first way for the embedded content.)

  • Hi,

    OK, I got rid of the error. I had to remove the paragraph setting.

    The tags still work, but the segmentation issue is still not solved.

    Isn't there a way to remove or disable the general embedded content HTML parser for XML files? Maybe via a different filetype setting?

    And filtering does not work yet either...

    I'll see what results I get with way #3.

    Edit: How do I get Multilingual XML to recognise my XML? 'Language Root: /survey' does not work, although it should, as this is the root element.

  • Ok, now I managed to get the filetype associated, but it only returns the content of the very first p element.

  • I managed to extract the first text element (3 p elements in total, which generate 5 segments), but Trados just duplicates these 5 segments for each further text element… oO

  • LOL. Now I get p elements from all elements, but only the first p element of each parent element.

  • Just a thought, but is it even possible to match more than one of the same subelement with the Multilingual XML parser? To me, the rule that defines which language is found in which subelement seems to "allow" only one element per language.

  •   

    is it even possible to match more than one of the same subelement with the Multilingual XML parser? To me, the rule that defines which language is found in which subelement seems to "allow" only one element per language.

    It depends.  You can of course match more than one element; it all depends on your rule.  It's not intended to be a complete replacement for the XML filetype, as that sort of flexibility around extracting whatever you like with a parser rule obviously won't work, but for simple structures like yours it seems possible.  For example... using this as the "Languages Root":

    /survey/section/question[not(contains('|526|811|', concat('|', @number, '|')))]/*[self::headline or self::choices]/text

    And this as the language:

    p

    Seems to work... I added a couple of placeholders and used the embedded html for the ones you added:

    [Screenshot: parsed segments using the multilingual XML filetype as a monolingual option.]

    I didn't spend a lot of time on this so you'd need to check thoroughly if anything was missing, but it seems possible with this quick check.

    Paul Filkin | RWS Group
