Custom XML filetype: filter by attribute value (specific IDs) of parent element

Hi ,

I have a specific, urgent question. I received a survey XML from my client and created a custom XML filetype for it to extract all translatable strings, but I also received a file listing questions (IDs) to exclude. Is it possible to filter these out via the filetype?

The inline elements to be translated are <p> and <computed> and they are nested within several parent elements of which the main one is <question number="x" type="xy">.

So my question is: is it possible to set up a filter that only shows <p> and <computed> for translation if the number attribute of <question …> is not one of the listed ones (e.g. as a regex: (258|259|260|1130|2255))?

Or maybe add a new element <exclude> around them, so they won't be listed for translation (I would then have to remove these before returning the translation)?

Best regards,

Pascal



better formulation of question while using correct terminology
[edited by: Pascal Zotto at 1:49 PM (GMT 1) on 15 May 2023]
  •  

    Always helps to have a sample to play with.  But perhaps this will help.  I created this sample and the expressions with the help of ChatGPT:

    <quiz>
        <question number="258" type="MCQ">
            <content>
                <p>What is the name of the fire-breathing creature in Greek mythology?</p>
                <computed>Hint: It has the body of a lion and the tail of a serpent.</computed>
            </content>
        </question>
        <question number="359" type="TF">
            <content>
                <p>Is the Phoenix a creature that is reborn from its own ashes?</p>
                <computed>True or False?</computed>
            </content>
        </question>
        <question number="460" type="MCQ">
            <content>
                <p>Which creature in Norse mythology is known as the world serpent?</p>
                <computed>Hint: Its name starts with a 'J'.</computed>
            </content>
        </question>
        <question number="561" type="MCQ">
            <content>
                <p>What is the name of the one-eyed giants in Greek mythology?</p>
                <computed>Hint: It starts with 'C'.</computed>
            </content>
        </question>
        <question number="662" type="TF">
            <content>
                <p>Are Unicorns considered mythical creatures in every culture?</p>
                <computed>True or False?</computed>
            </content>
        </question>
        <question number="763" type="MCQ">
            <content>
                <p>What is the name of the multi-headed dog guarding the underworld in Greek mythology?</p>
                <computed>Hint: It starts with 'C'.</computed>
            </content>
        </question>
        <question number="864" type="MCQ">
            <content>
                <p>Which creature in Chinese mythology is known for its power over water?</p>
                <computed>Hint: It's a dragon.</computed>
            </content>
        </question>
        <question number="965" type="TF">
            <content>
                <p>Is the Kraken a legendary sea monster of gigantic size in Scandinavian folklore?</p>
                <computed>True or False?</computed>
            </content>
        </question>
        <question number="1066" type="MCQ">
            <content>
                <p>What is the name of the bird in Egyptian mythology that symbolizes the sun, creation, and rebirth?</p>
                <computed>Hint: It starts with 'B'.</computed>
            </content>
        </question>
        <question number="1167" type="TF">
            <content>
                <p>Is Bigfoot considered a mythical creature?</p>
                <computed>True or False?</computed>
            </content>
        </question>
        <question number="1268" type="MCQ">
            <content>
                <p>Which creature in Japanese mythology is a turtle-like creature often depicted with a tail and long neck?</p>
                <computed>Hint: It starts with 'K'.</computed>
            </content>
        </question>
        <question number="1369" type="MCQ">
            <content>
                <p>What is the name of the half-man, half-horse creatures in Greek mythology?</p>
                <computed>Hint: It starts with 'C'.</computed>
            </content>
        </question>
        <question number="1470" type="TF">
            <content>
                <p>Is Medusa a mythical creature with snakes for hair in Greek mythology?</p>
                <computed>True or False?</computed>
            </content>
        </question>
        <question number="1571" type="MCQ">
            <content>
                <p>What is the name of the legendary creature in Irish folklore known for its shape-shifting abilities?</p>
                <computed>Hint: It starts with 'C'.</computed>
            </content>
        </question>
        <question number="1672" type="MCQ">
            <content>
                <p>Which creature in Hindu mythology is depicted as a large serpent that surrounds the world?</p>
                <computed>Hint: It starts with 'S'.</computed>
            </content>
        </question>
        <question number="1773" type="TF">
            <content>
                <p>Are Goblins considered mythical creatures that are mischievous and troublemakers?</p>
                <computed>True or False?</computed>
            </content>
        </question>
        <question number="1874" type="MCQ">
            <content>
                <p>What is the name of the half-man, half-goat creatures in Greek mythology?</p>
                <computed>Hint: It starts with 'S'.</computed>
            </content>
        </question>
        <question number="1975" type="TF">
            <content>
                <p>Is the Loch Ness Monster a mythical creature believed to inhabit the waters of Loch Ness in Scotland?</p>
                <computed>True or False?</computed>
            </content>
        </question>
        <question number="2076" type="MCQ">
            <content>
                <p>Which mythical creature is known for luring sailors to their doom with their enchanting voices in Greek mythology?</p>
                <computed>Hint: It starts with 'S'.</computed>
            </content>
        </question>
        <question number="2177" type="TF">
            <content>
                <p>Is the Griffin a mythical creature with the body of a lion and the head of an eagle?</p>
                <computed>True or False?</computed>
            </content>
        </question>
        <question number="2278" type="MCQ">
            <content>
                <p>What is the name of the mythical creature in Slavic folklore known for its ability to control the weather and bring storms?</p>
                <computed>Hint: It starts with 'B'.</computed>
            </content>
        </question>
        <question number="2255" type="TF">
            <content>
                <p>Is the Manticore a mythical creature with the body of a lion, a human head, and a scorpion tail?</p>
                <computed>True or False?</computed>
            </content>
        </question>
    </quiz>

    The XPath expression to select the <p> and <computed> elements for translation, except for the ones whose parent <question> has an attribute number equal to 561, 763, 1066, or 1268 would be this for example:

    //question[not(@number=561 or @number=763 or @number=1066 or @number=1268)]/content/*[self::p or self::computed]

    1. //question: It selects all <question> elements in the XML document.
    2. [not(@number=561 or @number=763 or @number=1066 or @number=1268)]: Filters the <question> elements that don't have a number attribute equal to 561, 763, 1066, or 1268.
    3. /content: Selects the <content> child element of the filtered <question> elements.
    4. /*[self::p or self::computed]: Selects the child elements of the <content> elements that are either <p> or <computed>.
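
To sanity-check outside Trados Studio which elements an expression like this should pick up, the same selection logic can be sketched in Python. This is only an illustration, not how Trados evaluates the rule: the standard-library `xml.etree.ElementTree` module does not support `not()` in its XPath subset, so the exclusion is applied in plain Python instead. The tiny sample document and the function name `translatable_texts` are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-question sample with the same shape as the quiz above.
SAMPLE = """
<quiz>
    <question number="258" type="MCQ">
        <content>
            <p>Keep me</p>
            <computed>Keep me too</computed>
        </content>
    </question>
    <question number="561" type="MCQ">
        <content>
            <p>Skip me</p>
            <computed>Skip me too</computed>
        </content>
    </question>
</quiz>
"""

# Question numbers to exclude, taken from the XPath expression above.
# Note: this compares strings, whereas @number=561 in XPath compares numerically.
EXCLUDED = {"561", "763", "1066", "1268"}


def translatable_texts(xml_text, excluded):
    """Return the text of <p>/<computed> children of <content> in non-excluded questions."""
    root = ET.fromstring(xml_text)
    texts = []
    for question in root.iter("question"):        # //question
        if question.get("number") in excluded:    # [not(@number=... or ...)]
            continue
        for content in question.findall("content"):   # /content
            for child in content:                      # /*[self::p or self::computed]
                if child.tag in ("p", "computed"):
                    texts.append(child.text)
    return texts


print(translatable_texts(SAMPLE, EXCLUDED))
```

Running this against a copy of the real file (with the real exclusion list) gives a quick list to compare with what shows up in the Trados preview.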

    I don't think regex will work for your use case.  But to try and make it easier to add lists of exclusions I pressed ChatGPT for a better answer:

    //question[not(contains('|561|763|1066|1268|', concat('|', @number, '|')))]/content/*[self::p or self::computed]

    1. //question: It selects all <question> elements in the XML document.
    2. not(contains('|561|763|1066|1268|', concat('|', @number, '|'))):
      • concat('|', @number, '|'): Concatenates the current number attribute value with pipe characters | on both sides.
      • contains('|561|763|1066|1268|', ...): Checks if the concatenated string is present in the list of exclusions (the pipe-separated string).
      • not(...): Filters the <question> elements that don't match the exclusions.
    3. /content: Selects the <content> child element of the filtered <question> elements.
    4. /*[self::p or self::computed]: Selects the child elements of the <content> elements that are either <p> or <computed>.

    To add more exclusions, simply append them to the pipe-separated list, like |561|763|1066|1268|NewExclusion|.
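
If the exclusion list is long, the expression can be generated rather than typed by hand. The following is a minimal sketch, assuming the IDs are already available as a Python list; `build_exclusion_xpath` and `is_excluded` are hypothetical helper names, and the result still has to be pasted into the parser rule manually. The second helper mirrors the `contains()`/`concat()` test to show why the pipe delimiters matter: without them, a number such as 61 would be found inside 561.

```python
def build_exclusion_xpath(excluded_ids, tail="/content/*[self::p or self::computed]"):
    """Build the pipe-delimited contains()/concat() expression for a list of IDs."""
    joined = "|".join(str(i) for i in excluded_ids)
    return (
        f"//question[not(contains('|{joined}|', concat('|', @number, '|')))]{tail}"
    )


def is_excluded(number, excluded_ids):
    """Python equivalent of contains('|a|b|', concat('|', @number, '|'))."""
    haystack = "|" + "|".join(str(i) for i in excluded_ids) + "|"
    return f"|{number}|" in haystack


ids = [561, 763, 1066, 1268]
print(build_exclusion_xpath(ids))

# With delimiters, only whole numbers match.
print(is_excluded(561, ids))   # True
print(is_excluded(61, ids))    # False
# Without delimiters, a partial number would match by accident.
print("61" in "".join(str(i) for i in ids))   # True
```

The `tail` parameter is there so the same helper can be reused when the translatable elements sit under a different path.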

    That may not be exactly what you were after, as you didn't provide a sample file, but perhaps it'll help you handle the file you have. Both expressions worked well for me in Trados Studio with just two rules, for example:

    Trados Studio parser rule configuration window with an XPath expression input to filter specific question elements for translation.

    Preview of an XML file in Trados Studio showing questions about mythical creatures with corresponding hints.



    Generated Image Alt-Text
    [edited by: Trados AI at 11:07 AM (GMT 0) on 29 Feb 2024]
  • Hi ,

    yes, sorry, I was in a complete rush yesterday and forgot to add a sample and the filetype settings I have so far. I only thought of these 2 hours ago.

    This is my first try to build such a filetype setting from scratch.

    Apparently we used different approaches to create the filetype settings, as I don't get any results at all when I add the rules to the parser. :( I have Trados 2022. I tried 2 filetype settings but don't get the correct result...

    Not sure what I did wrong when using the wizard to create the settings.

    2 xml file type settings.zip

  •   

    It seems that there isn't a sample file available yet. I'm guessing you might be testing with the one I created? Here are my observations:

  • Hi ,

    Thanks so far, that already helped a lot.

    The "xml Survey" setting was created following a video you uploaded to YouTube about problematic XML files, using the option where Trados automatically finds all elements from a selected XML file. That's the filetype setting I’m working with right now, as it at least gets all translatable text into Trados, and I manually locked all the questions to be excluded. ;)

    The second one used the XPath option. I had tested it against my own file, which is why it has different numbers. These were the first numbers in the file, so they would be easier to spot, although there are still a few hundred lines before them, which are fine for translation.

    The root element should have been xpath rule ;) The root element in my file is called "survey". Either way, it was an error and might explain why I didn't get any results as of course it could not find that root element.

    OK, I now get some results that are pretty close to the mxliff I got from the client (I can't work on that as it's too large and takes about 10 seconds for every single segment confirmation). I just have to check whether all the exclusions work well, but I’ll have to do that later. (I miss a search function in the file preview to check for specific text ^^)

    Regarding the sample file, I would need to create one based on the original file first, and I’ll have to replace all text with dummies due to NDA issues. Hopefully I can do that tonight.

  • Hi ,

    Here comes the obfuscated example xml. I replaced everything with 'xxx xxx' and those not to translate with 'not this one' so they're easy to track. When testing the rules we have so far it didn't seem to work though.

    xml example.zip

  •  

    ok - using this file I did two things.  First I used this rule to see how many should be extracted with no exclusions:

    //question/*[self::headline or self::choices]/text/p

    This gave me 57 segments including those with "not this one" in the <p> elements.

    Then I used this rule:

    //question[not(contains('|526|811|', concat('|', @number, '|')))]/*[self::headline or self::choices]/text/p

    Now I get 44 segments as I excluded the 3 here:

    Trados Studio preview window showing XML file content with segments 'not this one' highlighted to be excluded from translation.

    and the 10 here:

    Trados Studio preview window displaying multiple lines of XML file content with several 'not this one' segments highlighted for exclusion.

    So I think that solves it based on my understanding so far:

    Two rules:

    Trados Studio parser rule configuration window with a rule set to exclude specific segments from translation using contains() and concat() functions.

    Always translatable
    //question[not(contains('|526|811|', concat('|', @number, '|')))]/*[self::headline or self::choices]/text/p

    Not translatable
    //*

    Let me know if that works for you too?  Using the contains() and concat() functions should make it simple to manage if you have a lot of exclusions.
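
The two-step check above (count everything first, then count with exclusions) can also be reproduced outside Trados Studio. This is a rough sketch on an invented miniature file: the element layout is inferred only from the XPath in the rule, the root name `survey` comes from the earlier reply, and `extract` is a hypothetical helper. The filtering is done in Python because the standard-library `xml.etree.ElementTree` XPath subset has no `not()`, `contains()` or `concat()`.

```python
import xml.etree.ElementTree as ET

# Invented miniature file; the layout is inferred from the XPath rule only.
SAMPLE = """
<survey>
    <question number="100">
        <headline><text><p>xxx xxx</p></text></headline>
        <choices><text><p>xxx xxx</p></text></choices>
    </question>
    <question number="526">
        <headline><text><p>not this one</p></text></headline>
    </question>
    <question number="811">
        <choices><text><p>not this one</p><p>not this one</p></text></choices>
    </question>
</survey>
"""


def extract(xml_text, excluded=()):
    """Collect <p> text under question/(headline|choices)/text, skipping excluded numbers."""
    excluded = {str(n) for n in excluded}
    root = ET.fromstring(xml_text)
    segments = []
    for question in root.iter("question"):
        if question.get("number") in excluded:
            continue
        for parent in question:
            if parent.tag in ("headline", "choices"):
                segments.extend(p.text for p in parent.findall("text/p"))
    return segments


baseline = extract(SAMPLE)                       # no exclusions
filtered = extract(SAMPLE, excluded=[526, 811])  # with exclusions
print(len(baseline), len(filtered))
print("not this one" in filtered)
```

If "not this one" still appears in the filtered list for the real file, the excluded text probably sits under a path the rule does not cover.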

  • Hi ,

    OK, I guess I’m too stupid for this … ;) I tried these rules on the whole file and, again, nothing was filtered out. :(

    And which options do I have to use in order to define embedded content (tags) with this filetype setting? I can't get this working either.

  •  

    I doubt that!  Here's a video that might help you to see what you are doing that I didn't... or vice versa.

  • Hi ,

    I'll have a closer look at this tonight at home.

    Regarding the tags: the problem is that I don’t need the standard tags to be converted; that would be too easy. ;) Besides possible HTML, I have variables that need to be transformed into tags (e.g. text_moretext) and other patterns which I could easily convert using regex, but I don't know which option to use in Embedded Content Processing to get the regex to run on the displayed text. Is it Defined by parser rules or defined by Document structure information, and if it's the latter, which suboption do I need to choose (too many possibilities in the dropdown)?

  •  

    Then you have three ways to tackle it.

    Way #1

    Add the structure you assigned to your parser rule (you probably have to do that as we didn't use it yet).

    Trados Studio Edit Rule window showing XPath and Structure Information Properties with a paragraph rule highlighted.

    Then add the structure here in 1.

    Embedded Content settings in Trados Studio with Document Structure Information selected and a configure button highlighted.

    Finally configure the rules to apply to the structure:

    Trados Studio window for creating regex rules for legacy embedded content with a configure button highlighted.

    Way #2

    Use the plain text processor for the embedded content instead:

    Embedded Content tab in Trados Studio showing parser rule for Embedded Content Plain Text processor.

    Then create your rules in the plain text embedded processor.

    Way #3

    Use the Multilingual XML filetype and select the "Treat as Monolingual" option:

    Multilingual XML options in Trados Studio with 'Treat as monolingual' checkbox highlighted.

    Now you can use the html embedded content processor AND add regex rules for your variables:

    Embedded Content settings in Trados Studio with Html Embedded Content Processor selected.

    Placeholders tab in Trados Studio showing different placeholder patterns and their descriptions.
