Issue with segmentation in XML files in Trados 2021

Hello,

I have an issue with the segmentation of XML files that I'm hoping someone can help me with. We work with two different types of XML files whose content is divided into units, such as title, description, etc., and we have to use the XLIFF 2.0 filter for these files. With the first type of file, the segmentation is fine: every translation unit is imported as a separate segment:

[Screenshot: Trados Studio showing XML file content with proper segmentation. Each segment contains 'Text for translation', with an emoji in one segment.]

However, when we work with the second type of file, the title and the description of each unit are combined into a single segment:

[Screenshot: Trados Studio displaying XML file content with incorrect segmentation. Title and description texts are combined into single segments.]

I can see that the structure of the two files is different, but does anyone know how I can work around this with segmentation rules or other tools so that the title is imported as a separate segment?

I have included both files below.

This is the first one that works fine:

<?xml version="1.0"?>
<xliff xmlns="urn:oasis:names:tc:xliff:document:2.0" version="2.0" srcLang="en-GB">
<file id="MR-A4D68FEB">
<group id="header">
<unit id="title">
<notes>
<note>The title</note>
</notes>
<segment>
<source>Text for translation</source>
</segment>
</unit>
<unit id="description">
<notes>
<note>The description</note>
</notes>
<segment>
<source>Text for translation

&#128161;The segment that contains the emoji.</source>
</segment>
</unit>
<unit id="keywords">
<notes>
<note>The keywords</note>
</notes>
<segment>
<source></source>
</segment>
</unit>
</group>
<group id="steps">
<unit id="s1" name="xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx">
<segment>
<source>Text for translation.</source>
</segment>
</unit>
<unit id="s2" name="xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx">
<segment>
<source>Text for translation. 

Text for translation. 

Text for translation.</source>
</segment>
</unit>
<unit id="s3" name="xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx">
<segment>
<source>Text for translation.</source>
</segment>
</unit>
<unit id="s4" name="xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx">
<segment>
<source>Text for translation.

Text for translation.</source>
</segment>
</unit>
</group>
<group id="ingredients">
<unit id="i1" translate="no" name="xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx">
<segment id="singular">
<source>Text for translation</source>
</segment>
<segment id="plural">
<source>Text for translation</source>
</segment>
</unit>
<unit id="i2" translate="no" name="xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx">
<segment id="singular">
<source>Text for translation</source>
</segment>
<segment id="plural">
<source>Text for translation</source>
</segment>
</unit>
<unit id="i3" translate="no" name="xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx">
<segment id="singular">
<source>Text for translation</source>
</segment>
<segment id="plural">
<source>Text for translation</source>
</segment>
</unit>
<unit id="i4" translate="no" name="xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx">
<segment id="singular">
<source>Text for translation</source>
</segment>
<segment id="plural">
<source>Text for translation</source>
</segment>
</unit>
<unit id="i5" translate="no" name="xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx">
<segment id="singular">
<source>Text for translation</source>
</segment>
<segment id="plural">
<source>Text for translation</source>
</segment>
</unit>
<unit id="i6" translate="no" name="xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx">
<segment id="singular">
<source>Text for translation</source>
</segment>
<segment id="plural">
<source>Text for translation</source>
</segment>
</unit>
<unit id="i7" translate="no" name="xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx">
<segment id="singular">
<source>Text for translation</source>
</segment>
<segment id="plural">
<source>Text for translation</source>
</segment>
</unit>
<unit id="i8" translate="no" name="xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx">
<segment id="singular">
<source>Text for translation</source>
</segment>
<segment id="plural">
<source>Text for translation</source>
</segment>
</unit>
</group>
</file>
</xliff>

And this is the second one that we are having problems with:

<?xml version="1.0"?>
<xliff xmlns="urn:oasis:names:tc:xliff:document:2.0" version="2.0" srcLang="en-GB">
<file id="MT-E3AD2ADB">
<group id="header">
<unit id="title">
<notes>
<note>The title</note>
</notes>
<segment>
<source>Title</source>
</segment>
</unit>
<unit id="description">
<notes>
<note>The description</note>
</notes>
<segment>
<source>Text for translation</source>
</segment>
</unit>
</group>
<group id="steps">
<unit id="s1" name="xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx">
<segment id="title">
<source>Title</source>
</segment>
<segment id="description">
<source>Text for translation. 

Text for translation.  </source>
</segment>
</unit>
<unit id="s2" name="xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx">
<segment id="title">
<source>Title</source>
</segment>
<segment id="description">
<source>Text for translation. 

Text for translation. </source>
</segment>
</unit>
<unit id="s3" name="xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx">
<segment id="title">
<source>Title</source>
</segment>
<segment id="description">
<source>Text for translation.  

Text for translation. </source>
</segment>
</unit>
<unit id="s4" name="xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx">
<segment id="title">
<source>Title?</source>
</segment>
<segment id="description">
<source>Text for translation. </source>
</segment>
</unit>
</group>
</file>
</xliff>



[edited by: RWS Community AI at 8:17 AM (GMT 0) on 15 Nov 2024]
  •   

    Thanks for separating your post.

    The problem here is that the default behaviour for XLIFF 2.0 in Trados Studio is to resegment the source content when a unit element contains more than one segment (and therefore more than one source element). In XLIFF 2.0, a unit element represents a single translation unit, which may be split into several segment elements, each holding a different part of the same text. When the XLIFF file is parsed, the content of the source elements in all the segments of a unit is combined to form the full source text of that translation unit, and that text is then segmented again.
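    To see which units in a file have this structure before opening it in Studio, a short script can list every unit that holds more than one segment. This is a minimal sketch using only Python's standard xml.etree.ElementTree module; the function name multi_segment_units is hypothetical, and it only skips units that carry translate="no" on the unit element itself (a translate value inherited from a group or file is not checked). In the first sample file, the only multi-segment units are the ingredient units i1 to i8, which are all marked translate="no"; in the second, every unit in the steps group has a title and a description segment.

```python
import xml.etree.ElementTree as ET

# XLIFF 2.0 core namespace in ElementTree's "{uri}" form.
XLIFF_NS = "{urn:oasis:names:tc:xliff:document:2.0}"


def multi_segment_units(xml_text: str) -> list[str]:
    """Return the ids of <unit> elements that contain more than one <segment>.

    Units with translate="no" set directly on the <unit> are skipped;
    translate="no" inherited from a <group> or <file> is NOT checked here.
    """
    root = ET.fromstring(xml_text)
    result = []
    for unit in root.iter(XLIFF_NS + "unit"):
        if unit.get("translate") == "no":
            continue
        # Only direct <segment> children of the unit are counted.
        if len(unit.findall(XLIFF_NS + "segment")) > 1:
            result.append(unit.get("id"))
    return result
```

    Run against the second sample file, this would be expected to report s1, s2, s3 and s4; against the first, it should report nothing.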

    If you want to avoid this, you need either to use the canResegment attribute or to restructure the file so that each unit contains only one segment (one source element per unit). To use the canResegment attribute, you would do something like this:

    <?xml version="1.0"?>
    <xliff xmlns="urn:oasis:names:tc:xliff:document:2.0" version="2.0" srcLang="en-GB">
      <file id="MT-E3AD2ADB">
        <group id="header">
          <unit id="s4">
            <segment id="title" canResegment="no">
              <source>Title?</source>
            </segment>
            <segment id="description">
              <source>Text for translation.</source>
            </segment>
          </unit>
        </group>
      </file>
    </xliff>
    

    Then you would see this when you open the file in Studio:

    [Screenshot: Trados Studio preview window showing two segments, 'Title?' and 'Text for translation.', with 'U+' indicating a single unit without resegmentation.]

    Note the "U+" on the right, which shows that this is still one unit but that it has not been resegmented. I'm not aware of another CAT tool that pays attention to this sort of detail, but, as with many of the file types in Studio, the filters team do work hard to follow the specification, and this is valuable for companies that also use the standards to full effect.
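    If many files need this change, adding the attribute by hand gets tedious. As a minimal sketch, assuming the files are well-formed XLIFF 2.0 like the samples in this thread, the Python script below (standard library only) sets canResegment="no" on every segment of any unit that contains more than one segment. This goes slightly further than the example above, which only sets the attribute on the title segment; whether Studio treats both variants identically is an assumption to check. The function names lock_segmentation and lock_file are hypothetical.

```python
import xml.etree.ElementTree as ET

# XLIFF 2.0 core namespace, as declared in the sample files in this thread.
XLIFF_NS = "urn:oasis:names:tc:xliff:document:2.0"
UNIT = f"{{{XLIFF_NS}}}unit"
SEGMENT = f"{{{XLIFF_NS}}}segment"

# Serialize the XLIFF namespace as the default namespace (no "ns0:" prefixes).
ET.register_namespace("", XLIFF_NS)


def lock_segmentation(root: ET.Element) -> int:
    """Set canResegment="no" on every <segment> of units with 2+ segments.

    Returns the number of <segment> elements the attribute was set on.
    Single-segment units are left as they are.
    """
    count = 0
    for unit in root.iter(UNIT):
        segments = unit.findall(SEGMENT)  # direct <segment> children only
        if len(segments) > 1:
            for seg in segments:
                seg.set("canResegment", "no")
                count += 1
    return count


def lock_file(in_path: str, out_path: str) -> int:
    """Read an XLIFF 2.0 file, lock its segmentation, and write it to out_path."""
    tree = ET.parse(in_path)
    count = lock_segmentation(tree.getroot())
    tree.write(out_path, encoding="utf-8", xml_declaration=True)
    return count
```

    For example, lock_file("in.xlf", "out.xlf") (hypothetical file names) writes a modified copy and returns the number of segments it set. Because ElementTree re-serializes the whole file, a character reference such as &#128161; is written out as the literal emoji and XML comments are not kept, so it is worth comparing the output with the original and testing it in Studio before using it on live projects.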

    Paul Filkin | RWS Group




    [edited by: Trados AI at 10:04 AM (GMT 0) on 29 Feb 2024]
  • I see. So we need to resolve this issue at the file level; it cannot be worked around in Trados itself.

    I managed to recreate this using the canResegment attribute, so I'll see how we can work with that. Thank you so much! :)
