Problem with xliff 2.0 filetype

Question

I have a problem importing a file with the XLIFF 2.0 filetype in Studio 2021. 
 The file itself is fine; importing it into memoQ, for example, works without any problem. 
 
 In Studio, the segmentation goes wrong: all the sentences end up in one segment. 
 
 I guess this is a bug. Any comments?

Paul · Accepted Answer

Sarah Rauber 
 ok - the updated version is live on the appstore as of yesterday, and handles this scenario properly: 
 
 My initial comments on the use of XLIFF 2.0 are still valid, but using the Multilingual XML filetype will allow you to ignore the specification and just handle the file as you'd like.

Paul · Answer

Sarah Rauber 
 I think the only way to answer this question is to see the file. Can you share it, or at least cut it down in a text editor and share the file with only a few segments that behave badly for you? This way you could also anonymise the text if necessary.

Paul · Answer

Hi Sarah Rauber 
 Thanks for the file. I think this is a bug so I have logged a support case ( 00654964 ) for it. 
 I also tried to work around this using the Multilingual XML filetype, and this resolves the segmentation problem you have shown here. But it then introduces another segmentation problem, because we cannot segment the CDATA sections using the embedded content filter; this is a current limitation of the API. 
 So unfortunately I think we're stuck until these problems are resolved by the core development team. I'll come back to you if I learn that I'm mistaken and the problem can be resolved, or once I know the bug number so we can track progress. 
 One possible solution, although a bit trickier, would be this: 
 
 use regex to add a target element that contains the content of your source element 
 create a new custom XML filetype that handles the target element only 
 
 When you translate and save the target you'll have a fully translated bilingual XLIFF 2.1 file.
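As a rough illustration of the first step, here is a minimal Python sketch of the regex approach. It assumes the file has no <target> elements yet and that the <source> elements are not nested; the function name add_targets is just for this example and is not part of any Trados tooling. Note that the XLIFF 2.0 specification may also expect a trgLang attribute on the <xliff> element once targets exist, so check that separately.

```python
import re

# Matches one <source>...</source> pair (self-closing <source/> is skipped).
# Assumption: no <target> elements exist yet in the file.
SOURCE_RE = re.compile(r"(<source\b[^>]*(?<!/)>)(.*?)(</source>)", re.DOTALL)


def add_targets(xliff_text: str) -> str:
    """Insert a <target> holding a copy of each <source> right after it."""
    def copy_source(match: re.Match) -> str:
        content = match.group(2)
        # Keep the original <source> untouched and append the copy.
        return f"{match.group(0)}<target>{content}</target>"

    return SOURCE_RE.sub(copy_source, xliff_text)


sample = "<segment><source>Ich bin optional.</source></segment>"
print(add_targets(sample))
```

Because this works on the raw text rather than on a parsed tree, any CDATA sections inside <source> are copied as they are.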

Paul · Answer

Sarah Rauber

It seems that the behaviour is actually considered to be correct. The technical support explained this quite well and I worked through the steps using your file again below.

After some discussion about the expected behavior for segmentation on the XLIFF 2.0 files, we agreed that what the standard says is that agents should resegment, unless the canResegment attribute says otherwise:

http://docs.oasis-open.org/xliff/xliff-core/v2.0/xliff-core-v2.0.html#canResegment

Since the "canResegment" attribute is treated hierarchically, all the segments in a <unit> must have it set to "yes". By default, it is set to "yes". So once a segment has it set to "no", the segmentation process doesn't take place.

Because of this merging, it is recommended that each segment ends with whitespace in order to avoid concatenation of neighboring segments.

What this means is that Trados will "resegment" all of the content inside the <unit> elements (so all <segment> elements) if the "canResegment" attribute is set to "yes" or is missing (the default is set to "yes"). In your file it's missing.
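To make the inheritance rule a bit more concrete, here is a small Python sketch that works out, per <unit>, whether all of its segments end up with canResegment="yes". It is only my reading of the rule as described above (default "yes" on <file>, every other element inheriting from its parent), not how the Trados filter is actually implemented. The function name and the sample file are made up for illustration.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:2.0"


def q(name: str) -> str:
    """Return the namespace-qualified tag name used by ElementTree."""
    return f"{{{XLIFF_NS}}}{name}"


def units_open_to_resegmentation(xliff_text: str) -> dict:
    """Map each unit id to True if every segment resolves to canResegment="yes"."""
    root = ET.fromstring(xliff_text)
    result = {}

    def walk(elem, inherited):
        # An explicit attribute wins; otherwise the parent's value applies.
        value = elem.get("canResegment", inherited)
        if elem.tag == q("unit"):
            seg_values = [s.get("canResegment", value)
                          for s in elem.findall(q("segment"))]
            result[elem.get("id")] = all(v == "yes" for v in seg_values)
            return
        for child in elem:
            walk(child, value)

    for file_elem in root.findall(q("file")):
        walk(file_elem, "yes")  # assumed default at <file> level
    return result


# Hypothetical sample, not the file from this thread.
sample = """<xliff xmlns="urn:oasis:names:tc:xliff:document:2.0" version="2.0" srcLang="de">
  <file id="file">
    <unit id="u1"><segment><source>Ich bin optional.</source></segment></unit>
    <unit id="u2" canResegment="no"><segment><source>Ich bin der Titel des Formulars.</source></segment></unit>
  </file>
</xliff>"""
print(units_open_to_resegmentation(sample))
```

In this sample, u1 has no attribute anywhere in its ancestry, so it falls back to "yes" and would be resegmented, while u2 is explicitly excluded.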

(STEP1) In this specific example, Trados merges all of the content into two Translation units:

no.1 -> Ich bin der Titel des Formulars.
no.2 -> http://www.google.de[Neues Fenster]http://www.google.de[Neues Fenster]Ich bin die Seitenüberschrift.Ich bin der Intro-Text einer Seite. Ich bin optional. Ich kann HTML-Markup und Links enthalten und aus mehreren Absätzen bestehen. Auf dieser Seite befinden sich nur einfache Textfelder. Die verschiedenen Intro-/Outro-Texte und die verschiedenen Beschriftungspositionen gibt es bei allen Feldtypen.

(STEP2) Then the default language processing rule (or a TM if you have one), and the segmentation rules defined there (i.e. sentence-based segmentation) kick in. So the 2nd "merged" TU is split into smaller segments according to the TM segmentation, and you end up with this in the Editor:

no.1 -> Ich bin der Titel des Formulars.
no.2 -> http://www.google.de[Neues Fenster]http://www.google.de[Neues Fenster]Ich bin die Seitenüberschrift.Ich bin der Intro-Text einer Seite. (because the whitespace is missing after the word "Seitenüberschrift.", segmentation doesn't occur here; this is also specified in the documentation linked above)
no.3 -> Ich bin optional.
no.4 -> Ich kann HTML-Markup und Links enthalten und aus mehreren Absätzen bestehen.
no.5 -> Auf dieser Seite befinden sich nur einfache Textfelder.
no.6 -> Die verschiedenen Intro-/Outro-Texte und die verschiedenen Beschriftungspositionen gibt es bei allen Feldtypen.

Like this:

Screenshot of Trados Studio showing merged content into two translation units with highlighted HTML markup tags and links.
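The two steps can be mimicked with a few lines of Python. This is only a toy model: the splitter below breaks after ".", "!" or "?" when whitespace follows, which is my stand-in for sentence-based segmentation and not the actual Trados segmentation rules. The function name merge_and_split is hypothetical.

```python
import re


def merge_and_split(segments):
    """Toy model: glue the segments of a unit together, then split on sentence ends."""
    merged = "".join(segments)  # STEP1: merge everything in the unit
    # STEP2: naive rule, break after . ! ? only when whitespace follows
    return [s for s in re.split(r"(?<=[.!?])\s+", merged) if s]


unit_segments = [
    "http://www.google.de",
    "[Neues Fenster]",
    "Ich bin die Seitenüberschrift.",
    "Ich bin der Intro-Text einer Seite. Ich bin optional.",
]

# Segments as they are in the file: no whitespace at the end.
glued = merge_and_split(unit_segments)

# The same segments, each ending with a space.
spaced = merge_and_split([s + " " for s in unit_segments])

for label, parts in (("glued", glued), ("spaced", spaced)):
    print(label)
    for part in parts:
        print("  ", part)
```

With the segments as they are, "Seitenüberschrift." runs straight into the next sentence and no break happens there. With a trailing space on each segment the break does happen, although the URL and "[Neues Fenster]" still stay in the same segment because they don't end with sentence punctuation.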

If the "canResegment" attribute is added to the file and its value is set to "no", then the filter won't merge the TU "segment" elements. So I edited the <file> element like this:

<file id="file" canResegment="no" >

STEP1 will no longer occur, so you'll see:

no.1 -> Ich bin der Titel des Formulars.
no.2 -> http://www.google.de
no.3 -> [Neues Fenster]
no.4 -> http://www.google.de
no.5 -> [Neues Fenster]
no.6 -> Ich bin die Seitenüberschrift.
no.7 -> Ich bin der Intro-Text einer Seite. Ich bin optional. Ich kann HTML-Markup und Links enthalten und aus mehreren Absätzen bestehen. Auf dieser Seite befinden sich nur einfache Textfelder. Die verschiedenen Intro-/Outro-Texte und die verschiedenen Beschriftungspositionen gibt es bei allen Feldtypen.

Like this:

Screenshot of Trados Studio displaying content segmented into multiple translation units with 'canResegment' attribute set to 'no'.

But now you've lost the segmentation you had in the first place, which would be handled by the TM, and you have a large chunk of text in one segment (number 7). Each segment in the XLIFF file is treated as a complete segment and is not segmented any further.

A potential solution here is to make sure that the translatable content in each <segment> element ends with whitespace to avoid concatenation. This will not fix the "segment merging" of content that doesn't end with a full stop (or another punctuation mark that enforces segmentation), but it would still look better in the editor because the content will be separated by a space, not "glued" together.
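If you wanted to script the trailing whitespace rather than edit the file by hand, something along these lines might do as a starting point. It is a sketch under a few assumptions: it pads every non-empty <source> that doesn't already end in whitespace, it works on the raw text so CDATA sections are left alone, and the name pad_sources is hypothetical. Whether a tool honours the added space may also depend on the xml:space setting, so test it on a copy of the file first.

```python
import re

# Matches one <source>...</source> pair (self-closing <source/> is skipped).
SOURCE_RE = re.compile(r"(<source\b[^>]*(?<!/)>)(.*?)(</source>)", re.DOTALL)


def pad_sources(xliff_text: str) -> str:
    """Append one space to each non-empty <source> that lacks trailing whitespace."""
    def pad(match: re.Match) -> str:
        opening, content, closing = match.groups()
        if content and not content[-1].isspace():
            content += " "
        return opening + content + closing

    return SOURCE_RE.sub(pad, xliff_text)


sample = "<segment><source>[Neues Fenster]</source></segment>"
print(pad_sources(sample))
```

Running it twice is harmless, because content that already ends in whitespace is left unchanged.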

So, I then decided to take a look at how memoQ would handle these two files seeing as you mentioned it. I created a view for both my test files (your original and the one I added canResegment="no" to) and see this:

Screenshot of Trados Studio editor with content listed in numbered segments, including URLs and HTML markup, without visible errors or warnings.

Both examples are handled exactly the same way, whether I use canResegment or not. It appears that memoQ treats the XLIFF as if canResegment="no" were set, as in the second example I showed above in Trados Studio. I believe this is actually incorrect because they have ignored the specification defaults for segmentation.

I hope this helps clarify this situation. There is no bug.

Paul Filkin | RWS Group


Paul · Answer

Sarah Rauber 
 In the meantime!! 
 We have been working on improving the ability of the Multilingual XML filetype to segment some of these tricky areas and now I can do this: 
 
 I can do this without caring about the rules for XLIFF 2.0 at all and I now get the perfect solution for you, better than any of the workarounds and better than memoQ ;-) We are still wrapping up some testing before we release to the appstore but this is a really neat solution I think.

Paul · Answer

Sarah Rauber 
 Sarah Rauber said: Is there any settings I have to do? 
 Of course. This is a very flexible filetype that allows all kinds of multilingual filetypes to be handled. Here's what I used for your file:

I think that will do it for the sample you provided.
