How to open these XLIFF files?

Dear Team,

I tried using the default XLIFF filter, but I keep getting various errors. The latest are "value cannot be null. parameter name culture" and "There is an error in XML document (1, 2)".

Any idea how I could make this work? I have attached two versions of the same file, XLIFF 1.2 and 2.0.

Thank you in advance for your help!

Best regards,

Giannis

4403.Lina-video-subtitles-4190_2.0.zip

  •  

    I can open the 1.2 (although it does complain a bit) but not the 2.0.  So I ran it through an XLIFF Validation tool and received the following:

    1.2
    Invalid attribute in <group>: course-page-layout.

    This indicates that the XLIFF 1.2 file you are trying to work with contains an invalid attribute "course-page-layout" within a <group> element, which is not allowed according to the XLIFF 1.2 specification.  You can check that here: http://docs.oasis-open.org/xliff/v1.2/os/xliff-core.html#group

    I removed that attribute and also switched off the validation in the filetype settings and it opens without a complaint (apart from the language code which isn't really a problem):

    Screenshot showing the XLIFF 1.2 successfully opened in the Studio editor.
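The attribute removal described above could also be scripted rather than done by hand. This is only a minimal sketch using Python's standard library: the sample document is a hypothetical stand-in for the client's file, and the helper name `strip_group_attribute` is made up for illustration.

```python
import xml.etree.ElementTree as ET

XLIFF_12_NS = "urn:oasis:names:tc:xliff:document:1.2"
# Keep the XLIFF namespace as the default (unprefixed) one when serializing.
ET.register_namespace("", XLIFF_12_NS)

# Hypothetical sample, not the client's actual file.
SAMPLE = """<xliff xmlns="urn:oasis:names:tc:xliff:document:1.2" version="1.2">
  <file original="subtitles" source-language="en" datatype="plaintext">
    <body>
      <group id="g1" course-page-layout="video">
        <trans-unit id="1"><source>Hello</source></trans-unit>
      </group>
    </body>
  </file>
</xliff>"""


def strip_group_attribute(xml_text: str, attribute: str) -> str:
    """Remove one attribute from every <group> element, at any depth."""
    root = ET.fromstring(xml_text)
    for group in root.iter(f"{{{XLIFF_12_NS}}}group"):
        group.attrib.pop(attribute, None)  # no error if the attribute is absent
    return ET.tostring(root, encoding="unicode")


cleaned = strip_group_attribute(SAMPLE, "course-page-layout")
print(cleaned)
```

Other attributes on the group, such as `id`, are left untouched. The XML declaration and any comments in a real file would need separate handling.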

    2.0
    cvc-complex-type.3.2.2: Attribute 'datatype' is not allowed to appear in element 'file'.
    cvc-complex-type.3.2.2: Attribute 'date' is not allowed to appear in element 'file'.
    cvc-complex-type.4: Attribute 'id' must appear on element 'file'.

    This indicates that the XLIFF 2.0 file you are trying to work with contains the attributes "datatype" and "date" within a "file" element, which are not allowed according to the XLIFF schema rules. It is also missing the mandatory attribute "id".  You can check this here: http://docs.oasis-open.org/xliff/xliff-core/v2.0/xliff-core-v2.0.html#file
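The three schema errors above all concern the "file" element, so they could be corrected in one pass. The following is a rough sketch with Python's standard library, not the exact steps taken in this thread: the sample document, the helper name `repair_file_elements` and the generated id values (`f1`, `f2`, ...) are all assumptions for illustration.

```python
import xml.etree.ElementTree as ET

XLIFF_20_NS = "urn:oasis:names:tc:xliff:document:2.0"
# Keep the XLIFF namespace as the default (unprefixed) one when serializing.
ET.register_namespace("", XLIFF_20_NS)

# Hypothetical sample reproducing the three reported problems.
SAMPLE = """<xliff xmlns="urn:oasis:names:tc:xliff:document:2.0" version="2.0" srcLang="en-GB" trgLang="fr-FR">
  <file datatype="plaintext" date="2024-01-01T00:00:00Z">
    <unit id="u1"><segment><source>Hello</source></segment></unit>
  </file>
</xliff>"""


def repair_file_elements(xml_text: str) -> str:
    """Drop attributes not allowed on <file> and add the mandatory id."""
    root = ET.fromstring(xml_text)
    files = root.iter(f"{{{XLIFF_20_NS}}}file")
    for index, file_el in enumerate(files, start=1):
        for name in ("datatype", "date"):
            file_el.attrib.pop(name, None)
        # Only invent an id when the element does not already have one.
        file_el.attrib.setdefault("id", f"f{index}")
    return ET.tostring(root, encoding="unicode")


repaired = repair_file_elements(SAMPLE)
print(repaired)
```

This only addresses the errors listed here; as the rest of the post explains, further validation errors appeared afterwards.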

    I corrected these and then the messages get too complex for me to type out!:

    Screenshot showing a complex error message generated from validating the XLIFF file in a separate application.

    I removed the content of the notes elements but the messages continue to get more and more complex.  I think you have two options really:

    1. go back to your client and explain what problem you are having as they are producing invalid XLIFF 2.0, or
    2. use the Bilingual XML filetype and then you will be able to handle this with ease as it doesn't care quite so much about the XLIFF specification:
      Screenshot showing the XLIFF 2.0 file being opened in the editor view with the Multilingual XML filetype.

    Paul Filkin | RWS Group

    ________________________
    Design your own training!

    You've done the courses and still need to go a little further, or still not clear? 
    Tell us what you need in our Community Solutions Hub

  • Hi again  

    The client came back to us with updated files.

    However, I still cannot open the XLIFF 2.0 file (error below), and the XLIFF 1.2 file only imports content from the first group. Any idea how to fix this on my side or the client's side?

    Also, the XML filter I created (attached) seems fine, but I guess I need to convert a lot of the imported content into tags. What do you think?

    Trados Studio Task Results window showing 3 warnings: 'Pre-Scanning failed to identify the file type', 'Pre-Scanning Error: Value cannot be null', and 'Pre-Scanning Error: Xliff Version 2.0 is not supported'.

    /cfs-file/__key/communityserver-discussions-components-files/90/Lina_2D00_video_2D00_subtitles_2D00_4212.zip

  •   

    Well the XLIFF is still non-valid.  The only advice I could give to your client is to learn how to write valid XLIFF... or if it's being generated by some 3rd party tool perhaps review the code and make some changes to ensure it's doing this correctly.

    I know we can be quite strict with XLIFF so I also tested this one in another tool (memoQ) but it fails in there too and that tool is usually a bit more relaxed.  So I'm pretty sure that the problem here is with the XLIFF itself.

    I looked at your filetype settings... and note that you have created a monolingual XML filetype extracting the source.  That won't really help you as when you write back the target it will simply overwrite the content in the source element with the target translation.

    If you want to use a monolingual filetype for this then your process would be this:

    1. copy the source to target using a text editor... bit of regex magic
    2. create your filetype to extract the content of the target element, not the source
    3. use the embedded html filter to handle the tagging
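Step 1 above (copying the source to the target with a bit of regex) might look something like the sketch below. It assumes every `<source>` has an explicit closing tag and that the file contains no `<target>` elements yet; the sample unit and the helper name `copy_source_to_target` are hypothetical.

```python
import re

# Hypothetical sample unit with escaped HTML in the source.
SAMPLE = """<trans-unit id="1">
  <source>Hello &lt;b&gt;world&lt;/b&gt;</source>
</trans-unit>"""

# Capture the indentation and the body of each <source> element.
SOURCE_RE = re.compile(
    r"(?P<indent>[ \t]*)<source(?P<attrs>[^>]*)>(?P<body>.*?)</source>",
    re.DOTALL,
)


def copy_source_to_target(xml_text: str) -> str:
    """Insert a <target> holding a copy of the source after each <source>."""

    def add_target(match: re.Match) -> str:
        indent = match.group("indent")
        body = match.group("body")
        return f"{match.group(0)}\n{indent}<target>{body}</target>"

    # A replacement function avoids backslash-escape surprises in the body.
    return SOURCE_RE.sub(add_target, xml_text)


with_targets = copy_source_to_target(SAMPLE)
print(with_targets)
```

Running this on a file that already has targets would leave two `<target>` elements per unit, which is one more reason this route only suits untranslated files.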

    This is ok when you don't have any target content in the original source... as you indeed don't with your sample files.  But if you have to do any review on the XLIFF files, or if you receive some partially translated files, then this approach won't work.  Hence the reason for me suggesting this:

    use the Bilingual XML filetype and then you will be able to handle this with ease as it doesn't care quite so much about the XLIFF specification

    I probably confused you there... apologies.  It's actually called the Multilingual XML filetype as you can see in the screenshot I also provided.  You can find it here: https://appstore.rws.com/Plugin/13

    So if I was you I'd run with the Multilingual Filetype for many reasons:

    1. it doesn't care about your crappy XLIFF
    2. it supports the embedded content filters
    3. it supports additional use of regex for tagging on top of the embedded content filters if needed

    Paul Filkin | RWS Group

  • Thank you so much,   

    I see this requires a bit of configuring. Will try to convince the client to send better files (not XLIFF at all, if possible).

    If they insist, I will try the multilingual XML, and may come back to you with some queries :)

  •  

    So, the client insists on those XLIFFs. I tried configuring the multilingual XML filter, but I had a few issues:

    - Even though I deleted all custom file types I had previously created, and selected my languages correctly, I wasn't able to make Studio recognize either file as multilingual XML.
    - In this case, with source and target elements, would I need to change the target language every time?
    - I exported the sdlftsettings to send them to you, but when I tried to import them into another Studio instance, I got the error below.
    - Attached what I did so far, any thoughts?

    multiXliff.zip

  • Hi ,

    Did you maybe take a look at this? I wasn't able to make the multilingual xml work for these XLIFFs.

  •  

    Apologies for the delay... didn't have a lot of time for forum work in the last weeks.

    I took a look... the 2.0 works just fine with these settings:

    Trados Studio Language Mapping settings window showing Languages Root field with XPath query '/xliff/file//unit/segment' and languages listed as English (United Kingdom) source and French (France) target.

    The languages root needs this:

    /xliff/file//unit/segment

    Note the double slash between file and unit, which is needed because you have a bit of a mess in this file with multiple levels of nesting for the group element.  This expression starts at the root (/xliff/file) and uses the // operator to search for any unit element at any level of nesting within group elements, and then selects the segment child element.
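To see the effect of the double slash outside Studio, here is a small sketch using Python's standard library, whose `findall` supports a subset of XPath. The sample document is hypothetical, and because ElementTree paths are relative to the root element, `x:file//x:unit/x:segment` stands in for `/xliff/file//unit/segment`. The `x` prefix is just a local alias for the XLIFF 2.0 namespace.

```python
import xml.etree.ElementTree as ET

NS = {"x": "urn:oasis:names:tc:xliff:document:2.0"}

# Hypothetical sample with units at three different nesting depths.
SAMPLE = """<xliff xmlns="urn:oasis:names:tc:xliff:document:2.0" version="2.0" srcLang="en-GB" trgLang="fr-FR">
  <file id="f1">
    <unit id="u1"><segment><source>Top level</source></segment></unit>
    <group id="g1">
      <unit id="u2"><segment><source>One level down</source></segment></unit>
      <group id="g2">
        <unit id="u3"><segment><source>Two levels down</source></segment></unit>
      </group>
    </group>
  </file>
</xliff>"""

root = ET.fromstring(SAMPLE)  # root is the <xliff> element

# Single slash: only units that are direct children of <file>.
direct_only = root.findall("x:file/x:unit/x:segment", NS)

# Double slash: units at any depth below <file>, inside any number of groups.
any_depth = root.findall("x:file//x:unit/x:segment", NS)

print(len(direct_only), len(any_depth))  # prints "1 3"
```

With a single slash only the top-level unit is found, which would match the symptom of content being imported from only part of the file.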

    I didn't test the other one.. I guess you only need one to work for you!

    Paul Filkin | RWS Group

  • Thank you,  

    I guess since the source has just <source> and <target>, I would need to change the target language every time, right?

    Also, I tried converting <div> <p> <br> etc. into tags. First, I disabled entity conversion, and created placeholder &lt;\w+&gt;

    However, this did not convert the content of &lt; and &gt; into tags, just them separately.
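For reference, this is what the pattern `&lt;\w+&gt;` matches in an ordinary regex engine. It is a sketch in Python on a made-up sample string and not a diagnosis of what Studio did with the placeholder rule; the broader pattern is just one possible alternative, not a recommended setting.

```python
import re

# Made-up sample of escaped HTML as it might appear inside a <source> element.
SAMPLE = "&lt;div&gt;&lt;p&gt;Hello&lt;br /&gt;world&lt;/p&gt;&lt;/div&gt;"

# The pattern from the post: an escaped "<", word characters, an escaped ">".
NARROW = re.compile(r"&lt;\w+&gt;")

# One possible broader pattern: optional "/" for closing tags, and anything
# other than "&" after the tag name (attributes, spaces, self-closing slash).
BROADER = re.compile(r"&lt;/?\w+[^&]*?&gt;")

narrow_hits = NARROW.findall(SAMPLE)
broader_hits = BROADER.findall(SAMPLE)

print(narrow_hits)   # opening tags without attributes only
print(broader_hits)  # opening, closing and self-closing tags
```

The narrow pattern finds `&lt;div&gt;` and `&lt;p&gt;` but skips `&lt;br /&gt;`, `&lt;/p&gt;` and `&lt;/div&gt;`, so closing and self-closing tags would need either a broader rule or the embedded HTML processing discussed in the replies.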

    Any thoughts? I thought the HTML embedded content would help with this, but it didn't.

    Screenshot of Trados Studio showing a comparison of source and target text with HTML tags visible. No errors or warnings are indicated.

    Screenshot of Trados Studio with highlighted differences between source and target text. The target text contains HTML tags and placeholders. No errors or warnings are shown.

  •  

    I guess since the source has just <source> and <target>, I would need to change the target language every time, right?

    Right.

    I thought the HTML embedded content would help with this, but it didn't.

    It should do... perhaps you didn't apply it correctly?  I created a quick video to explain so you can see it works and how I did it:

    I hope this helps.

    Paul Filkin | RWS Group

  • Hi  

    Thank you so much for your detailed instructions! Actually, I was using the same settings (all embedded HTML), but it still didn't work in Studio 2021. Then I did the same process in Studio 2022, and it was finally fine!

    One issue still remains: the client had asked for whatever reason to preserve the elements below in the target, but my export has completely removed them.
    Is there a way to maintain them?

    Screenshot of Trados Studio showing a comparison of source and target text. The source text is highlighted with a red box indicating an error in the segment.

  •  

    It's not missing at all, it's just been converted to what it represents, a carriage return.  The HTML entity &#13; represents the ASCII control character Carriage Return (CR).  In ASCII, the Carriage Return character is defined as decimal 13 (or hexadecimal 0D), which is why &#13; is used to represent it in HTML.
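The same behaviour can be reproduced with a generic XML parser. The sketch below uses Python's standard library rather than Studio or .NET, so it only illustrates the general principle; the sample element is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: a carriage return written as a character reference.
SAMPLE = "<source>Line one&#13;Line two</source>"

element = ET.fromstring(SAMPLE)

# After parsing, the reference is already a real carriage return character.
parsed_text = element.text

# Writing the element back out emits that character literally in text content.
written = ET.tostring(element, encoding="unicode")

print(repr(parsed_text))
print(repr(written))
```

Once the file has been parsed, nothing records that the carriage return was originally written as `&#13;`, so a tool has no way to restore that spelling unless it is specifically built to do so.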

    So you have this:

    Screenshot of Trados Studio showing HTML source code with carriage return entities (&#13;) followed by the number 10 and another carriage return.

    Which is two carriage returns, the number 10 and another carriage return.  Your target file shows this:

    Screenshot of Trados Studio displaying the target file output with visible line breaks and the number 10 between them.

    Looks remarkably like two carriage returns, the number 10 and a carriage return.

    If I open the file with the filetype I created in the video and look at that part of the file using the "All content" display filter I can also see them there:

    Screenshot of Trados Studio's 'All content' display filter showing two carriage returns, the number 10, and another carriage return in the file.

    Unfortunately I don't think you are going to be able to address this within the filetype because Studio automatically handles them and doesn't allow you to have any control over how the whitespace like this is managed.

    The multilingual filetype at least retains them in the source, unlike the XML filetype, which will not.  I think the best solution is to ask your customer to simply use whitespace, and then you can tell Studio to preserve the whitespace.  Using entity values is really problematic, and most XML processors (especially .NET) will automatically turn the &#13; entities into literal carriage returns when writing XML.

    Paul Filkin | RWS Group
