Align Documents cannot align XLIFF 2.0 files

My client sent me two XLIFF files exported from Articulate: one containing the source and one containing the target. I need to create a TM from them for a new assignment.

I tried to align both files using the Align Documents function in Trados Studio, but I got an error saying that XLIFF 2.0 is not supported. Then I created SDLXLIFF files in a project and tried to align those, but again it says that XLIFF 2.0 is not supported.

My Trados version: Trados Studio 2022 SR2 - 17.2.9.18688

Please, let me know how to proceed. The files are confidential, so I can send only a sample.

Thank you!
Sotir

  •  

    Thank you for your attention!

    I am attaching the sample files with 3 segments each:

    https://we.tl/t-O3Dnays2VY

    Meanwhile, as we needed to start the project on Friday, we did the following:
    1. Created SDLXLIFF files from both the source (A) and the target (B).

    2. Exported them for external review as bilingual DOCX.

    3. Copied and pasted the text from the B file into the target column of the A file.

    4. Manually corrected the misalignments in the DOCX file.

    5. Imported the bilingual A file back into Trados (it refused to do so until we removed all tags in the target column, which was the pitfall of this process).

    6. Updated a specifically created TM to use as a reference.

    I will be happy to do the alignment in Trados next time. :)

    So your help is highly appreciated!

    Best regards,

    Sotir

  •  

    Thanks for the files.  I don't know exactly why this won't work, so I will log this with support and we can create a bug as needed.  In the meantime, and in case it helps with some ideas going forward, I was playing around with OpenAI this evening and created a Python script that will sort this out.  This is what I did:

    1. Opened the English source file in Studio as an en-bg project, copied source to target, and saved the target file.
    2. Ran the script, which asks for the Bulgarian source file and then for the en-bg XLIFF target I had just created (which contains only English in both source and target).
    3. The script compares the unit IDs and, where they match, puts the Bulgarian text into the target of the en-bg file, then saves the updated XLIFF as a new file.

    The script is here in case you're interested:

    from lxml import etree
    import os
    
    def pretty_print_element(elem, level=0):
        # Function to add indentation and newlines to an XML element, recursively for all its children
        i = "\n" + level*"  "
        if len(elem):
            if not elem.text or not elem.text.strip():
                elem.text = i + "  "
            if not elem.tail or not elem.tail.strip():
                elem.tail = i
            for child in elem:
                pretty_print_element(child, level+1)
            if not child.tail or not child.tail.strip():
                child.tail = i
        else:
            if level and (not elem.tail or not elem.tail.strip()):
                elem.tail = i
    
    # User input for file paths
    first_file_path = input('Enter the path to the first XLIFF file: ')
    second_file_path = input('Enter the path to the second XLIFF file: ')
    output_file_path = os.path.splitext(second_file_path)[0] + '_merged.xliff'
    
    # Load the XML content of both files
    first_tree = etree.parse(first_file_path)
    second_tree = etree.parse(second_file_path)
    
    # Define the XML namespace
    ns = {'x': 'urn:oasis:names:tc:xliff:document:2.0'}
    
    # Get the root of the XML files
    first_root = first_tree.getroot()
    second_root = second_tree.getroot()
    
    # Iterate through each unit in the first file
    for first_unit in first_root.xpath('//x:file/x:unit', namespaces=ns):
        unit_id = first_unit.get('id')
        # Find the corresponding unit in the second file
        second_unit = second_root.xpath(f'//x:file/x:unit[@id="{unit_id}"]', namespaces=ns)
    
        if second_unit:
            # Get the target node, or create one if it doesn't exist
            target_node = second_unit[0].xpath('.//x:segment/x:target', namespaces=ns)
            if not target_node:
                segment_node = second_unit[0].find('.//x:segment', ns)
                target_node = etree.SubElement(segment_node, f'{{{ns["x"]}}}target')
            else:
                target_node = target_node[0]
                # Remove any existing content in the target node
                target_node.clear()
    
            # Get the source node from the first unit
            source_node = first_unit.xpath('.//x:segment/x:source', namespaces=ns)[0]
    
            # Copy all content from the source node to the target node
            target_node.text = source_node.text
            for element in source_node:
                target_node.append(element)
    
    # After updating the XML content but before writing it to a file
    for element in second_tree.xpath('//x:unit/x:segment/x:target', namespaces=ns):
        pretty_print_element(element)
    
    # Now write the updated and pretty-printed XML to a new file
    second_tree.write(output_file_path, xml_declaration=True, encoding='UTF-8', pretty_print=True)
    
    # Print a success message
    print(f"The XLIFF files have been merged and saved as: {output_file_path}")
    

    I ran it in the terminal of Visual Studio Code like this:

    Screenshot showing Visual Studio Code and the running of the Python script.

    The resulting file was this:

    <?xml version='1.0' encoding='UTF-8'?>
    <xliff xmlns="urn:oasis:names:tc:xliff:document:2.0" srcLang="en-GB" trgLang="bg-BG" version="2.0" xml:space="preserve">
      <file canResegment="no" id="Anti">
        <unit canResegment="no" id="6mC1hpNdo2N.Name" type="Articulate:PlainText">
          <segment>
            <source>Main Course</source>
            <target>Основен курс</target></segment>
        </unit>
        <unit canResegment="no" id="6T9xVpFJD5Y" type="Articulate:DocumentState">
          <originalData>
            <data id="generic_1"><Style Justification="Center" /></data>
            <data id="span_2"><Style FontSize="20.9454517" FontIsBold="False" /></data>
          </originalData>
          <segment>
            <source>
              <pc id="block_0">
                <ph dataRef="generic_1" id="generic_1"/>
                <pc dataRefStart="span_2" id="span_2">Create your personalised training today!</pc>
              </pc>
            </source>
            <target>
      <pc id="block_0">
        <ph dataRef="generic_1" id="generic_1"/>
        <pc dataRefStart="span_2" id="span_2">Създайте Вашето персонализирано обучение днес!</pc>
      </pc>
    </target>
    </segment>
        </unit>
        <unit canResegment="no" id="5eP0zufn63h" type="Articulate:DocumentState">
          <originalData>
            <data id="generic_1"><Style /></data>
            <data id="span_2"><Style FontFamily="Text TF Book" FontSize="10.4727259" FontIsBold="True" FontIsItalic="False" ForegroundColor="lt1,00" LinkColor="lt1,00" /></data>
          </originalData>
          <segment>
            <source>
              <pc id="block_0">
                <ph dataRef="generic_1" id="generic_1"/>
                <pc dataRefStart="span_2" id="span_2">START</pc>
              </pc>
            </source>
            <target>
      <pc id="block_0">
        <ph dataRef="generic_1" id="generic_1"/>
        <pc dataRefStart="span_2" id="span_2">НАЧАЛО</pc>
      </pc>
    </target>
    </segment>
        </unit>
      </file>
    </xliff>
    

    Which opens in Studio like this (tags fully expanded):

    Screenshot of the final updated XLIFF in Studio

    So now I can update into a TM.

    Interestingly, after I gave up on PowerShell because I could not quite get it right, it took only about 10 minutes to come up with the code and create the file.  So I think it is quite a neat solution for when things are not working as expected in Studio.  If you have more files like this to do, it is probably a lot faster and more accurate too, since XLIFF maps neatly onto this sort of process and the unit IDs are checked.
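
Since the merge relies on matching unit IDs, it can help to see which IDs do not line up before merging. The following is only a minimal sketch: it uses Python's standard-library `xml.etree.ElementTree` rather than the lxml package used in the script, and the function names and the two miniature XLIFF strings are made up for illustration.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:2.0"

def unit_ids(xliff_text):
    """Return the set of <unit> id attributes found in an XLIFF 2.0 string."""
    root = ET.fromstring(xliff_text)
    return {unit.get("id") for unit in root.iter(f"{{{XLIFF_NS}}}unit")}

def compare_unit_ids(first_text, second_text):
    """Return (ids only in the first file, ids only in the second), each sorted."""
    first, second = unit_ids(first_text), unit_ids(second_text)
    return sorted(first - second), sorted(second - first)

# Hypothetical miniature files, invented for this example only.
BG_SAMPLE = """<xliff xmlns="urn:oasis:names:tc:xliff:document:2.0" version="2.0" srcLang="bg-BG">
  <file id="f1">
    <unit id="a"><segment><source>Едно</source></segment></unit>
    <unit id="b"><segment><source>Две</source></segment></unit>
  </file>
</xliff>"""

EN_SAMPLE = """<xliff xmlns="urn:oasis:names:tc:xliff:document:2.0" version="2.0" srcLang="en-GB">
  <file id="f1">
    <unit id="a"><segment><source>One</source></segment></unit>
    <unit id="c"><segment><source>Three</source></segment></unit>
  </file>
</xliff>"""

only_first, only_second = compare_unit_ids(BG_SAMPLE, EN_SAMPLE)
print("Only in first file:", only_first)
print("Only in second file:", only_second)
```

Units that appear in only one of the two lists would be left without a translation by the merge, so they are worth checking by hand.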

    Paul Filkin | RWS Group

    ________________________
    Design your own training!

    You've done the courses and still need to go a little further, or still not clear? 
    Tell us what you need in our Community Solutions Hub

  • Thank you!

    This solution will certainly help other users until the Trados team looks into the issue.

    I will try to test it later this week.

  • Well, I tried but the script stopped at the first file and I do not know why:

    Screenshot of a Python script in Visual Studio Code with a focus on the terminal showing a file path input prompt for the first XLIFF file.

    I installed Visual Studio Code, then Python 3, then lxml. I figured out that I need to put a double slash in the path. Maybe I need something else?

  •  

    I think you're trying to edit the script itself and all you actually did was cause the path to be printed out in the question that you put in the script.  Just run it as I provided it and enter the paths when prompted in the terminal window.  I would have recorded it last night but it was late and I needed to be quiet where I was ;-)  So just run it like this:
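
On the path itself, here is a small sketch of why nothing needs doubling at the prompt: text returned by `input()` is taken literally, so backslashes are ordinary characters there. The helper name and the example path are hypothetical, and stripping the quotes is only an assumption about how a path might get pasted (for example via "Copy as path" in Windows).

```python
from pathlib import PureWindowsPath

def clean_prompt_path(raw):
    """Tidy text typed or pasted at an input() prompt; returns a plain string."""
    # Remove surrounding whitespace first, then any wrapping quote characters.
    return raw.strip().strip('"\'')

# Hypothetical path, written as a raw string to mimic what input() would return.
typed = r'"C:\Temp\bulgarian.xlf" '
cleaned = clean_prompt_path(typed)

print(cleaned)
# PureWindowsPath parses Windows-style paths on any operating system.
print(PureWindowsPath(cleaned).name)
```

Doubled backslashes are only needed when a Windows path is written as a normal string literal inside the script itself, which is not required here.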

    Paul Filkin | RWS Group

  • Got it! Thank you! The script does the magic. Thank you for your time!

    Now I have another issue though.

    The sample file opens just fine in Trados, but the real one gets this error:

    Screenshot of Trados Studio Task Results showing a completed scan with 0 errors and 2 warnings. Warnings include 'Xliff Version 2.0 is not supported' and an error related to 'DataReferenceStart' not referring to a valid Data element.

    Is there any way I can send you the original file privately, if you would like to take a look at it?

  •  

    You can send it to pfilkin at sdl dotcom and when I get a little time I can take a look and see if I can find the problem.  I can't guarantee today or even this week... but I will take a look.

    Paul Filkin | RWS Group

  • Whenever you have time for this is fine for me. Thank you and have a nice day!

  •  

    ok - here's a revised script:

    from lxml import etree
    import os
    
    def pretty_print_element(elem, level=0):
        # Function to add indentation and newlines to an XML element, recursively for all its children
        i = "\n" + level*"  "
        if len(elem):
            if not elem.text or not elem.text.strip():
                elem.text = i + "  "
            if not elem.tail or not elem.tail.strip():
                elem.tail = i
            for child in elem:
                pretty_print_element(child, level+1)
            if not child.tail or not child.tail.strip():
                child.tail = i
        else:
            if level and (not elem.tail or not elem.tail.strip()):
                elem.tail = i
    
    # User input for file paths
    first_file_path = input('Enter the path to the first XLIFF file: ')
    second_file_path = input('Enter the path to the second XLIFF file: ')
    output_file_path = os.path.splitext(second_file_path)[0] + '_merged.xliff'
    
    # Load the XML content of both files
    first_tree = etree.parse(first_file_path)
    second_tree = etree.parse(second_file_path)
    
    # Define the XML namespace
    ns = {'x': 'urn:oasis:names:tc:xliff:document:2.0'}
    
    # Get the root of the XML files
    first_root = first_tree.getroot()
    second_root = second_tree.getroot()
    
    # Iterate through each unit in the first file
    for first_unit in first_root.xpath('//x:file/x:unit', namespaces=ns):
        unit_id = first_unit.get('id')
        # Find the corresponding unit in the second file
        second_unit = second_root.xpath(f'//x:file/x:unit[@id="{unit_id}"]', namespaces=ns)
    
        if second_unit:
            second_unit = second_unit[0]
            # Copy <originalData> section if it exists
            original_data = first_unit.find('.//x:originalData', ns)
            if original_data is not None:
                second_original_data = second_unit.find('.//x:originalData', ns)
                if second_original_data is None:
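                        # If <originalData> does not exist in the second unit, create it
                        # before the first <segment>, which is where XLIFF 2.0 expects it
                        second_original_data = etree.Element(f'{{{ns["x"]}}}originalData')
                        second_unit.find('x:segment', ns).addprevious(second_original_data)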
                # Copy all <data> elements
                for data in original_data:
                    if second_original_data.find(f'.//x:data[@id="{data.get("id")}"]', ns) is None:
                        # Only copy <data> if an element with the same id doesn't already exist
                        second_original_data.append(data)
    
            # Get the target node, or create one if it doesn't exist
            target_node = second_unit.xpath('.//x:segment/x:target', namespaces=ns)
            if not target_node:
                segment_node = second_unit.find('.//x:segment', ns)
                target_node = etree.SubElement(segment_node, f'{{{ns["x"]}}}target')
            else:
                target_node = target_node[0]
                # Remove any existing content in the target node
                target_node.clear()
    
            # Get the source node from the first unit
            source_node = first_unit.xpath('.//x:segment/x:source', namespaces=ns)[0]
    
            # Copy all content from the source node to the target node
            target_node.text = source_node.text
            for element in source_node:
                target_node.append(element)
    
    # After updating the XML content but before writing it to a file
    for element in second_tree.xpath('//x:unit/x:segment/x:target', namespaces=ns):
        pretty_print_element(element)
    
    # Now write the updated and pretty-printed XML to a new file
    second_tree.write(output_file_path, xml_declaration=True, encoding='UTF-8', pretty_print=True)
    
    # Print a success message
    print(f"The XLIFF files have been merged and saved as: {output_file_path}")
    

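    The problem was that the script did not handle the <originalData> section and its <data> elements: the tags copied into the target could reference <data> ids that did not exist in the matching unit of the second file, which is what produced that DataReferenceStart error.  The revised script copies any missing <data> elements across as well, and now it seems to work fine.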
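
To check a merged file for this kind of problem before opening it in Studio, something along these lines could be used. It is a minimal sketch under a few assumptions: it uses the standard-library `xml.etree.ElementTree` instead of lxml, it only looks at the `dataRef`, `dataRefStart` and `dataRefEnd` attributes inside each <target>, and the function name and sample XLIFF are invented for illustration.

```python
import xml.etree.ElementTree as ET

NS = "urn:oasis:names:tc:xliff:document:2.0"
REF_ATTRS = ("dataRef", "dataRefStart", "dataRefEnd")

def dangling_data_refs(xliff_text):
    """Return (unit id, attribute, value) for target refs with no matching <data> id."""
    root = ET.fromstring(xliff_text)
    problems = []
    for unit in root.iter(f"{{{NS}}}unit"):
        # Ids of every <data> element declared in this unit's <originalData>.
        known = {data.get("id") for data in unit.iter(f"{{{NS}}}data")}
        for target in unit.iter(f"{{{NS}}}target"):
            for elem in target.iter():
                for attr in REF_ATTRS:
                    value = elem.get(attr)
                    if value is not None and value not in known:
                        problems.append((unit.get("id"), attr, value))
    return problems

# Hypothetical sample: the target refers to "span_3", which is never declared.
SAMPLE = """<xliff xmlns="urn:oasis:names:tc:xliff:document:2.0" version="2.0" srcLang="en-GB" trgLang="bg-BG">
  <file id="f1">
    <unit id="u1">
      <originalData>
        <data id="span_2">&lt;Style /&gt;</data>
      </originalData>
      <segment>
        <source><pc id="1" dataRefStart="span_2">START</pc></source>
        <target><pc id="1" dataRefStart="span_3">НАЧАЛО</pc></target>
      </segment>
    </unit>
  </file>
</xliff>"""

for unit_id, attr, value in dangling_data_refs(SAMPLE):
    print(f"unit {unit_id}: {attr}={value!r} has no matching <data>")
```

An empty result does not guarantee that Studio will accept the file, but any entry it does report points to a unit whose <originalData> is missing a <data> element.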

    Paul Filkin | RWS Group

  •  

    I also got a workaround for this bug from the Support team this afternoon... very easy workaround:

    https://gateway.sdl.com/apex/communityknowledge?articleName=000021850

    So you have two mechanisms to solve this now.  Although I must admit I'm partial to the Python solution myself ;-)

    Paul Filkin | RWS Group
