Using semicolons as a delimiter for XLF files

Hi all -

I recently received an .xlf source file that needed translating.

The file uses many semicolons to separate values, but when I import it into Trados the semicolon isn't treated as a delimiter, so the text isn't split into separate segments. I would like it to be.

(This would make the text easier for our translators to read, and would let us update our translation memories with shorter strings that make sense. Once the file has been translated, the target file would need to go back into the same format in which we received it.)

I tried creating a rule in Trados (2019) but didn't have any luck. Has anyone created a rule like this before and had it work? I opened the Translation Memory that I was going to use and went to Language Resources > Segmentation Rules > Edit. I added a new rule based on an existing one, but it still didn't work, and I'm wondering if there is something obvious I'm missing.

Thanks!

  • I added a new rule with the following values: 

    Trados Studio Edit Segmentation Rule dialog box showing 'Semi colon rule' with 'Before break' set to 'Anything', 'Break characters' set to a colon, and 'After break' set to 'Whitespace (including spaces)'. 'Include closing punctuation' is checked.

    Advanced view default values as follows: 

    Trados Studio Edit Segmentation Rule dialog box in Advanced View with a regular expression entered for 'Before break' and 'After break' set to a whitespace character. The description 'Semi colon rule' is visible.



    Generated Image Alt-Text
    [edited by: Trados AI at 11:38 AM (GMT 0) on 29 Feb 2024]
  •  

    ok - so I created an XLIFF 1.2 sample like this:

    <?xml version="1.0" encoding="utf-8"?>
    <xliff version="1.2" >
      <file original='somefile.txt' source-language="en-GB" target-language="de-DE" datatype='plaintext' >
        <header>
          <note>simple xliff</note>
        </header>
        <body>
         <trans-unit id="1">
          <source>Apple; Banana; Cherry; Date; Elderberry; Fig; Grapefruit; Honeydew; Indian plum; Jackfruit; Kiwi; Lemon; Mango; Nectarine; Orange; Papaya; Quince; Raspberry; Strawberry; Tangerine; Ugli fruit; Valencia orange; Watermelon; Xigua; Yellow passionfruit; Zucchini (also a fruit, botanically).</source>
          <target></target>
         </trans-unit>
         <trans-unit id="2">
          <source>1967 Shelby GT500; 1957 Chevrolet Bel Air; 1961 Jaguar E-Type; 1963 Aston Martin DB5; 1957 Mercedes 300SL Gullwing; 1964 Porsche 911; 1966 Ferrari 275 GTB; 1969 Dodge Charger; 1955 Ford Thunderbird; 1963 Chevrolet Corvette Sting Ray; 1968 Lamborghini Miura; 1965 Ford Mustang; 1973 BMW 3.0 CSL; 1958 Cadillac Eldorado Biarritz; 1967 Alfa Romeo Spider Duetto.</source>
          <target></target>
         </trans-unit>
         <trans-unit id="3">
          <source>Affenpinscher; Azawakh; Bolognese; Chinook; Dogo Argentino; Finnish Spitz; Glen of Imaal Terrier; Hovawart; Kai Ken; Lagotto Romagnolo; Mudi; Norwegian Lundehund; Otterhound; Peruvian Inca Orchid; Portuguese Podengo Pequeno; Russian Toy; Schipperke; Thai Ridgeback; Utonagan; Volpino Italiano; Wirehaired Vizsla; Xoloitzcuintli; Yakutian Laika; Zuchon.</source>
          <target></target>
         </trans-unit>
         <trans-unit id="4">
          <source>Great white shark; tiger shark; bull shark; hammerhead shark; blue shark; mako shark; nurse shark; lemon shark; blacktip shark; Caribbean reef shark; zebra shark; whale shark; basking shark; thresher shark; grey reef shark; sand tiger shark; spiny dogfish; porbeagle shark; white-tip reef shark; wobbegong; goblin shark; cookie-cutter shark; silky shark; Galapagos shark; angel shark.</source>
          <target></target>
         </trans-unit>
        </body>
      </file>
    </xliff>

    I used this rule:

    Screenshot showing a semicolon break rule in the segmentation settings for the TM

    This results in this:

    Screenshot showing the result of a segmentation rule to segment on the semi-colon
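
As a rough illustration of what a semicolon break rule does to strings like the samples above, here is a minimal Python sketch. It uses plain regular expressions, not Studio's segmentation engine, and the function name `split_on_semicolon` and its `require_whitespace_after` flag are made up for this example. The flag mimics an "After break" condition of whitespace: with it set, a string that has no space after the semicolon is not split at all.

```python
import re


def split_on_semicolon(text, require_whitespace_after=False):
    """Split text after each semicolon, keeping the semicolon on the left piece.

    Hypothetical helper for illustration only; it does not reproduce
    how Trados Studio evaluates its segmentation rules.
    """
    # The lookbehind keeps ';' attached to the preceding segment.
    # \s+ demands whitespace after the break, \s* makes it optional.
    pattern = r"(?<=;)\s+" if require_whitespace_after else r"(?<=;)\s*"
    return [piece for piece in re.split(pattern, text) if piece]


if __name__ == "__main__":
    spaced = "Apple; Banana; Cherry."
    unspaced = "Secondary Account Owner;Company Admin;Product User"
    print(split_on_semicolon(spaced, require_whitespace_after=True))
    print(split_on_semicolon(unspaced, require_whitespace_after=True))
    print(split_on_semicolon(unspaced))
```

If a rule expects whitespace after the semicolon, it is worth checking whether the real file actually has any there.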

    So the theory seems to work.  Your rule is different to mine... but perhaps your source file is formatted differently.  So a few questions:

    1. Please provide a small example of how your file is formatted.
    2. If you provide a complete file we will be able to see whether it's XLIFF 1.2 or 2.0; if you don't, which XLIFF version is it?
    3. Did you make the changes to your TM before you opened the file in a new project, or after the project was already created?
    4. Are you sure you are opening the file against the TM you created the rules for?

    Also, for interest... you may also be able to solve this even more easily by using the embedded content processor.  So I created a placeholder rule like this and set it to "Exclude" in the advanced settings:

    Screenshot showing a placeholder rule using the semi-colon in the XLIFF filetype settings

    This results in an even cleaner and easier approach for my sample file:

    Screenshot showing the results of the placeholder rule in the filetype preview feature.

    Paul Filkin | RWS Group

    ________________________
    Design your own training!

    You've done the courses and still need to go a little further, or still not clear? 
    Tell us what you need in our Community Solutions Hub

  • Thanks; I will attach a file so you have a sample of what I was sent to work with. It's XLIFF 1.2.

    Yes, I made the changes to the TM before creating the project and I ensured I used the correct TM when creating my project.

    I've just gone and tried your suggestion in regards to the embedded content processor.

    I notice that you have a couple of extra options in your screen that I don't have; you have some radio buttons here and I don't...not sure if that causes any issues:

    Trados Studio project settings window showing 'Embedded content' option enabled with a red underline indicating a change or selection.

    I made sure I set it to 'Exclude' in advanced settings and created a new project (using the same sample file I will send you) and the values weren't separated but the colons are now highlighted in purple (they weren't previously):

    Trados Studio translation segment with colons highlighted in purple, indicating special handling or an error in the translation memory.

    I'll take another look at my segmentation rules.



  • I'm not sure of the best way to attach the file.  In case the Google Drive link doesn't work, here is a screenshot of what I'm working with:

    Screenshot of an XML file opened in Notepad with Trados Studio tags and instructions for translation. No visible errors or warnings.

  •  

    Just use the insert menu and copy and paste the code in there:

    Screenshot showing how to use the insert menu to add code.

    Paul Filkin | RWS Group

  •  

    "I made sure I set it to 'Exclude' in advanced settings and created a new project (using the same sample file I will send you) and the values weren't separated but the colons are now highlighted in purple (they weren't previously)"

    This strongly suggests they have not been set to "Exclude", because they are still being included.

    "I notice that you have a couple of extra options in your screen that I don't have; you have some radio buttons here and I don't...not sure if that causes any issues:"

    That just means you have an older version of Studio.  Which version are you using?

    Paul Filkin | RWS Group

  • Unless I'm setting the wrong thing to 'Exclude' this is what I have:

    Trados Studio project settings window showing 'Embedded content' section with a dialog box for 'AddEdit Embedded Content Rule'. The 'Segmentation hint' is set to 'Exclude'.

    This is the version:

    About SDL Trados Studio dialog box displaying the version as 'SDL Trados Studio 2019 SR2 - 15.2.0.1041' with copyright and license information.

  • <?xml version="1.0" encoding="UTF-8"?>
    <xliff version="1.2">
        <file source-language="en-US" target-language="" datatype="winres" original="Master.exe" xml:space="preserve">
            <body>
                <trans-unit id="CustomLabel.MYT_Access_Learning_Content" maxwidth="1000" size-unit="char">
                    <source>Access learning content</source>
                    <seg-source>
                        <mrk mtype="seg" mid="1">Access learning content</mrk>
                    </seg-source>
                    <target></target>
                    <note>Values are seperated by ";" please don't remove ; while translating. Please don't remove html tag and attribute in curly braces ({data}) while translating</note>
                </trans-unit>
                <trans-unit id="CustomLabel.MYT_AccountContactRoles" maxwidth="1000" size-unit="char">
                    <source>Secondary Account Owner;Company Admin;Product User;Team Manager;Account Owner</source>
                    <seg-source>
                        <mrk mtype="seg" mid="1">Secondary Account Owner;Company Admin;Product User;Team Manager;Account Owner</mrk>
                    </seg-source>
                    <target></target>
                    <note>Values are seperated by ";" please don't remove ; while translating. Please don't remove html tag and attribute in curly braces ({data}) while translating</note>
                </trans-unit>
                <trans-unit id="CustomLabel.MYT_AccountOwner_Labels" maxwidth="1000" size-unit="char">
                    <source>Account Owner;The Account Owner is the primary admin of the account. They have access to all permissions.;An account owner transfer request has been sent to;Once they accept the role, you will be converted to a Secondary Account Owner.;Choose a user to replace you as the Primary Account Owner. Once the transfer is complete, you will be converted to a Secondary Account Owner.;Transfer to;Transfer Initiated;Cancel Transfer;Transfer account ownership;you;Already, {name} {email} is a Primary Account Owner;A Primary Account Owner cannot be added as a Secondary Account Owner;Refresh</source>
                    <seg-source>
                        <mrk mtype="seg" mid="1">Account Owner;The Account Owner is the primary admin of the account. They have access to all permissions.;An account owner transfer request has been sent to;Once they accept the role, you will be converted to a Secondary Account Owner.;Choose a user to replace you as the Primary Account Owner. Once the transfer is complete, you will be converted to Aa Secondary Account Owner.;Transfer to;Transfer Initiated;Cancel Transfer;Transfer account ownership;you;Already, {name} {email} is a Primary Account Owner;A Primary Account Owner cannot be added as a Secondary Account Owner;Refresh</mrk>
                    </seg-source>
                    <target></target>
                    <note>Values are seperated by ";" please don't remove ; while translating. Please don't remove html tag and attribute in curly braces ({data}) while translating</note>
                </trans-unit>
            </body>
        </file>
    </xliff>

  •  

    Thanks for sharing the file.  The problem is that your client has already specified the segmentation in the file itself, which means the rules are in the file, and with the XLIFF filetype there is no way round that.  If the file didn't have these elements it would be fine:

                    <seg-source>
                        <mrk mtype="seg" mid="1">Secondary Account Owner;Company Admin;Product User;Team Manager;Account Owner</mrk>
                    </seg-source>

    For example, this works perfectly:

    <?xml version="1.0" encoding="UTF-8"?>
    <xliff version="1.2">
        <file source-language="en-US" target-language="" datatype="winres" original="Master.exe" xml:space="preserve">
            <body>
                <trans-unit id="CustomLabel.MYT_Access_Learning_Content" maxwidth="1000" size-unit="char">
                    <source>Access learning content</source>
                    <target></target>
                    <note>Values are seperated by ";" please don't remove ; while translating. Please don't remove html tag and attribute in curly braces ({data}) while translating</note>
                </trans-unit>
                <trans-unit id="CustomLabel.MYT_AccountContactRoles" maxwidth="1000" size-unit="char">
                    <source>Secondary Account Owner;Company Admin;Product User;Team Manager;Account Owner</source>
                    <target></target>
                    <note>Values are seperated by ";" please don't remove ; while translating. Please don't remove html tag and attribute in curly braces ({data}) while translating</note>
                </trans-unit>
                <trans-unit id="CustomLabel.MYT_AccountOwner_Labels" maxwidth="1000" size-unit="char">
                    <source>Account Owner;The Account Owner is the primary admin of the account. They have access to all permissions.;An account owner transfer request has been sent to;Once they accept the role, you will be converted to a Secondary Account Owner.;Choose a user to replace you as the Primary Account Owner. Once the transfer is complete, you will be converted to a Secondary Account Owner.;Transfer to;Transfer Initiated;Cancel Transfer;Transfer account ownership;you;Already, {name} {email} is a Primary Account Owner;A Primary Account Owner cannot be added as a Secondary Account Owner;Refresh</source>
                    <target></target>
                    <note>Values are seperated by ";" please don't remove ; while translating. Please don't remove html tag and attribute in curly braces ({data}) while translating</note>
                </trans-unit>
            </body>
        </file>
    </xliff>
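
For anyone who wants to produce a version of the file without the `seg-source` elements, as in the example above, here is a minimal Python sketch using the standard library. It is only an illustration: the function name `strip_seg_source` is made up, and it assumes that changing the client's file is acceptable and that the removed elements do not need to be restored on delivery, neither of which has been confirmed in this thread.

```python
import xml.etree.ElementTree as ET

# Standard XLIFF 1.2 namespace; registering it keeps a namespaced file
# from being rewritten with ns0: prefixes. The sample in this thread
# declares no namespace, so this has no effect on it.
ET.register_namespace("", "urn:oasis:names:tc:xliff:document:1.2")


def strip_seg_source(xliff_text):
    """Return (xml_string, removed_count) with all <seg-source> elements removed.

    Hypothetical helper for illustration; comments and the original
    XML declaration are not preserved by ElementTree.
    """
    root = ET.fromstring(xliff_text)
    removed = 0
    # Snapshot the elements first so the tree is not modified mid-iteration.
    for parent in list(root.iter()):
        for child in list(parent):
            # Compare the local name so namespaced files are handled too.
            if child.tag.rsplit("}", 1)[-1] == "seg-source":
                parent.remove(child)
                removed += 1
    return ET.tostring(root, encoding="unicode"), removed


if __name__ == "__main__":
    sample = (
        '<xliff version="1.2"><file><body>'
        '<trans-unit id="demo"><source>A;B</source>'
        '<seg-source><mrk mtype="seg" mid="1">A;B</mrk></seg-source>'
        '<target></target></trans-unit>'
        '</body></file></xliff>'
    )
    cleaned, count = strip_seg_source(sample)
    print(count)
    print(cleaned)
```

Work on a copy of the file and compare the result against the original before importing it into Studio.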

    Given there are no target values, and you are using 2019 rather than 2022, I can suggest a workaround like this:

    1. Open the XLIFF in Studio and copy source to target.
    2. Save the target file.
    3. Create a new XML filetype for the XLIFF that only extracts the target element.
    4. Use a similar rule on the semicolon.
    5. Open the XLIFF from step 2 with the new XML filetype.
    6. Translate and save the target.

    This time you should be able to translate the file properly segmented, because you are avoiding the XLIFF filetype and using the XML filetype instead.
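
Since the client's note asks that the ";" separators are not removed, a quick check after translation can help. The following Python sketch is an assumption-laden illustration, not part of the workaround itself: the function name `semicolon_mismatches` is made up, and it simply compares the number of semicolons in each `source` and `target`. A translation that legitimately needs a semicolon inside a value would be flagged too, so treat the output as a list of units to review.

```python
import xml.etree.ElementTree as ET


def _local(tag):
    """Strip any '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def semicolon_mismatches(xliff_text):
    """List (unit id, source count, target count) where ';' counts differ.

    Hypothetical helper for illustration only.
    """
    root = ET.fromstring(xliff_text)
    problems = []
    for unit in root.iter():
        if _local(unit.tag) != "trans-unit":
            continue
        texts = {}
        for child in unit:
            name = _local(child.tag)
            if name in ("source", "target"):
                # itertext() also collects text nested inside inline tags.
                texts[name] = "".join(child.itertext())
        src_count = texts.get("source", "").count(";")
        tgt_count = texts.get("target", "").count(";")
        if src_count != tgt_count:
            problems.append((unit.get("id"), src_count, tgt_count))
    return problems


if __name__ == "__main__":
    sample = (
        '<xliff version="1.2"><file><body>'
        '<trans-unit id="ok"><source>A;B</source><target>X;Y</target></trans-unit>'
        '<trans-unit id="bad"><source>A;B;C</source><target>X;Y Z</target></trans-unit>'
        '</body></file></xliff>'
    )
    for unit_id, src, tgt in semicolon_mismatches(sample):
        print(unit_id, src, tgt)
```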

    Paul Filkin | RWS Group

  • Great, thanks Paul.  I'll look into this some more and get back to you if I have any questions.  Thanks!
