Using semicolons as a delimiter for XLF files

Hi all -

I recently received an .xlf source file that needed translating.

The file uses many semicolons to separate values, but when I import it into Trados the semicolon isn't treated as a delimiter, so the text isn't split into separate segments. I want it to be.

(This would of course make it easier for our translators to read, and it means we can update our translation memories with shorter strings that make sense. Once the file has been translated, the target translation would then need to go back into the same format in which we received the file.)

I tried creating a rule in Trados (2019) but didn't have any luck. Has anyone created a rule like this before and had it work? I opened the Translation Memory that I was going to use and went to Language Resources > Segmentation Rules > Edit. I added a new rule based on an existing one, but it still didn't work, and I'm wondering if there is something obvious I'm missing.

Thanks!

  • I added a new rule with the following values: 

    Trados Studio Edit Segmentation Rule dialog box showing 'Semi colon rule' with 'Before break' set to 'Anything', 'Break characters' set to a colon, and 'After break' set to 'Whitespace (including spaces)'. 'Include closing punctuation' is checked.

    Advanced view default values as follows: 

    Trados Studio Edit Segmentation Rule dialog box in Advanced View with a regular expression entered for 'Before break' and 'After break' set to a whitespace character. The description 'Semi colon rule' is visible.



    Generated Image Alt-Text
    [edited by: Trados AI at 11:38 AM (GMT 0) on 29 Feb 2024]
  •  

    ok - so I created an XLIFF 1.2 sample like this:

    <?xml version="1.0" encoding="utf-8"?>
    <xliff version="1.2" >
      <file original='somefile.txt' source-language="en-GB" target-language="de-DE" datatype='plaintext' >
        <header>
          <note>simple xliff</note>
        </header>
        <body>
         <trans-unit id="1">
          <source>Apple; Banana; Cherry; Date; Elderberry; Fig; Grapefruit; Honeydew; Indian plum; Jackfruit; Kiwi; Lemon; Mango; Nectarine; Orange; Papaya; Quince; Raspberry; Strawberry; Tangerine; Ugli fruit; Valencia orange; Watermelon; Xigua; Yellow passionfruit; Zucchini (also a fruit, botanically).</source>
          <target></target>
         </trans-unit>
         <trans-unit id="2">
          <source>1967 Shelby GT500; 1957 Chevrolet Bel Air; 1961 Jaguar E-Type; 1963 Aston Martin DB5; 1957 Mercedes 300SL Gullwing; 1964 Porsche 911; 1966 Ferrari 275 GTB; 1969 Dodge Charger; 1955 Ford Thunderbird; 1963 Chevrolet Corvette Sting Ray; 1968 Lamborghini Miura; 1965 Ford Mustang; 1973 BMW 3.0 CSL; 1958 Cadillac Eldorado Biarritz; 1967 Alfa Romeo Spider Duetto.</source>
          <target></target>
         </trans-unit>
         <trans-unit id="3">
          <source>Affenpinscher; Azawakh; Bolognese; Chinook; Dogo Argentino; Finnish Spitz; Glen of Imaal Terrier; Hovawart; Kai Ken; Lagotto Romagnolo; Mudi; Norwegian Lundehund; Otterhound; Peruvian Inca Orchid; Portuguese Podengo Pequeno; Russian Toy; Schipperke; Thai Ridgeback; Utonagan; Volpino Italiano; Wirehaired Vizsla; Xoloitzcuintli; Yakutian Laika; Zuchon.</source>
          <target></target>
         </trans-unit>
         <trans-unit id="4">
          <source>Great white shark; tiger shark; bull shark; hammerhead shark; blue shark; mako shark; nurse shark; lemon shark; blacktip shark; Caribbean reef shark; zebra shark; whale shark; basking shark; thresher shark; grey reef shark; sand tiger shark; spiny dogfish; porbeagle shark; white-tip reef shark; wobbegong; goblin shark; cookie-cutter shark; silky shark; Galapagos shark; angel shark.</source>
          <target></target>
         </trans-unit>
        </body>
      </file>
    </xliff>

    I used this rule:

    Screenshot showing a semicolon break rule in the segmentation settings for the TM

    This results in this:

    Screenshot showing the result of a segmentation rule to segment on the semi-colon

    So the theory seems to work.  Your rule is different to mine... but perhaps your source file is formatted differently.  So a few questions:

    1. please provide a small example of how your file is formatted
    2. if you provide a complete file we will be able to see if it's XLIFF 1.2 or 2.0, but if you don't then which XLIFF is it?
    3. did you make the changes to your TM before you opened the file in a new project or after the project was already created?
    4. are you sure you are opening the file against the TM you created the rules for?
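For anyone who wants to check the expected break points outside Trados, here is a minimal Python sketch of the behaviour a rule like this is aiming for: break after a semicolon when whitespace follows it. It is only an illustration, not how Studio implements segmentation, and the function name `split_on_semicolons` is made up for this example.

```python
import re

# Assumption: a break happens at whitespace that directly follows a semicolon.
# The lookbehind keeps the semicolon attached to the segment it closes.
_BREAK = re.compile(r"(?<=;)\s+")


def split_on_semicolons(text: str) -> list[str]:
    """Split text into segments after each semicolon that is followed by whitespace."""
    return [part for part in _BREAK.split(text) if part]


if __name__ == "__main__":
    source = "Apple; Banana; Cherry; Date."
    for segment in split_on_semicolons(source):
        print(segment)
```

A semicolon with no whitespace after it (for example `a;b`) is left alone here, which mirrors an "After break" condition of whitespace.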

    Also, for interest... you may be able to solve this even more easily by using the embedded content processor.  So I created a placeholder rule like this and set it to "Exclude" in the advanced settings:

    Screenshot showing a placeholder rule using the semi-colon in the XLIFF filetype settings

    This results in an even cleaner and easier approach for my sample file:

    Screenshot showing the results of the placeholder rule in the filetype preview feature.
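The general idea behind the placeholder approach can be sketched roughly as follows: the separators are treated as non-translatable tokens, the text between them becomes its own translatable unit, and joining everything back together reproduces the original string. This is a Python illustration under those assumptions only; it does not reflect Studio's internals, and the names `tokenize` and `rebuild` are hypothetical.

```python
import re

# The capturing group makes re.split keep the separators in its result,
# so text and separators alternate: text, separator, text, separator, ...
_SEPARATOR = re.compile(r"(;\s*)")


def tokenize(text: str) -> list[tuple[str, str]]:
    """Return (kind, value) pairs: 'text' for translatable runs, 'placeholder' for separators."""
    tokens = []
    for index, piece in enumerate(_SEPARATOR.split(text)):
        if piece:  # skip the empty string left over when text ends with a separator
            kind = "placeholder" if index % 2 else "text"
            tokens.append((kind, piece))
    return tokens


def rebuild(tokens: list[tuple[str, str]]) -> str:
    """Join the tokens back together in their original order."""
    return "".join(value for _, value in tokens)


if __name__ == "__main__":
    source = "Great white shark; tiger shark; bull shark."
    tokens = tokenize(source)
    for kind, value in tokens:
        print(kind, repr(value))
    # The separators are untouched, so the original layout is restored.
    print(rebuild(tokens) == source)
```

Because the separators are carried through unchanged, the translated file keeps the same layout as the one received.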

    Paul Filkin | RWS Group


  • Thanks; I will attach a file so you have a sample of what I was sent to work with. It's XLIFF 1.2.

    Yes, I made the changes to the TM before creating the project and I ensured I used the correct TM when creating my project.

    I've just tried your suggestion regarding the embedded content processor.

    I notice that you have a couple of extra options on your screen that I don't have; you have some radio buttons here and I don't. I'm not sure if that causes any issues:

    Trados Studio project settings window showing 'Embedded content' option enabled with a red underline indicating a change or selection.

    I made sure I set it to 'Exclude' in advanced settings and created a new project (using the same sample file I will send you) and the values weren't separated but the colons are now highlighted in purple (they weren't previously):

    Trados Studio translation segment with colons highlighted in purple, indicating special handling or an error in the translation memory.

    I'll take another look at my segmentation rules.



  • I'm not sure of the best way to attach the file.  In case the Google Drive link doesn't work, here is a screenshot of what I'm working with:

    Screenshot of an XML file opened in Notepad with Trados Studio tags and instructions for translation. No visible errors or warnings.
