Unlock the Mystery: Help Needed with Segmentation for Enumeration Markers

Dear all,

I need your help with creating a segmentation rule.

I'm using Trados Studio 2024 - 18.0.2.3255 and following this video https://youtu.be/mo7jgjTX30s?feature=shared to create a translation memory based on a bilingual Excel file that appears as follows:
Screenshot of a bilingual Excel file in Trados Studio with English and German text, highlighting the lack of semicolon before enumeration markers in German.

As displayed above, in the German text, unlike in English, there is no semicolon before the enumeration markers a), b), etc., which prevents the text from segmenting as expected:

Close-up of the same bilingual Excel file in Trados Studio showing English text with proper segmentation of enumeration markers.
Close-up of the same bilingual Excel file in Trados Studio showing German text with improper segmentation of enumeration markers.

I attempted to create a few segmentation rules, such as using a space as a break character (shown below), since there is no consistent separator before the enumeration markers (a), (b), etc. I even tried to use \| [a-zA-Z]\) \| as the break characters, though I'm unsure whether an expression like this can be set as a break character at all.
Screenshot of the 'Add Segmentation Rule' dialog in Trados Studio with an attempted regular expression entered to define break characters.
I feel like I'm reaching the limits of my knowledge on segmentation rules in Trados.

Could anyone please assist with this? The file is large, and manually separating the segments would be very time-consuming.

Thanks a lot in advance for any ideas!

Chengle



[edited by: RWS Community AI at 4:14 PM (GMT 1) on 9 Apr 2025]
  •  

    Tricky with bilingual files for two reasons:

    1. the out-of-the-box Bilingual Excel filetype will not segment at all within a cell when there is already content in both the source and the target (this is deliberate)
    2. the Multilingual Excel filetype should be able to handle it, although it will only segment the source, and you would have to address the target manually because the entire target cell will be placed into the first target segment of the segmented source (also deliberate)

    Both behaviours are deliberate because Studio only segments on the source anyway, so it doesn't know what to do with the target.  The Bilingual Excel filetype says, in effect: "I'll never even try to segment if the target is populated."  The Multilingual Excel filetype does away with that restriction but still doesn't know what to do with the target, so it segments the source and not the target.

    There is a third problem in case you think the Multilingual Excel filetype is the way to go... and that is it's buggy and throws an error when you try!  FYI.

    So, I reckon the best approach might be to use the monolingual Excel filetype and align two Excel files.

    You could, for example, create these rules on the Excel filetype, under Embedded Content, and make sure they are all set to Exclude:

    \s\|\s

    \(\w+\)

    (?<=\s)\w\)(?=\s)

    \d+\.(?=\s)

    Like this:

    Screenshot of Trados Studio options menu showing Embedded Content settings with arrows pointing to Document Structure Information and Tag Definition Rules.

    Then make sure you separate the bilingual file into two monolingual files and the alignment gets you this:

    Screenshot of a bilingual Excel file in Trados Studio with English text on the left and German translation on the right, indicating segmentation of text.

    I started with the pipes as this seems a better place to segment than the semicolons.  That way I can then remove the (a), the a) and even the 4. ... for example... leaving behind only the phrases and sentences to be aligned for your TM.
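For anyone who wants to sanity-check what those four patterns catch before touching the filetype settings, here is a rough Python sketch. It only approximates the idea of "Exclude" (matched text is dropped and the text on either side becomes its own piece); it is not how Studio implements it, and Studio's own regex engine may differ in details. The sample strings and the helper name `split_on_excluded` are made up for illustration.

```python
import re

# The four patterns quoted in the reply above, in the same order.
PATTERNS = [
    r"\s\|\s",             # a pipe with whitespace on both sides
    r"\(\w+\)",            # markers such as (a) or (12)
    r"(?<=\s)\w\)(?=\s)",  # markers such as a) with whitespace on both sides
    r"\d+\.(?=\s)",        # markers such as 4. followed by whitespace
]

COMBINED = re.compile("|".join(PATTERNS))


def split_on_excluded(text):
    """Hypothetical helper: drop every match and return the text between matches."""
    pieces = COMBINED.split(text)
    # Consecutive matches (e.g. " | " followed by "(a)") leave empty pieces; skip them.
    return [piece.strip() for piece in pieces if piece.strip()]


if __name__ == "__main__":
    # Invented sample cells, not taken from the attached files.
    print(split_on_excluded("4. General terms | (a) first item | (b) second item"))
    print(split_on_excluded("Der Lieferant muss a) liefern b) berechnen"))
```

If a marker style in the real file is not covered (for example a marker directly followed by a full stop), the sketch will leave it inside a piece, which is a quick way to spot a pattern that still needs adding.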

    Obviously this is untested on a full file where I don't know what else is in there, but perhaps it's an approach worth trying if you want a TM from these files?

    Paul Filkin | RWS Group

    ________________________
    Design your own training!

    You've done the courses and still need to go a little further, or still not clear? 
    Tell us what you need in our Community Solutions Hub

    [edited by: RWS Community AI at 9:16 PM (GMT 1) on 9 Apr 2025]
  • Hello,

    Thank you so much for your quick response. I truly appreciate the detailed explanation and insights you provided about the different approaches. I'll definitely try aligning the Excel files.

    One more question: Could you please guide me on how to create the segmentation rule with a pipe? I attempted something, but it didn't seem to work. Here's what I have so far:
    Screenshot of Trados Studio segmentation rules interface with a regular expression entered in the 'Before break' field and an empty 'After break' field.

    Thank you once again for your help!

    Best regards,

    [edited by: RWS Community AI at 11:27 AM (GMT 1) on 10 Apr 2025]
  •  

    It won't help you with a bilingual Excel file for the reasons I already explained.  Try copying your text into a monolingual Excel, or a text file, and then test your segmentation rule.  I think you'll have more success that way.

    Paul Filkin | RWS Group

  • Hi,

    Thank you for the suggestion regarding aligning the monolingual files. I gave it a try, but unfortunately I did not get the same results as shown in your screenshots. For example, "| (a) |" can still be seen:

    Screenshot of a bilingual text comparison in Trados Studio showing misaligned segments with the pipe symbol ' (a) ' visible in the source column but not aligned with the target column.

    I suspect this might be because the segmentation does not start with the pipe symbol. In your previous comment you mentioned that you "started with the pipes". Could you please clarify how I should configure the segmentation rule in the translation memory to achieve the same results you described?

    This is important because the translation memory will be used to translate the points a, b, and c separately.

    For your reference, I’ve also attached the monolingual files in case that helps to provide more insights.

    Thanks a lot for looking into this!

    Chengle

    EN_1.xlsx

    DE_1.xlsx

    [edited by: RWS Community AI at 8:46 PM (GMT 1) on 14 Apr 2025]
  •  

    As I explained, and showed, I didn't use segmentation rules at all.  I used filetype rules... it's easier!

    Paul Filkin | RWS Group

  • Hello,

    I did use the file type rules as you suggested and aligned two monolingual files.
    Screenshot of Trados Studio options menu with 'Embedded content' selected, showing tag definition rules for file type settings.

    However, the results are different from what you achieved (points a, b, and c are not treated as separate segments).

    Your result:

    Screenshot of Trados Studio showing alignment of English and German monolingual files with segments not matching expected results.

    My current result:
    Screenshot of Trados Studio alignment comparison with English and German text, highlighting discrepancies in segment division.

    Could you please let me know what could be wrong with my filetype rules settings?

    Thank you,

    Chengle

    [edited by: RWS Community AI at 6:35 AM (GMT 1) on 15 Apr 2025]
  •   

    Maybe you didn't set them to exclude?  Select the rule, edit, then click on Advanced and set the segmentation rule to Exclude.  If you don't do that they will most likely be inline... which is what you're showing.

    I suggest you test them in the filetype settings before you go to the lengths of aligning.  If they don't segment in the filetype settings they won't segment anywhere else :-)
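As a rough mental model of the difference between inline and Exclude, here is a small Python sketch. It is only an illustration of the concept under my own assumptions, not Studio's actual behaviour: the `<tag/>` placeholder, the function names and the sample text are all invented, and only two of the patterns from earlier in the thread are used.

```python
import re

# Two of the patterns suggested earlier in the thread, combined for brevity.
MARKER = re.compile(r"\s\|\s|\(\w+\)")


def as_inline(text):
    """Inline (assumed model): each match becomes a placeholder tag inside one unbroken segment."""
    return [MARKER.sub("<tag/>", text)]


def as_excluded(text):
    """Exclude (assumed model): each match is taken out and the text on either side stands alone."""
    return [piece.strip() for piece in MARKER.split(text) if piece.strip()]


if __name__ == "__main__":
    sample = "Intro text | (a) first point | (b) second point"  # invented sample
    print(as_inline(sample))    # a single segment that still contains the markers as tags
    print(as_excluded(sample))  # one segment per point
```

The first call always returns a single segment, which matches the symptom of points a, b and c staying together; the second returns one piece per point.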

    Paul Filkin | RWS Group

  • Hi,

    Not setting the rules to “Exclude” was actually the problem. Missing one bit of knowledge can lead to getting everything wrong. :-)

    Thanks a lot for your patience and great help. You really saved my day(s)!

    Best regards,

    Chengle
