Soft break segmentation rule almost working…

Hi all!

I set up a soft break segmentation rule in the main (and only) TM associated with a project, following the instructions found here (community.sdl.com/.../that-manual-line-break-soft-return-segmentation-rule) and here (noradiaz.blogspot.com/.../adding-soft-return-segmentation-rule-to.html). (See pictures 1 and 2.)

Trados Studio Translation Memory Settings window showing segmentation rules with 'Other terminating punctuation' selected.

Trados Studio Edit Segmentation Rule window for 'Soft break' with a regular expression entered in the 'Before break' field.

It catches about 98% of my soft breaks, not the 100% you’d expect. (See picture 3.) What am I missing here?

Trados Studio segment view with segment numbers 366, 367, and 369, showing a soft break not caught by the rule in segment 367.
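
For readers who have not seen this kind of rule before, here is a minimal Python sketch of how a "before break" / "after break" pair of patterns decides where a split happens. It is only an illustration, not Studio's implementation: the two patterns and the function name `split_on_soft_breaks` are hypothetical, they are not the expressions shown in the pictures, and the sketch assumes a soft break is a single line feed. It shows one way a rule can catch most breaks but not all: a break is skipped whenever the text after it does not match the "after break" pattern.

```python
import re

# Assumption: a soft break is a single line feed inside a cell.
BEFORE_BREAK = r"\n"  # hypothetical: text that must precede the split point
AFTER_BREAK = r"\S"   # hypothetical: text that must follow the split point


def split_on_soft_breaks(text):
    """Split text wherever BEFORE_BREAK is followed by AFTER_BREAK."""
    # Zero-width pattern: the split point sits between the two matches.
    boundary = re.compile(f"(?<={BEFORE_BREAK})(?={AFTER_BREAK})")
    parts = boundary.split(text)
    # Trim surrounding whitespace and drop empty pieces.
    return [part.strip() for part in parts if part.strip()]


sample = "One.\nTwo.\n Three."
segments = split_on_soft_breaks(sample)
print(segments)
```

In this sample the first line feed is followed by a letter, so it produces a split. The second is followed by a space, which the hypothetical "after break" pattern rejects, so "Two." and "Three." stay in one segment with the soft break still inside.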



  • Nifty little setting I wasn’t aware of! Thanks!

    It does take care of the specific segments I provided, and my source text is now completely free of any soft breaks. However, the original file also has a bunch of sentences that have formatting tags, so they now end up broken into multiple segments (segments 12 to 16 in picture below; new test file also uploaded), whereas I would like those to stay together.

    Screenshot of Trados Studio showing lines 5 to 16 with text segments separated by soft breaks. 

    In the best possible scenario, I would want to get what’s below right away without having to resort to merging segments.

    Screenshot of Trados Studio with formatting tags visible in lines 12 to 15, such as bold and italic tags.

    By using the “May Exclude” setting instead of “Exclude” under “Segmentation hint” and keeping my soft break segmentation rule in the TM, I get what’s below. In this particular case (a file containing 28,000 words in 2,900 segments), this appears to require the least manual handling: altogether, I’m down to 11 segments with soft breaks instead of my initial 87.

    Screenshot of Trados Studio displaying broken segments due to formatting tags in lines 5, 6, 10, and 11.

    Would there be a way to get what’s in the second picture in one go? I was thinking there might be a way to force Studio to exclude only certain tags (e.g. <li>, <ul>, <p>) while always including others (e.g. <b>, <i>). Do you think there is potential there?
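
To make the idea concrete, here is a small Python sketch of the behaviour being asked for: split at structural tags but leave inline formatting tags inside the segment. It is a conceptual illustration only. It does not show how Studio's file type settings work, and it makes no claim that Studio exposes such an option. The function name `split_at_structural_tags` and the sample markup are hypothetical; the tag list comes from the examples in the post.

```python
import re

# Tags treated as segment boundaries (taken from the examples in the post).
STRUCTURAL_TAGS = ("p", "li", "ul")


def split_at_structural_tags(markup):
    """Split markup at structural tags; inline tags such as <b> stay in place."""
    alternation = "|".join(STRUCTURAL_TAGS)
    # Matches opening or closing structural tags, with optional attributes.
    boundary = re.compile(rf"</?(?:{alternation})\b[^>]*>", re.IGNORECASE)
    pieces = boundary.split(markup)
    # Drop the empty pieces left between adjacent structural tags.
    return [piece.strip() for piece in pieces if piece.strip()]


sample = "<ul><li>Press <b>Start</b> now.</li><li>Wait <i>five</i> seconds.</li></ul>"
for segment in split_at_structural_tags(sample):
    print(segment)
```

Each list item comes out as one segment, with the bold and italic tags still inside it, which corresponds to the result in the second picture.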

    TestFile2.xlsx


