How to avoid segmentation problems after note superscript numbers

Some fellow translators are having segmentation problems with note superscript numbers at the end of a sentence. They end up with several sentences in the same segment that should have been split after the superscript number. Some translators split all those segments manually in the Editor, while others modify the original document before creating their project in Studio, moving the punctuation after the note superscript number to avoid this segmentation problem. Both methods are very time-consuming. How can the segmentation rules be modified to deal with this problem? Is there any other way to solve it? Thanks in advance.

Example:

During the first 10 months after the pandemic was officially declared, alcohol and cannabis use5-6 [no problem here] and depression7 [no problem here] increased, and self-rated positive mental health, life satisfaction and community belonging8 [no problem here] declined, with no changes in suicidal ideation noted.9 [should have been segmented] However, Canadians were not equally impacted. As Varin et al.5 [no problem here; should not be segmented] maintain, “understanding the social determinants of health is key to developing harm reduction and mitigation strategies.”

Screenshot of Trados Studio showing incorrect segmentation after the superscript number 9, with a note 'Should be segmented'.

Screenshot of Trados Studio displaying incorrect segmentation after the superscript number 15, with a note 'Should be segmented'.



[edited by: Trados AI at 3:48 AM (GMT 0) on 29 Feb 2024]
  • IMHO, manually splitting segments is very bad practice. It's time-consuming, and it comes back to bite you if you have to update the file or work with a very similar file. I would avoid it at all costs.

    Studio is pretty user-friendly when it comes to editing segmentation rules, but unfortunately it does not have a preview function (hey, RWS, that would be cool!). There is a free tool called Ratel that does:

    Screenshot of Trados Studio showing an error in segmentation rules with red underlined text indicating incorrect regular expression syntax.

    So you define an exception to the rule(s): if a full stop is followed by a number (\d+) or by two numbers separated by a dash (\d+-\d+), break.

    In Studio, segmentation rules are stored in TMs, and the first TM in your project is used for segmentation. So if you open the settings of that TM, you can customize the segmentation rules:

    Screenshot of Trados Studio's Segmentation Rules window with options to add, edit, or remove rules for sentence-based segmentation.

    Add another rule and select "advanced view"; you then get the default "new rule":

    Screenshot of Trados Studio's 'Add Segmentation Rule' window with default rule settings in the description and after break fields.

    Now add your footnote pattern to the "before break" field:

    Screenshot of Trados Studio's 'Add Segmentation Rule' window with a red circle highlighting the regular expression for numbers separated by a dash in the 'before break' field.

    I don't have the time to test this, but I am pretty sure this should work.

    Daniel

  • Thank you very much, Daniel. I am going to test your segmentation rule. Unfortunately, we work in an environment that does not allow us to install any tools, and most of our translators would not know how to create or edit the default segmentation rules. Thanks again.
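
Since Studio has no preview function and installing Ratel may not be an option, a before-break/after-break rule pair can be roughly checked with a few lines of Python. This is a simulation under stated assumptions, not Studio's segmentation engine: it assumes a break is inserted wherever the "before break" pattern matches text ending at a position and the "after break" pattern matches text starting there. `FULL_STOP` is a hypothetical, simplified stand-in for a sentence-ending rule (not Studio's default rule), `FOOTNOTE_ANY` follows Daniel's description, and `FOOTNOTE_CAP` is a hypothetical variant that only breaks when the next word starts with a capital, aimed at keeping "et al.5 maintain" together.

```python
import re

# Each rule is a (before_break, after_break) pair of regular expressions.
# Hypothetical, simplified sentence-end rule (NOT Studio's actual default).
FULL_STOP = (r"\.", r"\s")
# Daniel's rule: full stop + note number (\d+ or \d+-\d+), then whitespace.
FOOTNOTE_ANY = (r"\.\d+(?:-\d+)?", r"\s")
# Hypothetical variant: break only if the next word starts with a capital.
FOOTNOTE_CAP = (r"\.\d+(?:-\d+)?", r"\s+[A-Z]")


def segment(text, rules):
    """Split text at every position where some rule matches.

    A rule matches at position pos when its before-break pattern matches
    a span ending exactly at pos and its after-break pattern matches a
    span starting at pos.
    """
    compiled = [(re.compile(f"(?:{before})$"), re.compile(after))
                for before, after in rules]
    segments, start = [], 0
    for pos in range(1, len(text)):
        # search(text, 0, pos) treats text as if it ended at pos,
        # so "$" anchors the before-break pattern at the break position.
        if any(b.search(text, 0, pos) and a.match(text, pos)
               for b, a in compiled):
            segments.append(text[start:pos].strip())
            start = pos
    segments.append(text[start:].strip())
    return [s for s in segments if s]


TEXT = ("Life satisfaction and community belonging8 declined, with no "
        "changes in suicidal ideation noted.9 However, Canadians were not "
        "equally impacted. As Varin et al.5 maintain, “understanding the "
        "social determinants of health is key to developing harm reduction "
        "and mitigation strategies.”")

if __name__ == "__main__":
    variants = {
        "full stop only": [FULL_STOP],
        "full stop + footnote (any next word)": [FULL_STOP, FOOTNOTE_ANY],
        "full stop + footnote (capital next)": [FULL_STOP, FOOTNOTE_CAP],
    }
    for name, rules in variants.items():
        print(f"--- {name}")
        for seg in segment(TEXT, rules):
            print(seg)
```

With only the simplified full-stop rule, "noted.9 However" stays in one segment, which reproduces the problem. Adding Daniel's rule with `\s` as the after-break pattern fixes that but also splits after "et al.5"; requiring a capital letter after the space keeps "et al.5 maintain" together in this example, although it would still split if such a note number were followed by a capitalized name. In Studio's .NET regex flavour, `\p{Lu}` could be tried instead of `[A-Z]` to cover accented capitals (untested).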
