Segmentation after index (superscript) numbers

Hello,

I have a long text in which many sentences are followed by an index number. The document was converted, but the superscript numbers could not be converted into real automatic index numbers.

Is there a way to create a segmentation rule for the pattern "dot followed by a number in superscript"?

  • Hello,

    To create a segmentation rule for the pattern "dot followed by a number in superscript," you can follow these steps:

    Step 1: Open Trados Studio and go to your project settings.

    Step 2: Navigate to the "Language Pairs" section and select the specific language pair you are working with.

    Step 3: Click on "Segmentation" in the settings list.

    Step 4: Click on the "Edit Rules" button to open the segmentation rules editor.

    Step 5: Click on the "Add" button to create a new rule.

    Step 6: In the "Name" field, give your rule a descriptive name, such as "Superscript Numbers."

    Step 7: In the "Before break" field, enter the following regular expression: `\.\d`

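    This expression matches a literal dot followed by a single digit. If your index numbers can have more than one digit, use `\.\d+` instead. Be aware that either pattern also matches decimal numbers such as 3.14.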

    Step 8: In the "After break" field, you can leave it empty or add a specific pattern if needed.

    Step 9: Make sure the "Regular expressions" checkbox is enabled.

    Step 10: Click "OK" to save your new rule.

    Step 11: Click "OK" again to close the segmentation rules editor and apply the changes to your project.

    Trados Studio should now segment your text after each dot that is followed by a digit. Keep in mind that a regular expression matches characters, not formatting, so this rule cannot tell a superscript digit from an ordinary one. You might need to adjust the expression to better suit your specific text.
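Before changing the project, it can help to try the pattern on a sample sentence outside Trados Studio. The following is only a rough Python sketch of the idea, not Trados behaviour: `BEFORE_BREAK`, `AFTER_BREAK` and `split_after_index_numbers` are hypothetical names, and the sketch assumes the index numbers arrive either as plain digits or as Unicode superscript characters such as `²`. The negative lookbehind is an assumed refinement that skips decimals like 3.14, at the cost of missing an index number that follows a sentence ending in a digit.

```python
import re

# Hypothetical stand-ins for the "Before break" and "After break" fields.
# Before break: a dot NOT preceded by a digit (so "3.14" is skipped),
# followed by one or more ASCII digits or Unicode superscript digits.
BEFORE_BREAK = r"(?<![0-9])\.[0-9\u00B9\u00B2\u00B3\u2070\u2074-\u2079]+"
# After break: whitespace separating the index number from the next sentence.
AFTER_BREAK = r"\s+"

RULE = re.compile(f"({BEFORE_BREAK})({AFTER_BREAK})")


def split_after_index_numbers(text):
    """Split text after each 'dot + index number' that is followed by whitespace."""
    segments = []
    start = 0
    for match in RULE.finditer(text):
        # The segment ends right after the index number.
        segments.append(text[start:match.end(1)])
        # The next segment starts after the separating whitespace.
        start = match.end(2)
    if start < len(text):
        segments.append(text[start:])
    return segments


sample = "Water boils at 100 degrees.1 Pi is about 3.14 in most cases.\u00B2 The end."
for segment in split_after_index_numbers(sample):
    print(segment)
```

Note that `\d` on its own does not match Unicode superscript characters such as `²` in Python, which is why the sketch lists them explicitly. Whether the same holds in your Trados Studio version, and whether the converted file exposes the superscripts as characters or as formatting tags, should be checked on a test file.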

    I hope this helps! If you have any further questions, please don't hesitate to ask.

    Best regards,

    RWS Community AI
