Segmentation failure after colon

Hi,

I use the default segmentation rule for colons.

Segmentation after a colon usually works fine, but in some cases the text is not segmented after the colon. E.g.:

Screenshot of Trados Studio showing text not segmented after a colon, with a warning symbol next to the unsegmented text.

Is there any solution for this problem?

Sandor



[edited by: Trados AI at 6:25 PM (GMT 0) on 28 Feb 2024]
  • Nice spot!
    With the non-printable characters displayed and the zoom at maximum, you can see that the spaces are not the same: one placeholder is square and one is round. Also, when you select each space, the highlight shows that they have different widths:

    Screenshot showing Trados Studio text comparison with a red arrow pointing to a space discrepancy between the words 'rules' and a profanity, and 'rules' and 'good'.

    Still, the internal XML contains just normal spaces (\u0020), so it must be the font representation of the space in the OpenXML, or something similar, that makes it a different space from the one that the \s regexp metacharacter in the segmentation rule treats as a "whitespace character".

    So, after all, I was right that the space is not just an ordinary space, and that is why the segmentation rule fails.

    Perhaps it's time for SDL to review and update the default segmentation rules definitions...
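
To check which character actually follows a colon in a piece of extracted text, a small diagnostic along these lines may help. It is only a sketch in Python: the pattern `:\s` is a hypothetical, simplified stand-in for a "break after colon" rule, not the actual Trados Studio rule definition, and Python's regex engine is not the one Studio uses, so its idea of "whitespace" may differ. The function name `describe_after_colon` and the sample strings are made up for illustration.

```python
import re
import unicodedata

# Hypothetical, simplified stand-in for a "break after colon" rule.
# This is NOT the real Trados Studio segmentation rule.
COLON_BREAK = re.compile(r":\s")


def describe_after_colon(text):
    """For each colon, report the character that directly follows it.

    Each entry is (code point, Unicode name, whether the simplified
    pattern matches at that colon).
    """
    report = []
    for i, ch in enumerate(text):
        if ch == ":" and i + 1 < len(text):
            nxt = text[i + 1]
            report.append((
                f"U+{ord(nxt):04X}",
                unicodedata.name(nxt, "UNKNOWN"),
                COLON_BREAK.match(text, i) is not None,
            ))
    return report


# Illustrative samples: an ordinary space, a no-break space,
# and a zero-width space after the colon.
samples = ["rules: good", "rules:\u00a0good", "rules:\u200bgood"]
for sample in samples:
    print(describe_after_colon(sample))
```

If every colon is followed by U+0020 and the pattern still matches here, the text content itself is probably not the cause, which would be consistent with the formatting-based explanation above.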

    [edited by: Trados AI at 6:26 PM (GMT 0) on 28 Feb 2024]