Segmentation failure after colon

Hi,

I use the default segmentation rule for colons.

Segmentation after a colon usually works fine, but in some cases the text is not segmented after the colon. E.g.:

Screenshot of Trados Studio showing text not segmented after a colon, with a warning symbol next to the unsegmented text.

Is there any solution for this problem?

Sandor



Generated Image Alt-Text
[edited by: Trados AI at 6:25 PM (GMT 0) on 28 Feb 2024]
  • Former Member, in reply to Former Member

    So, the workaround should be pretty straightforward and easy.

    Make them the same font, whatever it is (I select "colon and a space").

    Do it for all of them at the same time.

    If you want a root-cause fix, sorry, that is not my job here.

    Screenshot of Trados Studio settings window with a circled section highlighting the font selection dropdown set to Arial Unicode MS.

    Screenshot of Trados Studio interface showing a table with multiple rows, each containing the text 'Make commas and colon the same font' without any visible errors.

    Screenshot of Trados Studio interface with a warning message at the bottom stating 'Some content did not fit in the allowed space and was cut off.'

  • Nice spot!
    Displaying the non-printable characters and zooming in to the max shows that the spaces are not the same: one placeholder is square and one is round. Plus, when you select each space, the highlight shows that they have different widths:

    Screenshot showing Trados Studio text comparison with a red arrow pointing to a space discrepancy between the words 'rules' and a profanity, and 'rules' and 'good'.

    Still, the internal XML contains just normal spaces (\u0020), so it must be the font representation of the space in the OpenXML, or something similar, that makes it a different space from the one the \s regexp metacharacter in the segmentation rule understands as a "whitespace character".
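For anyone who wants to double-check what actually follows each colon, here is a minimal sketch. It assumes you can get the affected text out as a plain string (for example, copied from the file's internal XML); the function name and the sample string are made up for illustration. It uses Python, not the regex engine Studio runs, so it only identifies the characters and says nothing about how the segmentation rule itself behaves.

```python
import unicodedata

def chars_after_colons(text):
    """Return (code point, Unicode name) for each character directly following a colon."""
    found = []
    # Skip the last character: nothing can follow it.
    for i, ch in enumerate(text[:-1]):
        if ch == ":":
            nxt = text[i + 1]
            found.append((f"U+{ord(nxt):04X}", unicodedata.name(nxt, "UNNAMED")))
    return found

# Hypothetical sample: one ordinary space and one no-break space after a colon.
sample = "rules: good\nrules:\u00a0bad"
for code, name in chars_after_colons(sample):
    print(code, name)
```

If every hit comes back as U+0020 SPACE, the characters themselves are identical and the difference has to come from the formatting around them, which would match what the internal XML shows here.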

    So, after all, I was actually right that the space is not just an ordinary space, and that's the reason why the segmentation rule fails.

    Perhaps it's time for SDL to review and update the default segmentation rule definitions...
