Do I need to create a new segmentation rule or an exception to an existing rule? Plus another regex problem!

Hello,

I encountered another segmentation problem along with another regex issue when aligning a rather long document.

Here's a capture of my segmentation problem:

Screenshot of Trados Studio showing a segmentation issue with target segments 36 and 37 not aligning with source segment 34.

Ideally I would like target segments 36 and 37 to be a single segment and correspond to source segment 34.

To my understanding I need to create an exception to whatever segmentation break rule Studio is applying here.

What is not clear to me is: What segmentation break rule is Studio exactly using here?

Here's a capture of the original target text:

Close-up of text in Trados Studio highlighting the segmentation break at the end of a sentence with a closing quotation mark followed by an opening bracket.

Is Studio doing a "closing-quotation-mark segmentation break" or an "opening-bracket segmentation break"?

Or am I asking myself the wrong question?

When I open the dialog box for segmentation rule, I see rules for the following break characters:

- period,

- question mark,

- exclamation mark,

- colon,

- semi-colon, and

- tab

See the following screen capture: Trados Studio Edit Segmentation Rule dialog box showing a list of break characters including period, question mark, exclamation mark, colon, semi-colon, and tab.

It seems to me that the segmentation break is performed either after a closing quotation mark, before or after a regular space, or before an opening bracket.

None of those 3 characters are part of the list of available character breaks.

Essentially, I transcribed the string surrounding the segmentation break, "players." (LSSS", as follows:  \w+\."\s\(w+

I fiddled with that regex in multiple ways in the "Before break" field and in the "After break" field... all with resounding failure!
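
As a quick check outside Studio, the transcribed pattern can be tested against the sample string in Python. This is only a sketch: Python's `re` module stands in for Studio's regex engine (an assumption), and the sample is just the fragment quoted above. One thing it brings out is that `\(w+` means an opening bracket followed by literal `w` characters, so the pattern as written cannot match `(LSSS`; the variant with `\(\w+` is an assumed intent, not something confirmed in the post.

```python
import re

# Fragment around the unwanted break, as quoted in the post.
sample = 'players." (LSSS'

# Pattern as transcribed: "\(w+" means "(" followed by literal "w" characters.
as_written = re.compile(r'\w+\."\s\(w+')

# Assumed intent (hypothetical): "\(\w+" means "(" followed by word characters.
assumed_intent = re.compile(r'\w+\."\s\(\w+')


def matches(pattern, text):
    """Return True if the pattern matches anywhere in the text."""
    return pattern.search(text) is not None


print(matches(as_written, sample))      # False
print(matches(assumed_intent, sample))  # True
```

Even the matching variant describes the whole string on both sides of the break, whereas the dialog asks for separate "Before break" and "After break" expressions, so it would presumably need to be split at the point where the break occurs.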

I have dozens of those breaks in that long text... and I have more than 10 of those texts with similar breaks to align!

Can somebody clarify for me how Studio "is thinking" when it performs the segmentation break in the case presented, and also provide a solution to my problem?

Thanks in advance.



[edited by: Trados AI at 3:42 PM (GMT 0) on 28 Feb 2024]
  • Hello  

    Perhaps you are just looking at this the wrong way.  You have two sentences, something like this:

    FR
    À cette fin, un peu plus de texte ici, avec les autres intervenants du milieu » (LSSSS, art. 100).

    EN
    To that end, some more text here, with other key players." (LSSSS, sec. 100).

    The problem you have is that the FR source doesn't have a period before the closing quote, which is why it does not segment. The EN target, however, does. So if you add an exception to the full stop rule in the EN language resources of your TM, you can prevent it from segmenting. Something like this perhaps:

    Trados Studio Edit Segmentation Rule dialog box showing an exception added for full stop rule when followed by a quote.

    Then the alignment looks like this:

    Trados Studio alignment view showing French to English translation with mismatched segmentation due to punctuation difference.

    Which is what you were trying to achieve... I believe.
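
To illustrate the logic, here is a rough Python sketch of how a break rule plus an exception could behave. It is a toy model, not Studio's actual segmentation engine: the `segment` function and all four patterns are assumptions made up for this example, and the exception is only one plausible reading of "full stop followed by a quote". The after-break pattern is deliberately simplified so that abbreviations such as `sec. 100` do not split.

```python
import re

# Hypothetical stand-in for the EN full-stop rule: one or more periods,
# optionally followed by closing quotes or brackets...
BREAK_BEFORE = r'\.+["”’»)\]]*'
# ...then whitespace and an uppercase letter, opening bracket, or opening quote.
BREAK_AFTER = r'\s+[(A-Z"“]'

# Hypothetical exception: no break when a period plus closing quote
# is followed by whitespace and an opening bracket.
EXC_BEFORE = r'\.["”]'
EXC_AFTER = r'\s\('


def segment(text, exception=None):
    """Split text at break-rule positions unless the exception applies there."""
    rule = re.compile(f'(?:{BREAK_BEFORE})(?={BREAK_AFTER})')
    segments, start = [], 0
    for m in rule.finditer(text):
        pos = m.end()  # candidate break position
        if exception is not None:
            before, after = exception
            hit_before = re.search(f'(?:{before})$', text[:pos])
            hit_after = re.match(after, text[pos:])
            if hit_before and hit_after:
                continue  # the exception suppresses this break
        segments.append(text[start:pos].strip())
        start = pos
    tail = text[start:].strip()
    if tail:
        segments.append(tail)
    return segments


fr = "À cette fin, un peu plus de texte ici, avec les autres intervenants du milieu » (LSSSS, art. 100)."
en = 'To that end, some more text here, with other key players." (LSSSS, sec. 100).'

print(len(segment(fr)))                           # 1
print(len(segment(en)))                           # 2
print(len(segment(en, (EXC_BEFORE, EXC_AFTER))))  # 1
```

The FR sentence stays whole because no period precedes the closing quote, the EN sentence splits after `."`, and with the exception in place it stays whole too.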

    Paul Filkin | RWS Group

  • Hello Paul,

    Thanks once again for your reply... but it doesn't work for me.
    I scratched my head a couple of times, because I'm just so sure that when you give an answer, it just is the right answer.

    So, I just started poking around and I found the problem... which I can't solve.

    Here it is:

    I created the exception to the segmentation rule in L2, exactly as you instructed, opened the two files for alignment, and Studio still kept breaking the segment like before.

    I went back to the "Edit rule Exception" dialog box in BASIC view and it looked fine.

    Screenshot of Trados Studio's 'Edit Rule Exception' dialog box in BASIC view showing an incorrect regular expression 'w+..+ ' in the 'Before break' field.

    I then clicked on the ADVANCED view button, and this is where there is a problem:

    Screenshot of Trados Studio's 'Edit Rule Exception' dialog box in ADVANCED view with an erroneous regular expression 'w+..+ ' auto-generated by the software.

    For some reason, Studio created a different regex on its own.

    I corrected it and entered the right expression and clicked OK a couple of times to return to the alignment view.

    I then ran the alignment again and, sure enough, it was still incorrect.

    I went back to the "Edit rule Exception" dialog box in BASIC view and it looked fine again. I toggled again to the ADVANCED view and there was again the same erroneous regular expression that Studio created on its own:  " \w+\.\.+ "

    I did this four or five times, and it was the same problem again and again.
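
For what it's worth, the auto-generated expression can be checked outside Studio. The sketch below uses Python's `re` module as a stand-in for Studio's regex engine (an assumption, though the two should agree on a pattern this simple), and the sample strings are made up from the EN sentence. `\w+\.\.+` requires a word, a period, and then at least one more period, so it cannot match a period followed by a closing quotation mark.

```python
import re

# Expression that Studio showed in the ADVANCED view, per the post.
generated = re.compile(r'\w+\.\.+')

# Text just before the unwanted break, and a contrasting ellipsis case.
before_break = 'with other key players."'
ellipsis = 'with other key players...'


def found(pattern, text):
    """Return True if the pattern matches anywhere in the text."""
    return pattern.search(text) is not None


print(found(generated, before_break))  # False
print(found(generated, ellipsis))      # True
```

So as long as the stored expression is the generated one, the exception would not fire at this break.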

    I tried different scenarios where I would have the incorrectly aligned files open/active, then unopened.

    The same for the TM concerned, I tried having it opened, then unopened during the procedure.

    My last troubleshooting idea was to go back to the "Edit rule Exception" dialog box in BASIC view, enter XXXXX in the "Before break" field and check the "regular expression" box, and then do the same for the "After break" field.

    I then toggled to the advanced view and here's what Studio did:

    Screenshot of Trados Studio's 'Edit Rule Exception' dialog box in BASIC view where the user has entered 'XXXXX' in both 'Before break' and 'After break' fields with 'regular expression' checked.

    So, this is as far as I can go on my own. I just can't solve it.

    As usual, Paul, any help would be much appreciated,

    Christian
