Do I need to create a new segmentation rule or an exception to an existing rule? Plus another regex problem!

Hello,

I encountered another segmentation problem along with another regex issue when aligning a rather long document.

Here's a capture of my segmentation problem:

Screenshot of Trados Studio showing a segmentation issue with target segments 36 and 37 not aligning with source segment 34.

Ideally I would like target segments 36 and 37 to be a single segment and correspond to source segment 34.

To my understanding I need to create an exception to whatever segmentation break rule Studio is applying here.

What is not clear to me is: What segmentation break rule is Studio exactly using here?

Here's a capture of the original target text:

Close-up of text in Trados Studio highlighting the segmentation break at the end of a sentence with a closing quotation mark followed by an opening bracket.

Is Studio doing a "closing-quotation-mark segmentation break" or an "opening-bracket segmentation break"?

Or am I asking myself the wrong question?

When I open the dialog box for segmentation rule, I see rules for the following break characters:

- period,

- question mark,

- exclamation mark,

- colon,

- semi-colon, and

- tab

See the following screen capture: Trados Studio Edit Segmentation Rule dialog box showing a list of break characters including period, question mark, exclamation mark, colon, semi-colon, and tab.

It seems to me that the segmentation break is performed either after a closing quotation mark, before or after a regular space, or before an opening bracket.

None of those three characters is in the list of available break characters.

Essentially, I transcribed the string surrounding the segmentation break "players." (LSSS" as follows:  \w+\."\s\(w+

and fiddled with that regex in multiple ways in the "Before break" field and in the "After break" field... all with resounding failure!

I have dozens of those breaks in that long text... and I have more than 10 of those texts with similar breaks to align!

Can somebody clarify for me how Studio "is thinking" to perform the segmentation break here in the case presented...

and also provide a solution to my problem.

Thanks in advance.



  • Hello  

    Perhaps you are just looking at this the wrong way.  You have two sentences, something like this:

    FR
    À cette fin, un peu plus de texte ici, avec les autres intervenants du milieu » (LSSSS, art. 100).

    EN
    To that end, some more text here, with other key players." (LSSSS, sec. 100).

    The problem you have is that the FR source doesn't have a period before the closing quote, so it does not segment there.  The EN target does have one, so it segments.  If you add an exception to the full stop rule in the EN language resources of your TM, you can prevent it from segmenting.  Something like this perhaps:

    Trados Studio Edit Segmentation Rule dialog box showing an exception added for full stop rule when followed by a quote.

    Then the alignment looks like this:

    Trados Studio alignment view showing French to English translation with mismatched segmentation due to punctuation difference.

    Which is what you were trying to achieve... I believe.
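
For readers who want to see the mechanics, here is a minimal Python sketch of how a break rule and an exception can interact. It is only a model: Studio has its own segmentation engine and uses .NET regular expressions, and the patterns `BREAK_BEFORE`, `BREAK_AFTER`, `EXCEPTION_BEFORE` and the helper `split_segments` are hypothetical names chosen for illustration, not Studio's actual defaults. The sample is a shortened version of the EN sentence above.

```python
import re

# Hypothetical, simplified full stop rule (NOT Studio's real default):
# break after a period, optionally followed by a closing quote,
# when the next character is whitespace.
BREAK_BEFORE = re.compile(r'\.["\u201d]?$')
BREAK_AFTER = re.compile(r'\s')

# Exception in the spirit of the one suggested above: a word, a period
# and a closing quote directly before the break position.
EXCEPTION_BEFORE = re.compile(r'\w+\.["\u201d]$')


def split_segments(text, use_exception):
    """Split text at every position where the break rule fires,
    unless the exception also matches at that position."""
    segments, start = [], 0
    for pos in range(1, len(text)):
        before = text[start:pos]
        if not BREAK_BEFORE.search(before):
            continue  # nothing break-worthy ends here
        if not BREAK_AFTER.match(text, pos):
            continue  # the text after this position does not allow a break
        if use_exception and EXCEPTION_BEFORE.search(before):
            continue  # the exception suppresses this break
        segments.append(before.strip())
        start = pos
    tail = text[start:].strip()
    if tail:
        segments.append(tail)
    return segments


# Shortened sample based on the EN sentence in this thread.
EN = 'To that end, some more text here, with other key players." (LSSSS)'

print(split_segments(EN, use_exception=False))
print(split_segments(EN, use_exception=True))
```

Without the exception the sketch yields two segments, split after the closing quote; with it, the sentence stays in one piece, which is the behaviour the alignment needs.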

    Paul Filkin | RWS Group

    ________________________
    Design your own training!

    You've done the courses and still need to go a little further, or still not clear? 
    Tell us what you need in our Community Solutions Hub

  • Hello Paul,

    Thanks once again for your reply... but it doesn't work for me.
    I scratched my head a couple of times, because I'm just so sure that when you give an answer it just is the right answer.

    So, I just started poking around and I found the problem... which I can't solve.

    Here it is:

    I created the exception to the segmentation rule in L2, exactly as you instructed, opened the two files for alignment, and Studio still kept breaking the segment like before.

    I went back to the "Edit rule Exception" dialog box in BASIC view and it looked fine.

    Screenshot of Trados Studio's 'Edit Rule Exception' dialog box in BASIC view showing an incorrect regular expression 'w+..+ ' in the 'Before break' field.

    I then clicked on the ADVANCED view button, and this is where there is a problem:

    Screenshot of Trados Studio's 'Edit Rule Exception' dialog box in ADVANCED view with an erroneous regular expression 'w+..+ ' auto-generated by the software.

    For some reason, Studio created a different regex on its own.

    I corrected it and entered the right expression and clicked OK a couple of times to return to the alignment view.

    I then proceeded to do the alignment again and sure enough, it was still incorrect.

    I went back to the "Edit rule Exception" dialog box in BASIC view and it looked fine again. I toggled again to the ADVANCED view and there was again the same erroneous regular expression that Studio created on its own:  " \w+\.\.+ "

    I did this four or five times, and it was the same problem again and again.

    I tried different scenarios where I would have the incorrectly aligned files open/active, then unopened.

    The same for the TM concerned, I tried having it opened, then unopened during the procedure.

    My last troubleshooting idea was to go back to the "Edit rule Exception" dialog box in BASIC view, enter XXXXX in the "Before break" field and check the "regular expression" box, then do the same for the "After break" field.

    I then toggled to the advanced view and here's what Studio did:

    Screenshot of Trados Studio's 'Edit Rule Exception' dialog box in BASIC view where the user has entered 'XXXXX' in both 'Before break' and 'After break' fields with 'regular expression' checked.

    So, this is as far as I can go on my own. I just can't solve it.

    As usual, Paul, any help would be much appreciated,

    Christian

  • Three things to try, I guess:

    1. Don't waste time on the basic view. It's always better to use the advanced view and just use the rules I showed.

    2. Make sure you are applying this to the right language.

    3. Perhaps your text isn't the same as the one I made up.  It helps to have the real text and not just a screenshot as I may be missing something.

    Studio has two languages in the TM and each language has its own segmentation rules.  Normally we only play with the source language, since this is the file that gets opened.  Alignment is different, since we open both a source and a target language file.

    Maybe test the attached TM as an example:

    fr-en.zip

    Paul Filkin | RWS Group




  • ok - on testing a little more I think I see your problem. In fact it's not really your problem, it's a Studio bug. If you reopen the rule to check that the expression you created is correct, Studio screws it up again! The TM I gave you will also look wrong if you check it!

    So do this:

    Go to the advanced option for your rule in the EN language and edit it. It will probably look like this now:

    \w+\.\.+

    Edit it to be this:

    \w+\."

    Click OK the four times you need to and then align the files. This time it should work. But if you open the rule again Studio will break it.
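
To see why the rewritten expression stops the exception from working, the two patterns can be compared outside Studio. This is a rough Python sketch, not Studio's own (.NET-based) matching; the trailing `$`, which stands for "the match must end at the break position", and the sample strings are assumptions added for illustration.

```python
import re

# Text immediately before the unwanted break in the EN file (straight quote).
before_break = 'with other key players."'

# What Studio shows after the rule is reopened: word, period, then more periods.
corrupted = re.compile(r'\w+\.\.+$')
# The exception as typed in the advanced dialog: word, period, closing quote.
intended = re.compile(r'\w+\."$')

print(bool(corrupted.search(before_break)))    # False: there is no second period
print(bool(intended.search(before_break)))     # True: this is the case to suppress
print(bool(corrupted.search('and so on...')))  # True: it only catches runs of periods
```

The rewritten pattern can only match text ending in two or more periods, so it never matches a period followed by a quote and the break is not suppressed.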

    I'll log this with the support team so it gets reported to development as I'm not sure this has been logged already.

    Paul Filkin | RWS Group


  • Hello Paul,

    Sorry for the delay. Had to take Maple, our Irish Setter, for his first morning walk.

    OK, back to business:

    I tried what you suggested, but it doesn't work. The same problem persists.

    Here's a screen capture to show that I have followed your instructions and entered the regex you provided:

    Trados Studio screenshot showing regex entered in the 'Search for' field with a warning message indicating an unrecognized escape sequence.

    Then I went to Advanced view:

    Trados Studio Advanced view screenshot with a warning message about an unrecognized escape sequence in the regex pattern.

    Again, Studio modified the regex on its own.

    I also tried a second time and, out of curiosity, I toggled back to the Basic view without having clicked OK, and Studio also modified the regex in the Basic view:

    Trados Studio Basic view screenshot where the regex has been modified automatically, displaying a warning about a possibly invalid regex.

    I have also attached the L1 (French) and the L2 (English) files for you to experiment with, if you want to.

    L1-L2 used for alignment_Christian Blouin.zip

    Thanks for your help.

    Christian



  • You are not following my instructions... probably I wasn't clear enough. Try this with a clarification to it:

    Go to the advanced option for your rule in the EN language and edit it. It will probably look like this now:

    \w+\.\.+

    Edit it to be this IN THE ADVANCED RULE DIALOGUE:

    \w+\."

    Click OK the four times you need to and then align the files. This time it should work, BUT DO NOT OPEN THE RULES AGAIN BEFORE USING THE TM. If you open the rule again Studio will break it.

    If it doesn't work this time I'll do a video as I just got back to my office!

    Paul Filkin | RWS Group


  • ok - I just played with your file and I see a reason why this isn't working for you.  Your file uses a curly quote and not a straight one.  So you need to make sure that the segmentation rule uses a curly quote too.  I'd suggest you copy/paste it from your file.  I just did a quick test with an extract from your file and now get this so it definitely works:

    Screenshot of Trados Studio showing a comparison of text segments with a focus on the use of curly quotes versus straight quotes.
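
The straight-versus-curly difference is easy to miss on screen, so here is a small Python sketch of it. It assumes the closing quote in the file is U+201D (RIGHT DOUBLE QUOTATION MARK); the thread does not say which curly quote the file actually contains. `describe_quotes` is a hypothetical helper, and the character class in `rule_either` is an illustrative alternative to copy/pasting the quote, not something tested in Studio.

```python
import re
import unicodedata

straight = 'other key players." (LSSSS'
curly = 'other key players.\u201d (LSSSS'  # assumed: U+201D closing quote

rule_straight = re.compile(r'\w+\."')          # only matches the straight quote
rule_either = re.compile(r'\w+\.["\u201d]')    # accepts either kind of quote

print(bool(rule_straight.search(straight)))  # True
print(bool(rule_straight.search(curly)))     # False: U+0022 is not U+201D
print(bool(rule_either.search(curly)))       # True


def describe_quotes(text):
    """List every quote-like character in text with its code point and name."""
    quote_chars = '"\u201c\u201d\u00ab\u00bb'
    return [(f'U+{ord(ch):04X}', unicodedata.name(ch))
            for ch in text if ch in quote_chars]


print(describe_quotes(curly))  # [('U+201D', 'RIGHT DOUBLE QUOTATION MARK')]
```

Running something like `describe_quotes` on a line pasted from the real file shows which quote character the rule needs to contain.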

    Paul Filkin | RWS Group

  • Indeed, Paul, the curly quote copied directly from the text into the regular expression solved the problem.
    Thank you for persisting.