Change segmentation rules when creating a new project

Hi,

I'm quite new to Trados Studio.

I haven't been able to find where to add a segmentation rule to avoid wrong segmentation in the source file.

Any help would be much appreciated.

Cheers

Roberto

  •  

    There are fundamentally two ways to drive segmentation, and both require changes to be made before you create your project.

    1. change the segmentation rules in your Translation Memory
    2. change the segmentation behaviour of tags in your filetype settings

    What are you trying to segment on?

    Paul Filkin | RWS Group

    ________________________
    Design your own training!

    You've done the courses and still need to go a little further, or still not clear? 
    Tell us what you need in our Community Solutions Hub

  •  

    I have a couple of cases where segmentation is not working properly in the source file:

    1) After abbreviation "etc."

    I tried deleting "etc." from the abbreviations list in the TM source language, but it's not working.

    2) After a single capital letter (example: X. or W.)

    In both cases I think the best option would be to add custom sentence-based segmentation rules, something like:

    Before break

    W

    Break characters

    . (a dot)

    Which options should I tick here? (Check abbreviations, ordinals and punctuation?)

    After break (using regular expression) 

    One or two spaces followed by a capital letter
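
    As a rough illustration of the rule described above, the three parts can be modelled as one pattern in Python. This is only a sketch of the logic, not Trados Studio syntax: the names `RULE` and `split_segments` are made up for the example, and `[A-Z]` covers unaccented ASCII capitals only.

```python
import re

# Hypothetical model of the three-part rule: before break, break
# character, after break. Not the Trados Studio rule format.
RULE = re.compile(
    r"(?P<before>\b[A-Z])"         # before break: a single capital letter
    r"(?P<brk>\.)"                 # break character: a dot
    r"(?P<after> {1,2})(?=[A-Z])"  # after break: 1 or 2 spaces, then a capital
)

def split_segments(text):
    """Split text wherever the three-part rule matches."""
    segments = []
    start = 0
    for match in RULE.finditer(text):
        segments.append(text[start:match.end("brk")])  # keep the dot
        start = match.end("after")                     # skip the spaces
    segments.append(text[start:])
    return segments

sample = "is greater than the width W. The thickness T being greater than"
print(split_segments(sample))
# ['is greater than the width W.', 'The thickness T being greater than']
```

    Because the before-break part requires a word boundary in front of the capital letter, a multi-letter token such as FIG. does not match this particular pattern.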

    Cheers

  •  

    It would help me give you a better answer if you provided the following:

    • a full sentence
    • an explanation of the behaviour you expect
    • the filetype you are working with and, if it's not something basic like Word or a text file, an example of how the sentence looks in the file (is it XML or HTML, for example?)

    I ask for this because the abbreviations pretty much work out of the box and as expected.  So I need to know what your specific circumstances are.

    Paul Filkin | RWS Group

  • Filetype is Microsoft Word (docx)

    1) etc.

    Example:
    ...a channel, a pathway, combinations of these, etc.  The attachment portion 405 can include...

    Another example:
    a channel, a pathway, combinations of these, etc. While the example of the device or implant 

    In the first example there are two spaces after etc. and in the second only one.

    I need the sentence to break after etc. here, since a new sentence starting with a capital letter follows immediately.

    I tried deleting etc. from the abbreviation list in the TM source language, but it's still not working...

    2) W. or Z. or X.

    Examples:

    ...is greater than the width W. The thickness T being greater than....

    ...to move in an outward direction Z. This movement of the flexible portion 1687 in the...

    ...2182 in the inward direction X. The arms 2182 are connected to each other....

    I noticed that in the abbreviations list all capital letters followed by a dot are there by default. I tried to delete them in the TM source language, but it's still not working...

    Cheers

  •  

    ok... I used this sample file:

    Then I did this:

    I also attached my TM in case it helps:

    en-de (RJ).zip

    Paul Filkin | RWS Group

  •  

    Thanks a lot for your efforts!

    So I created a new project, then I

    1) Deleted both etc. entries in TM source language abbreviation list

    2) Added a custom segmentation rule in TM source language, like this

    \s\p{Lu}\.  (Lu letters between curly brackets)

    But when I try to open the sdlxliff file, an error pops up:

    Error dialog box in Trados Studio with a red cross icon, displaying the message 'Object reference not set to an instance of an object.' with options for Knowledge Base and Community, and an OK button.

    [edited by: Trados AI at 11:43 AM (GMT 0) on 29 Feb 2024]
  •  

    I solved the "Object reference not set" error. I was storing my resource files in Dropbox and noticed a few warnings when Trados files were opened, so I moved all my resource files to a local folder on my laptop to be sure.

    However, the segmentation rule is still not working...

    1) etc.
    ...a pathway, combinations of these, etc. While the example of the device or implant...

    2) W. X. and so on...
    ...to move in an outward direction Z. This movement of the flexible portion.... 

  • So, I tried creating a new project, this time using your segment.docx file, and everything worked flawlessly...

    Why is it not working with my source file?

  •   

    Managed to make it work!

    However, deleting the abbreviation for etc. and adding the custom segmentation rule solved one issue but created a couple of new ones...

    1) etc.

    It is not always necessary to break the sentence after etc. A break is needed only if etc. is followed by a space and a capital letter.

    I presume the abbreviation etc. should stay in the list and I should add a segmentation rule instead, right?

    2) Break after single ending capital letter

    The segmentation rule works fine for single capital letters like W., Z. and so on. But the sentence is also broken after, for example, FIG. (I tried adding it to the abbreviation list, but that doesn't work either...).

    Example:

    as shown in FIG.
    166) such that the paddle

    In this case it might be better to add a regex to the after-break part of the segmentation rule, so that the sentence breaks only if the capital letter and dot are followed by a space and another capital letter!
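
    To make that idea concrete, here is a minimal Python sketch of such an after-break check. It is only an assumption about the intended logic, not how Trados Studio evaluates its rules, and `AFTER_BREAK` and `is_break` are hypothetical names.

```python
import re

# Hypothetical after-break test: whitespace, then an ASCII capital letter.
AFTER_BREAK = re.compile(r"\s+[A-Z]")

def is_break(text, dot_index):
    """Return True if the dot at dot_index should end a segment."""
    if text[dot_index] != ".":
        return False
    # Pattern.match(string, pos) anchors the test right after the dot.
    return AFTER_BREAK.match(text, dot_index + 1) is not None

no_break = "as shown in FIG. 166) such that the paddle"
do_break = "combinations of these, etc. While the example"

print(is_break(no_break, no_break.index(".")))  # False: a digit follows
print(is_break(do_break, do_break.index(".")))  # True: a capital follows
```

    With a check like this, etc. and FIG. only end a segment when the next word starts with a capital letter, so a following number or lowercase word keeps the sentence together.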

    Cheers

  •  

    ok - this is why you could help yourself a lot more by providing a comprehensive and specific example file that anyone willing to help can work with.  Not everyone will be prepared to spend time guessing what you need.

    I created a new sample file:

    I used the same TM I gave you earlier and added one exception to the full stop rule:

    Screenshot showing an exception to the full stop segmentation rule

    This resulted in this, which seems to behave as you want:

    Screenshot showing correct segmentation

    I did not need to make any changes to the abbreviations list.

    If it still doesn't work for you please provide a small sample file containing the offending sentences.

    Paul Filkin | RWS Group

  • It still doesn't work on my end...

    Anyway here's a sample docx file with all use cases, including a description of what I need to achieve!

  •  

    I have no idea why it doesn't work on your end.  I suggest you refer to the video again to make sure you are applying this correctly and are always opening the file against the updated TM.  I took your sample file, did a little more of what we have already discussed, and got this, which I think is what you're after:

    Screenshot showing the correct segmentation for the whole sample file provided.

    Here's my TM:

    5516.en-de (RJ).zip

    The only changes I made to accommodate your sample file were these exceptions to the full stop rule:

    1. Edited the FIG exception so it also handles FIGS:
      Screenshot showing the edited FIGS exception rule

    2. Added a new exception to handle etc. followed by a number:
      Screenshot showing the additional exception rule for etc. followed by a number.
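
    The effect of those two exceptions can be pictured with the Python sketch below. The actual patterns are only visible in the screenshots, so the ones used here are guesses at their intent, and `FULL_STOP`, `EXCEPTIONS` and `segment` are hypothetical names rather than anything from Trados Studio.

```python
import re

# Base rule: a full stop followed by whitespace is a candidate break.
FULL_STOP = re.compile(r"\.\s+")

# Hypothetical exceptions as (before-break, after-break) pattern pairs.
# These are assumptions, not the patterns from the screenshots.
EXCEPTIONS = [
    (re.compile(r"\bFIGS?\.$"), re.compile(r"")),  # FIG. / FIGS. never break
    (re.compile(r"\betc\.$"), re.compile(r"\d")),  # etc. followed by a number
]

def segment(text):
    """Split on full stops unless one of the exceptions applies."""
    segments = []
    start = 0
    for m in FULL_STOP.finditer(text):
        before = text[start:m.start() + 1]  # up to and including the dot
        after = text[m.end():]              # text following the whitespace
        if any(b.search(before) and a.match(after) for b, a in EXCEPTIONS):
            continue                        # exception hit: do not break
        segments.append(before)
        start = m.end()
    tail = text[start:]
    if tail:
        segments.append(tail)
    return segments

print(segment("as shown in FIGS. 12 and 13. The paddle moves."))
# ['as shown in FIGS. 12 and 13.', 'The paddle moves.']
```

    Keeping the exceptions as a list of pattern pairs mirrors the idea of one base rule plus separate exceptions, so the abbreviation list itself does not need to change.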

    Paul Filkin | RWS Group
