Segmentation problem

How do I fix the following segmentation problem? In Icelandic, the abbreviation m.v. is often followed by a date, e.g. 'Ef misbrestur verður á greiðist sérstakt fatagjald kr. 12,53 á hverja klst. (m.v. 1.5.2015).' Although m.v. is in the abbreviation list, Studio segments incorrectly after the "v." when the abbreviation is followed by a digit.

  • Thanks for replying, Paul.

    1. What version of Studio are you using?  

    SDL Trados Studio 2019 Freelance

    2. Are you adding the abbreviation after the project has already been created?

    No, m.v. has been in abbreviations for some months. My practice is to add any new abbreviations on the fly.

    3. Have you got a lot more abbreviations in the list that might be clashing?

    I have many abbreviations so there could be a conflict but I suspect it relates to ordinal followers in some way.

    4. Can you share what you see?

    From the same document I see that the segmentation fails when m.v. is followed by a digit but not when followed by a letter. See the examples below:

    Segmentation incorrect:

    [Screenshot: incorrect segmentation in SDL Trados Studio; the segment breaks after 'm.v.' when it is followed by a digit, with the error boxed in red.]

    Segmentation correct:

    [Screenshot: correct segmentation in SDL Trados Studio; 'm.v.' is followed by a letter and the segment is not split, boxed in red.]

    I am doing a study on ways to optimise the quality of TMs as eventual resources for machine translation development for "less well-resourced languages", e.g. Icelandic, with only a 300k+ language community. Part of this work has been to log any issues, such as the "m.v." problem that might impinge on TM quality. I have been logging since October 2018. I could send you more information on the study if you are interested. 

    I was going to copy the abbreviations and ordinal followers from the raw TM but it is too big (100.000 segments) to open in my text editor. Is there a way to get these lists directly from Trados?

  •  

    Unknown said:
    I have many abbreviations so there could be a conflict but I suspect it relates to ordinal followers in some way.

    I don't think so.  If you start with a clean TM and add one abbreviation for m.v. then all the examples work.  So unless you have also been adding ordinal followers and created a problem there I think it's more likely to be something in the abbreviations list.

    Unknown said:
    I am doing a study on ways to optimise the quality of TMs as eventual resources for machine translation development for "less well-resourced languages", e.g. Icelandic, with only a 300k+ language community. Part of this work has been to log any issues, such as the "m.v." problem that might impinge on TM quality. I have been logging since October 2018. I could send you more information on the study if you are interested. 

    Sounds interesting... would like to read a bit about this if you can share.

    Unknown said:
    I was going to copy the abbreviations and ordinal followers from the raw TM but it is too big (100.000 segments) to open in my text editor. Is there a way to get these lists directly from Trados?

    Unfortunately not out of the box. You may be able to get them out of the tables in the SDLTM with a SQLite editor. I may come back to you on this...
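
    As a rough illustration of that idea, here is a minimal sketch that assumes the .sdltm file opens as an ordinary SQLite database. It does not assume any table or column names: it lists whatever tables the file contains, then reports which columns hold a given string such as "m.v.", which may help locate where the abbreviation list is stored. The function names (list_tables, find_columns_containing) are hypothetical helpers, not part of Trados. Work on a copy of the TM.

```python
import sqlite3
from contextlib import closing
from pathlib import Path


def _open_read_only(db_path):
    # Open the file read-only so the TM cannot be modified by accident.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    return closing(sqlite3.connect(uri, uri=True))


def _quote(identifier):
    # Quote a table or column name for safe use inside SQL text.
    return '"' + identifier.replace('"', '""') + '"'


def list_tables(db_path):
    """Return {table_name: [column names]} for every table in the file."""
    with _open_read_only(db_path) as conn:
        names = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
        schema = {}
        for name in names:
            # PRAGMA table_info yields one row per column; index 1 is its name.
            rows = conn.execute(f"PRAGMA table_info({_quote(name)})").fetchall()
            schema[name] = [row[1] for row in rows]
        return schema


def find_columns_containing(db_path, needle):
    """Return [(table, column, row_count)] for columns whose text contains needle."""
    hits = []
    schema = list_tables(db_path)
    with _open_read_only(db_path) as conn:
        for table, columns in schema.items():
            for column in columns:
                sql = (f"SELECT COUNT(*) FROM {_quote(table)} "
                       f"WHERE instr(CAST({_quote(column)} AS TEXT), ?) > 0")
                (count,) = conn.execute(sql, (needle,)).fetchone()
                if count:
                    hits.append((table, column, count))
    return hits
```

    Calling list_tables on a copy of the TM shows what is in the file, and find_columns_containing with "m.v." narrows down which table and column to inspect in the SQLite editor. Because the search runs inside SQLite, a TM of 100.000 segments never has to be loaded into a text editor. If the lists are stored in an encoded form, the search may find nothing even though the abbreviation is present.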

     

    Paul Filkin | RWS Group


  • Thanks again, Paul. I have added ordinal followers, but I think it would be most productive to send you more information once I have reviewed my log to see which issues are outstanding and have made lists of my ordinal followers and abbreviations. I would be happy to share information about the study with you. Should I post it to this forum or send it separately?