Need a specialist solution for server TM duplicate cleanup

I am working with a very large server TM that has a number of duplicates.

I know that I can open the TM in the Translation Memories management interface and set the search type to "Search in potential duplicates only", which then takes me screen by screen through all the "potential duplicates".

Is there a way to list only 100% duplicates (not fuzzy ones; many of the potential duplicates differ only in formatting, etc.) so that they can be cleaned up?

I don't know whether a regular expression could help build the filter, and I do not know how to compose the expression.

Can anyone help?

Thank you very much in advance.

Sincerely,

Susanna Miles

  • Hi Susanna,

    In the absence of any ideas from our GroupShare experts, I think I'd export the TM to TMX and then try cleaning the TMX with something like Olifant. Have you tried this?

    Regards

    Paul

    Paul Filkin | RWS Group
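
If the TM can be exported to TMX, exact duplicates can also be listed with a short script before cleaning. The sketch below is a minimal illustration, not part of Trados Studio, GroupShare or Olifant; the function names are hypothetical. It assumes a "100% duplicate" means translation units whose segment text, inline formatting tags included, is identical in every language, so units that differ only in formatting are not grouped together.

```python
import sys
import xml.etree.ElementTree as ET
from collections import defaultdict

# TMX 1.4 stores the language code in xml:lang; TMX 1.1 used a plain "lang" attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def seg_key(seg):
    """Serialize a <seg> with its inline tags, so formatting differences do not match."""
    parts = [seg.text or ""]
    for child in seg:
        # tostring() on an inline tag also emits the text that follows it (its tail).
        parts.append(ET.tostring(child, encoding="unicode"))
    return "".join(parts)


def tu_key(tu):
    """Combine every language variant of a translation unit into one comparable key."""
    variants = []
    for tuv in tu.findall("tuv"):
        lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
        seg = tuv.find("seg")
        variants.append((lang, seg_key(seg) if seg is not None else ""))
    return tuple(sorted(variants))


def find_exact_duplicates(root):
    """Return groups of 1-based TU positions whose segments are identical in every language."""
    groups = defaultdict(list)
    for position, tu in enumerate(root.iter("tu"), start=1):
        groups[tu_key(tu)].append(position)
    return [positions for positions in groups.values() if len(positions) > 1]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python find_tmx_duplicates.py exported_tm.tmx
    tmx_root = ET.parse(sys.argv[1]).getroot()
    for group in find_exact_duplicates(tmx_root):
        print("Identical translation units at positions:", group)
```

The script only reports positions; deciding which copy to keep and deleting the others is still left to Olifant or the TM editor. For a very large TM, `ET.iterparse` would use less memory than parsing the whole file at once.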

  • Thank you for all your efforts Paul.
    This is the end client's TM, so I will pass the suggestion on to the agency that sends me the work; they should have more access rights.
    My attempt to export resulted in an error message: "You do not have permission to export from a TM."
    The odd thing is this: I can open the client's server TM in the Studio 2015 Translation Memory view and select the search type "Search in potential duplicates only", but there is no option to "Search for 100% duplicates only". IMHO, exact duplicates should be much easier for the system to find than "potential duplicates", but apparently that is not the case.
    I think the search can be narrowed with the filter options on the right, but that is beyond my skill level; I am not a programmer.
    Any suggestions are very welcome.
    Thank you so much in advance.
    Sincerely,
    Susanna