How do I delete duplicates from a TM in SDL Studio 2015?

My client sent me these instructions (below, originally in French) for a job. Basically, she wants me to clean two TMs and delete the duplicates in them without merging the two TMs. The two TMs have similar content, and there needs to be consistency between them.

So my question is: how do I do this? How do I remove duplicates from a TM, or in this case from two different TMs? And how long do you think it would take? The TMs have 85,346 words in total.

If this is not the right place to ask this question, could you let me know where I could find an answer?

Thanks in advance,

Kaarina

Instructions from my client:

Cleaning the memories

The workflow would be as follows:

-          The 2 memories provided by the client will be converted into 2 .tmx files (you will need the File Type Definition plug-in in SDL Studio; we would prepare the files)

-          These .tmx files will then be treated “like a translation”, by running them against the memories provided (the duplicates will therefore show up, and you will have to decide what to do with them: delete, correct, etc.)

Reminder of the requirement: the client wants to clean up the global TM and the DM TM so that there is only one translation unit per segment, and so that this translation unit is the same in both memories in the end.

Until now, there were sometimes 2 possible translations, one in the global TM and another in the DM TM. The client wants to avoid this (they want a single translation suggestion).

Important note: the client wants to keep these 2 memories separate (not merge them).
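Before working through the files in Studio, it may help to list the segments that actually break the client's rule (one translation per source segment, identical in both TMs), which could also give a rough idea of how many manual decisions the job involves. The following is a minimal Python sketch, not a Studio feature, assuming the two TMs have been exported to TMX as the client describes and that each `<tu>` holds one source `<tuv>` and one target `<tuv>`. The function names `read_pairs` and `find_conflicts` and the language code are illustrative only.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# TMX 1.4 marks the language with xml:lang; older TMX versions used a plain "lang".
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def read_pairs(tmx_text, src_lang):
    """Yield (source, target) text pairs from a TMX document (str or bytes).

    Assumes one source <tuv> and one target <tuv> per <tu>; inline tags
    inside <seg> are flattened to their text content."""
    root = ET.fromstring(tmx_text)
    for tu in root.iter("tu"):
        src = tgt = None
        for tuv in tu.findall("tuv"):
            lang = tuv.get(XML_LANG) or tuv.get("lang", "")
            seg = tuv.find("seg")
            text = "".join(seg.itertext()).strip() if seg is not None else ""
            if lang.lower() == src_lang.lower():
                src = text
            else:
                tgt = text
        if src is not None and tgt is not None:
            yield src, tgt


def find_conflicts(tms, src_lang):
    """Return {source: {target: {TM names}}} for every source segment that
    has more than one distinct translation, within one TM or across TMs."""
    seen = defaultdict(lambda: defaultdict(set))  # source -> target -> TM names
    for name, tmx_text in tms.items():
        for src, tgt in read_pairs(tmx_text, src_lang):
            seen[src][tgt].add(name)
    return {src: dict(targets) for src, targets in seen.items() if len(targets) > 1}
```

To run it on real exports, the file contents can be passed directly, e.g. `find_conflicts({"global": Path("global.tmx").read_bytes(), "dm": Path("dm.tmx").read_bytes()}, "fr-FR")`; the file names and language code are placeholders. Because inline formatting is flattened to text, two segments that differ only in their tags are treated as the same source.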

  • Hi Kaarina,

    It sounds as though your customer has already explained how you should do this. I'll summarise it in Studio speak and we can see whether this helps ;-)

    1. The client will convert the memories to TMX for you
    2. You install this plugin which will allow you to open the TMX as a translatable file:
    3. You then create a Studio project with the TMX files, and add the TMs as Translation Memories
    4. Make sure you have this setting in place with a penalty to help you see it:
    5. Then work through the files for translation, maybe using the display filter to filter for repetitions.
    6. You can delete duplicates from within the translation results window as you go.

    Based on their request, this seems to be what they are describing. It's not the easiest way to remove duplicates, but if you have to do this across several TMs without merging them, this is one way. If you could merge them, it would be simple, as the merging process would probably remove all the true duplicates between the TMs along the way.

    Maybe someone else has a better idea?

    Regards

    Paul

    Paul Filkin | RWS Group

    ________________________
    Design your own training!

    You've done the courses and still need to go a little further, or still not clear? 
    Tell us what you need in our Community Solutions Hub

  • Hi Kaarina,

    You might want to try the below process:

    Follow Paul's suggestions up until Point 3, then

    4. From your project, create an empty TM in which you uncheck all recognizer settings.

    5. Run an analysis with the below settings.

    6. This generates a new SDLXLIFF file in the Studio\Exports folder.

    7. Pre-translate this SDLXLIFF with your original TM.

    8. Update the new TM you created in Point 4 with the pre-translated file, and you should end up with the original content, stripped of duplicates.

    Kindly,
    Simon
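The end result of Simon's steps (one translation unit per source segment) can also be approximated outside Studio on the TMX exports, and applying it to each TM separately keeps the global TM and the DM TM as two distinct files, as the client requires. This is a minimal Python sketch under stated assumptions, not how Studio itself decides: `harmonise_tm` and the `decisions` dictionary (source segment mapped to the agreed translation) are hypothetical names, and for sources without a decision the sketch simply keeps the first TU in file order, whereas Studio's pre-translation picks the best-scoring match.

```python
import xml.etree.ElementTree as ET

# TMX 1.4 uses xml:lang on <tuv>; older TMX versions used a plain "lang" attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def _lang(tuv):
    return (tuv.get(XML_LANG) or tuv.get("lang", "")).lower()


def _text(tuv):
    seg = tuv.find("seg")
    return "".join(seg.itertext()).strip() if seg is not None else ""


def harmonise_tm(tmx_text, src_lang, decisions=None):
    """Return (new_tmx_text, removed) where each source segment occurs in one TU.

    `decisions` (hypothetical) maps a source segment to the agreed translation.
    For listed sources, the kept TU's target <seg> is overwritten with it
    (any inline tags in that target are lost). For other sources, the first
    TU in file order is kept: a simplifying assumption, not Studio's rule."""
    decisions = decisions or {}
    root = ET.fromstring(tmx_text)
    body = root.find("body")
    seen = set()
    removed = 0
    for tu in list(body.findall("tu")):  # iterate over a copy while removing
        tuvs = tu.findall("tuv")
        src_tuv = next((t for t in tuvs if _lang(t) == src_lang.lower()), None)
        if src_tuv is None:
            continue  # no source-language variant: leave this TU untouched
        src = _text(src_tuv)
        if src in seen:
            body.remove(tu)  # later TU for a source segment already kept
            removed += 1
            continue
        seen.add(src)
        if src in decisions:
            for tuv in tuvs:
                seg = tuv.find("seg")
                if tuv is not src_tuv and seg is not None:
                    seg.clear()
                    seg.text = decisions[src]
    return ET.tostring(root, encoding="unicode"), removed
```

Calling `harmonise_tm` once per TM with the same `decisions` dictionary gives every listed source segment the same single translation in both TMs without merging them; sources missing from `decisions` keep whichever TU came first in each file, so they may still differ between the two TMs. The cleaned TMX files would then need to be imported into new Studio TMs.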

  • Hi Simon,

    I like this approach... very smart. But how does this work if you have two TMs to work with and need to ensure no duplicates between TMs? I thought the idea was to end up with two TMs and have no overlap?

    Maybe I'm confusing myself over what's required now!!

    Paul Filkin | RWS Group