Translation memory management: duplicate TUs or overwritten TUs

Hello,

I manage the TMs and translation resources for my company. When I import a TU with a custom set of fields (filename, category, subcategory, translator, date entry, native check) and then import the exact same TU from a different project with a different set of fields, the original TU is overwritten. Instead of having two TUs, I have just one. This is problematic: if a TU is used for one manual and then overwritten with info from another manual, its reliability is called into question even though it may have been used several times.

Is there any way to stop this from happening? I would like to have two TUs rather than have the old TU overwritten.

As of now, the only way I know to NOT overwrite an old TU is to import the new TU into another TM. I split our master TM into 3 TMs for our main divisions and this sort of helped. Another workaround is to customize the field settings of the TM to allow multiple values. That way I could keep track of where the TU was used (useful for manuals or, say, making sure that a speech from the company president stays associated with the president). But this would lead to really huge "filename" fields for me and would potentially take an incredibly long time to set up, because this setting can only be chosen for a fresh new TM... right? How would I do this for a TM with a few hundred projects and several tens of thousands of TUs?

So anyway, if anyone has a good translation memory management method or tips, especially for in-house translators, I'd love to hear your thoughts.

Best regards,

Keenan

  • What I don't get is how this process works if the translator works completely "independently", i.e. on a translation package (either with a project TM or with a separate local copy of the main TM).
    The return package then contains only the SDLXLIFFs... so is this "Add as New Translation" flag remembered somewhere in the SDLXLIFF, so that when the engineer runs the Update Main TM batch task on his side (i.e. completely separately from the translators!), the TUs are written to the TM exactly as the translator originally intended?!
  • Ms Matefi,
    Thank you for your response. However, this only works when adding units during translation. A large amount of our TM data comes from imports of data received from translation companies or from alignments of past documents.
  • I'm also curious about adding to a translation memory via the "Update Main Translation Memory" batch task. In the options for that batch task, there is an option for "Add new translation units", but that applies only "if target segments differ"... what about when the target is the same? It would be so easy if an option existed to ALWAYS add a new unit, regardless of whether the target is the same or different. Interested to hear if  has any thoughts on this.

  • Hi ,

    I am a little confused by this thread. Can you provide an example of where you would want duplicate TUs in your TM? If they are true duplicates (same source, same target, same context) then you would not want duplicates at all and reliability should not be in question. It would be helpful if you could provide a sample TM with your fields, and sample TMX with different fields and a small text that demonstrates why this is a problem.

    At the moment it feels too theoretical for me. I played around with some TMs and importing myself and the more I play with it the more I don't see why you want duplicates if the fields are not being used to distinguish between TUs in the import. It's actually incredibly hard to create a true duplicate in a Studio TM.

    Paul Filkin | RWS Group

    ________________________
    Design your own training!

    You've done the courses and still need to go a little further, or still not clear? 
    Tell us what you need in our Community Solutions Hub

  • Paul,
    Here's an example of why having duplicates improves reliability. 

    Say I have this data in one TM:

    TU#1 Source: これはチラシです。Target: This is a pamphlet.
    TU#2 Source: これはチラシです。Target: This is a flier.

    If you were to come across "これはチラシです" while translating, the TM result window would show both of the above TUs with a -1 penalty. They'd both have a 99 score. The top one is only there because it was imported/added most recently, right? That's the only thing separating these two. So how do I know which one is correct?

    But what if, say, TU#2 had actually been imported/added a dozen times with the same source/target from a dozen different manuals with unique fields, and each import just overwrote the previous TU? And say TU#1 was only used once. You'd like to know that a certain unit has been used many times within a TM, right? Knowing this would improve reliability. Even better if there were fields attached so you could see which manual this TU was used in. So having this as an option when importing would be great.

    As said above, the option to add a new TU exists when translating, but it doesn't exist when aligning, importing, or batch updating the main translation memory. If something had a different field value, I'd like a new TU to be created even if the source and target are exactly the same. If I want to know which translation was used for manual XX, I have no way of knowing because all the critical field values are overwritten.

    The only way to do this, as of now, is to have a different TM for every manual, which gets out of hand. Or to set the TM fields to allow multiple values... but this would make huge, space-eating field values, and I would have to reimport all the TM data.
    See what I'm getting at?
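
    To make the idea concrete, here is a minimal Python sketch of the information I'd like to keep. It is not Studio's API: the import log, the `filename` field and the `manual_A` to `manual_D` values are all made up for illustration. It simply counts how often each source/target pair was imported and from where.

```python
from collections import Counter, defaultdict

# Hypothetical import log: one record per TU per imported file.
# Field names and values are illustrative, not taken from a real TM.
records = [
    {"source": "これはチラシです。", "target": "This is a pamphlet.", "filename": "manual_A"},
    {"source": "これはチラシです。", "target": "This is a flier.", "filename": "manual_B"},
    {"source": "これはチラシです。", "target": "This is a flier.", "filename": "manual_C"},
    {"source": "これはチラシです。", "target": "This is a flier.", "filename": "manual_D"},
]


def usage_by_translation(records):
    """Count how often each (source, target) pair occurs and where it came from."""
    counts = Counter()
    origins = defaultdict(list)
    for rec in records:
        key = (rec["source"], rec["target"])
        counts[key] += 1
        origins[key].append(rec["filename"])
    return counts, origins


counts, origins = usage_by_translation(records)
for (source, target), n in counts.most_common():
    print(f"{n}x  {target}  <- {', '.join(origins[(source, target)])}")
```

    With overwrite-on-import, only the last record per pair would survive, so both the count and the list of origins would be lost.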

  • Hi ,

    But in this scenario they are not duplicates, only the source is duplicated and if you use the option when importing to add as new translation if the target segments differ then you will get two and the new one will also retain the different fields. So when translating you would see a duplicate translation penalty but you'd also see the fields and could make your own mind up which one was correct.

    Furthermore the TU carries a context with it, so if the TU was created in a file with a different context then Studio should be able to distinguish between the results and not give you the penalty. If they are both the same context then I think you should be questioning why you have duplicates in there in the first place.

    Maybe I'm not understanding your problem properly, but I can import without overwriting in this scenario. The only problem I can't resolve is knowing which is the correct translation when the context is the same. But I think that's a challenge all users have in ensuring that their TMs contain accurate information, especially when working as a project manager with multiple translators who don't get to reference your main TM when they work.

    Paul Filkin | RWS Group


  • Hi  

    Unknown said:
    Even better if there were fields attached so you could see which manual this TU was used in. So having this as an option when importing would be great.

    You can do this and then even import based on the fields if you like:

    You might also find this app useful as it can ensure you never forget to set the right field for recording filenames:

    https://multifarious.filkin.com/2016/01/14/recordsourcetu/

    Paul Filkin | RWS Group


  • Paul,

    "But in this scenario they are not duplicates, only the source is duplicated and if you use the option when importing to add as new translation if the target segments differ then you will get two and the new one will also retain the different fields."

    In the scenario I presented, the source is the same and the target is the same. Nothing differs except the origin of the unit. That's what I need to keep. Constantly having the same TU overwrite itself over and over is no help when knowing the origin defines how a TU is used. This option would be really valuable for the type of translation we do. 

    "If they are both the same context then I think you should be questioning why you have duplicates in there in the first place."

    Not quite. Same source and target but the context is different. A different file. As I explained before, having duplicates is beneficial because knowing the source of the TU is important. Here's another example. Say that, starting from an empty TM, I imported File #1 with 500 TUs. Then I imported File #2 with 500 TUs, but 300 of them were exactly the same as in File #1. So those 300 TUs from File #2 overwrote 300 of File #1's TUs. Right? So I only have 700 TUs when I actually need 1000. Then say I want to make a TM with just the TUs from File #1. If I filtered on File #1, I'd only get 200 TUs, which is incomplete. This is not theoretical. This is something I will be doing, but I risk ending up with incomplete data because I had no control over how the data was handled on import.
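
    The numbers in this example can be checked with a small toy model in Python. This is only a sketch under stated assumptions: the TM is modelled as a dictionary keyed on (source, target), the two import functions are hypothetical and are not Studio's actual import logic, and the segments are dummy strings. The second function models a field that accumulates values, as an "allow multiple values" setting might.

```python
def import_overwrite(tm, tus, filename):
    """Model of the behaviour described above: an identical source/target
    pair keeps one TU, and its field is replaced by the latest import."""
    for source, target in tus:
        tm[(source, target)] = {"filename": filename}


def import_multi_value(tm, tus, filename):
    """Alternative model: one TU per pair, but the field accumulates values."""
    for source, target in tus:
        entry = tm.setdefault((source, target), {"filename": []})
        entry["filename"].append(filename)


# 500 TUs per file; 300 source/target pairs are shared between the two files.
file1 = [(f"src{i}", f"tgt{i}") for i in range(500)]
file2 = [(f"src{i}", f"tgt{i}") for i in range(200, 700)]

tm_a = {}
import_overwrite(tm_a, file1, "File #1")
import_overwrite(tm_a, file2, "File #2")
from_file1_a = [k for k, f in tm_a.items() if f["filename"] == "File #1"]

tm_b = {}
import_multi_value(tm_b, file1, "File #1")
import_multi_value(tm_b, file2, "File #2")
from_file1_b = [k for k, f in tm_b.items() if "File #1" in f["filename"]]

print(len(tm_a), len(from_file1_a))  # 700 200
print(len(tm_b), len(from_file1_b))  # 700 500
```

    In the overwrite model, filtering on File #1 returns only the 200 TUs that File #2 never touched. In the multi-value model, the TM still holds 700 TUs, but all 500 from File #1 remain findable.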

  • Hi  

    Unknown said:
    Same source and target but the context is different. A different file.

    In this case shouldn't you be using fields and attributes where multiple values are not allowed?  If the only reason you want a 100% duplicate where source and target are exactly the same is for counting TUs then you need to force the software to not make the default choice.  In translation terms it makes no sense since the results will be the same with either seeing as the context doesn't change the translation at all.

    Paul Filkin | RWS Group


  • Paul,

    I have robust fields (category, subcategory, file name, translator, native check, date added) and this only helps when the target is different. If the target is the same, the field values get overwritten by those of the most recent import. As I said in my first post, if I changed the TM settings to "allow multiple values" then, as indicated by the hazard icon, it would delete all the fields there to begin with which would be a disaster. (My UI settings are in Japanese but you get the idea)

    "If the only reason you want a 100% duplicate where source and target are exactly the same is for counting TUs then you need to force the software to not make the default choice."

    As I explained several times, that is not the reason I want this option. Specifically, see the very non-hypothetical scenario I explained in my last post. Also, as I said in an earlier post, seeing multiple duplicates with different fields does make sense and does affect the translator and the translation. Seeing 10 occurrences of the same source-target pair versus 1 instance with a different target helps the translator choose the right one.
    Also, I don't know what you mean by "forcing the software to not make the default choice."