Translation memory management: duplicate TUs or overwritten TUs

Question

Hello, 
 I manage the TMs and translation resources for my company. When I import a TU with a custom set of fields (filename, category, subcategory, translator, date entry, native check) and then import the exact same TU from a different project with a different set of fields, the original TU is overwritten. Instead of having 2 TUs I just have 1. This is problematic. If a TU is used for one manual and then overwritten with info from another manual, the reliability of the TU is put into question even though it could have been used several times. 
 Is there any way to stop this from happening? I would like two TUs, and I would not like old TUs to be overwritten. 
 As of now, the only way I know to NOT overwrite an old TU is to import the new TU into another TM. I split our master TM into 3 TMs for our main divisions and this sorta helped. Another workaround is to customize the field settings of the TU to allow for multiple values. That way I could keep track of where the TU was used (useful for manuals, or, say, for making sure that a speech from the president of the company stays associated with the president). But this would lead to really huge "filename" fields for me, and it would potentially take an incredibly long time to set up because this setting can only be applied to a fresh new TM... right? How would I do this for a TM with a few hundred projects and several tens of thousands of TUs? 
 So anyway, if anyone has a good translation memory management method or tips, especially for in-house translators, I'd love to hear your thoughts. 
 Best regards, 
 Keenan

Paul Filkin · Answer

Hi Keenan Cooper,
 
I am a little confused by this thread. Can you provide an example of where you would want duplicate TUs in your TM? If they are true duplicates (same source, same target, same context) then you would not want duplicates at all and reliability should not be in question. It would be helpful if you could provide a sample TM with your fields, a sample TMX with different fields, and a small text that demonstrates why this is a problem. 
 
At the moment it feels too theoretical for me. I played around with some TMs and importing myself and the more I play with it the more I don't see why you want duplicates if the fields are not being used to distinguish between TUs in the import. It's actually incredibly hard to create a true duplicate in a Studio TM.

Paul Filkin · Answer

Hi Keenan Cooper,
 
But in this scenario they are not duplicates; only the source is duplicated. If you use the import option to add a new translation when the target segments differ, then you will get two TUs, and the new one will also retain its different fields. So when translating you would see a duplicate translation penalty, but you'd also see the fields and could make your own mind up about which one was correct. 
 
Furthermore the TU carries a context with it, so if the TU was created in a file with a different context then Studio should be able to distinguish between the results and not give you the penalty. If they are both the same context then I think you should be questioning why you have duplicates in there in the first place. 
 
Maybe I'm not understanding your problem properly, but I can import without overwriting in this scenario. The only problem I can't resolve is knowing which is the correct translation when the context is the same. But I think that's a challenge all users have in ensuring that their TMs contain accurate information, especially when working as a project manager with multiple translators who don't get to reference your main TM when they work.
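
To make the behaviour discussed so far concrete, here is a toy model in Python. It is only a sketch of the behaviour as described in this thread, not Studio's actual import logic: the TU class, the import_tu function and the add_if_target_differs flag are all invented for illustration, and context matching is left out entirely.

```python
from dataclasses import dataclass, field


@dataclass
class TU:
    """Hypothetical stand-in for a translation unit (not a Studio API type)."""
    source: str
    target: str
    fields: dict = field(default_factory=dict)


def import_tu(tm, new, add_if_target_differs=True):
    """Import one TU into a list-based toy TM and report what happened.

    Assumed rules, taken from the descriptions in this thread:
    - same source and same target: the existing TU is kept and its
      field values are replaced by the incoming ones;
    - same source but a different target: added as a new TU when
      add_if_target_differs is True, otherwise the old TU is replaced.
    """
    same_source = [tu for tu in tm if tu.source == new.source]
    for tu in same_source:
        if tu.target == new.target:
            tu.fields = dict(new.fields)  # old field values are lost
            return "overwritten"
    if same_source and not add_if_target_differs:
        same_source[0].target = new.target
        same_source[0].fields = dict(new.fields)
        return "overwritten"
    tm.append(new)
    return "added"
```

In this model, importing the same source and target from manual_A and then from manual_B leaves one TU carrying only the manual_B value, which is the overwrite Keenan describes. A TU with the same source but a different target is added alongside it and keeps its own fields.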

Paul Filkin · Answer

Hi Keenan Cooper 
 Unknown said: Even better if there were a field attached so you could see which manual this TU was used in. So having this as an option when importing would be great. 
 You can do this and then even import based on the fields if you like: 
 
 You might also find this app useful as it can ensure you never forget to set the right field for recording filenames: 
 https://multifarious.filkin.com/2016/01/14/recordsourcetu/

Paul Filkin · Answer

Hi Keenan Cooper 
 Unknown said: Same source and target but the context is different. A different file. 
 In this case shouldn't you be using fields and attributes where multiple values are not allowed? If the only reason you want a 100% duplicate where source and target are exactly the same is for counting TUs, then you need to force the software to not make the default choice. In translation terms it makes no sense, since the results will be the same with either, seeing as the context doesn't change the translation at all.

Paul Filkin · Answer

Unknown said: Also, from an earlier post, seeing multiple duplicates with different fields does make sense and does affect the translator and translation. Seeing 10 occurrences of the same source-target versus 1 instance with a different target helps the translator choose the right one. 
 Hi Keenan Cooper 
 ok... can you elaborate on this for me please as I'm clearly missing something here. If the source and target results are all the same then why does it matter which one the translator picks?

Paul Filkin · Answer

Hi Keenan Cooper 
 I spent a little time playing with this today after sending six packages out to different people so I could reproduce this as accurately as possible. I do have different context in the TU because I was getting confused with all the duplicates but you can hopefully still see what is happening. First, this is what I get if I update without any fields at all, as expected: 
 
 F11 and the others are treated as variables, so even though the manual names in my source segments are different I only get one TU. Then as expected, and as you explained, I get two TUs for the two segments we are talking about. Five of them translated the same way, one different. 
 If I use fields, set up like this: 
 
 I used a picklist to control what I was getting, but you could use any type of field you like. The important part is that I'm allowing multiple values. When I update the TM with the returned packages and apply the appropriate update value into the fields then I get this in my TM: 
 
 So still four results, but the field values are all added. The effect this has in the files themselves would be this: 
 
 So I can see which one to choose based on the field name. Doesn't this approach work for you?
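
As a rough illustration of the "allow multiple values" approach, the Python sketch below models a TM as a dictionary keyed by (source, target), where each field holds a set of values. This is a simplified model with hypothetical names (update_with_multi_value_fields, the "manual" field); it ignores context and is not how Studio stores TUs internally.

```python
def update_with_multi_value_fields(tm, source, target, fields):
    """Add one returned segment to a toy TM whose fields allow multiple values.

    tm maps (source, target) -> {field name -> set of values}.
    An identical source/target pair never creates a second entry;
    the incoming field values are merged into the existing sets.
    """
    entry = tm.setdefault((source, target), {})
    for name, value in fields.items():
        entry.setdefault(name, set()).add(value)
    return tm


# Six returned packages: five translators agree, one differs.
tm = {}
for manual in ["manual_1", "manual_2", "manual_3", "manual_4", "manual_5"]:
    update_with_multi_value_fields(
        tm, "Open the cover.", "Abdeckung oeffnen.", {"manual": manual}
    )
update_with_multi_value_fields(
    tm, "Open the cover.", "Deckel oeffnen.", {"manual": "manual_6"}
)
```

After the loop, the toy TM holds two entries for the same source: one whose "manual" field lists five manuals and one that lists only manual_6. The price is the one Keenan raised earlier: the value sets grow with every project that reuses the segment.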

Paul Filkin · Answer

Hi Keenan Cooper 
 Thanks for your email, and your patience, explaining to me again what your problem is. I'm going to summarise the situation so it's clear (for me at least): 
 
 you need to be able to differentiate between duplicate translations based on your field values 
 you have many field values in use, all of which are important, so using "allow multiple values" is not a solution for you because there would be too many in the TM results pane to make it possible to use them during translation 
 you could apply filters to the TMs based on a field value so you only get results from the field you want, but this is not a solution for you either 
 using multiple TMs is not a solution for you because the management of so many TMs is impractical 
 what you would like is the ability to update and create duplicates during import 
 
 Currently, as you know, it is very difficult by design to create a true duplicate in a Studio TM. Even if you go to TM Maintenance and manually create one by copying and pasting over another, you'll find Studio removes one automatically. So making the change to allow this would be an enhancement, and probably not a simple one either. It is also very likely to cause duplication where you don't want it, because it would probably be an all-or-nothing approach, and this would add unwanted maintenance to your TMs. Maybe an option to suggest is to add new translation units if the field values differ; perhaps that would give you the desired effect? 
 TM updates are quite a complicated area and all changes need to be very carefully thought through. However, I think the best approach is for you to log this request on the ideas site, where the product management team can pick it up and decide whether it's something they wish to introduce based on their views and the amount of support you get. 
 I hope this reflects the situation. Sorry it took so long to get there too!
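
For anyone writing up the enhancement request, the suggested option (add a new translation unit if the field values differ) could be described with a small Python model like the one below. It is purely hypothetical: no such import option exists in Studio according to this thread, and the function name and the set-based TM are assumptions made for the sketch.

```python
def import_keeping_field_variants(tm, source, target, fields):
    """Toy model of a hypothetical 'add new TU if field values differ' option.

    tm is a set of (source, target, frozenset of field items) keys.
    A TU is only skipped when source, target AND every field value match
    an existing entry; any difference in the fields creates a new TU.
    Returns True if a new TU was added, False if it already existed.
    """
    key = (source, target, frozenset(fields.items()))
    if key in tm:
        return False
    tm.add(key)
    return True


tm = set()
import_keeping_field_variants(
    tm, "Open the cover.", "Abdeckung oeffnen.", {"filename": "manual_A"}
)
import_keeping_field_variants(
    tm, "Open the cover.", "Abdeckung oeffnen.", {"filename": "manual_B"}
)
# Re-importing manual_A changes nothing: it is a true duplicate.
import_keeping_field_variants(
    tm, "Open the cover.", "Abdeckung oeffnen.", {"filename": "manual_A"}
)
```

Here the same source and target end up as two entries, one per manual, while a genuine re-import is still ignored. The sketch also shows the maintenance cost mentioned above: every distinct combination of field values multiplies the number of TUs.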
