To make TMs unique

Former Member

First, in a nutshell, it works like this:

Venn diagram showing the intersection of sets A and B with the unique elements of each set labeled.

The diagram above has only two sets, but mine can handle a maximum of 9 of them (TMs).

Now, let me show you an example.

Screenshot of Trados Studio interface showing two Translation Memory (TM) entries with highlighted common Translation Units (TUs) that will be deleted to make TUs unique.

You have 2 TMs:
-TM1 has TU IDs 1050, 1051 and 1052
-TM2 has TU IDs 305, 306 and 307

As you can see, they contain the same TU:
-TM1, 1051
-TM2, 305

This code deletes only the two common TUs above, so all the remaining TUs are unique.
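The attached scripts are AutoHotkey and are not reproduced here. As a rough illustration only, the idea above can be sketched in Python. The TU IDs are the ones from the example; the segment texts, the function name `unique_units` and the choice to compare the segment text are all assumptions made for this sketch, not the logic of the attached code.

```python
from collections import Counter

def unique_units(tms):
    """tms maps a TM name to {TU id: segment text}.

    Returns the same structure with every TU removed whose text
    occurs in more than one TM (all copies are dropped).
    """
    # Count in how many different TMs each text appears.
    seen_in = Counter()
    for units in tms.values():
        for text in set(units.values()):
            seen_in[text] += 1
    return {
        name: {tu_id: text for tu_id, text in units.items() if seen_in[text] == 1}
        for name, units in tms.items()
    }

# Hypothetical segment texts; only the IDs come from the example above.
tms = {
    "TM1": {1050: "Open the file.", 1051: "Save the file.", 1052: "Close the file."},
    "TM2": {305: "Save the file.", 306: "Print the file.", 307: "Delete the file."},
}
survivors = unique_units(tms)
tm1_ids = sorted(survivors["TM1"])
tm2_ids = sorted(survivors["TM2"])
```

The same function accepts any number of TMs, so the limit of 9 mentioned above is not enforced in this sketch.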


Enjoy

make Unique items.zip

The first script makes a "Survival List" in MS Excel.
The second script deletes the "common TUs" using that MS Excel file.

Misc.
-Currently, the Survival List is ordered by TM name and LastUsedDate. I believe you are very good at MS Excel, right?
-If you want to remove more TUs, just delete or empty any cell(s)/row(s) in the Survival List. Too easy.
-The term "common" has a slightly wider meaning here (it considers "Characters" only).
-The common parts are gone, all of them, completely. So if you want to keep a copy of the common parts, you have to find a way to preserve them yourself.
-Usually I prefer the "ClickOnceBlindly" style, but all these selections/considerations/controls/options do not look bad, as long as there are not too many.
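The two-step workflow (write a Survival List, then delete whatever is not on it) can be sketched as follows. This is a Python illustration under stated assumptions: it uses a CSV file in place of the MS Excel workbook, sorts by TM name and TU ID because the example has no dates, and the function names and segment texts are hypothetical.

```python
import csv
import io

def write_survival_list(survivors, out):
    """Step 1: write one row per TU that should survive."""
    writer = csv.writer(out)
    writer.writerow(["tm", "tu_id"])
    for tm_name in sorted(survivors):
        for tu_id in sorted(survivors[tm_name]):
            writer.writerow([tm_name, tu_id])

def apply_survival_list(tms, listing):
    """Step 2: keep only the TUs that are still on the list."""
    keep = {(row["tm"], int(row["tu_id"])) for row in csv.DictReader(listing)}
    return {
        name: {tu_id: text for tu_id, text in units.items() if (name, tu_id) in keep}
        for name, units in tms.items()
    }

# Hypothetical data: IDs from the example above, texts invented.
tms = {
    "TM1": {1050: "Open the file.", 1051: "Save the file.", 1052: "Close the file."},
    "TM2": {305: "Save the file.", 306: "Print the file.", 307: "Delete the file."},
}
survivors = {"TM1": [1050, 1052], "TM2": [306, 307]}

buffer = io.StringIO()
write_survival_list(survivors, buffer)

# Simulate hand-editing the list: remove the row for TM1 / 1050.
edited = "".join(
    line
    for line in buffer.getvalue().splitlines(keepends=True)
    if not line.startswith("TM1,1050")
)
result = apply_survival_list(tms, io.StringIO(edited))
remaining_tm1 = sorted(result["TM1"])
remaining_tm2 = sorted(result["TM2"])
```

Under this reading, removing a row from the list before step 2 removes that TU as well, which matches the note above about deleting rows to remove more TUs.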

[NOTE]
I did not do any kind of testing.
I did not consider any of your crazy usages either, of course.





Generated Image Alt-Text
[edited by: Trados AI at 4:23 AM (GMT 0) on 5 Mar 2024]
  • This behaves a little strangely.  I ran a few tests to see how this worked.  Worth noting the following if anyone else wants to try this:

    1. you need to create a folder that contains a copy of the TMs you are comparing
    2. run the 01 make_SurvivalList.ahk file
    3. There are no messages to show that anything is happening, so if you have a TM of any size just wait and eventually the spreadsheet will appear
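Step 1 (working on copies in a separate folder) could be scripted along these lines. This is only a sketch in Python: `stage_copies` and the folder name `compare` are made-up names, the 9-TM limit comes from the original post, and the demo uses empty placeholder files instead of real `.sdltm` TMs.

```python
import shutil
import tempfile
from pathlib import Path

MAX_TMS = 9  # limit stated in the original post

def stage_copies(tm_paths, work_dir):
    """Copy the TMs into work_dir so the originals are never touched."""
    tm_paths = [Path(p) for p in tm_paths]
    if len(tm_paths) > MAX_TMS:
        raise ValueError(f"at most {MAX_TMS} TMs are supported")
    work_dir = Path(work_dir)
    work_dir.mkdir(parents=True, exist_ok=True)
    return [Path(shutil.copy2(p, work_dir / p.name)) for p in tm_paths]

# Demo with empty placeholder files standing in for real TMs.
with tempfile.TemporaryDirectory() as tmp:
    originals = []
    for name in ("TM1.sdltm", "TM2.sdltm"):
        path = Path(tmp) / name
        path.write_bytes(b"")
        originals.append(path)
    copies = stage_copies(originals, Path(tmp) / "compare")
    copied_names = sorted(c.name for c in copies)
    originals_still_exist = all(p.exists() for p in originals)
```

The scripts would then be pointed at the copy folder, so a failed run costs nothing but the copies.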

    I also have a question: which TU does it retain?

    Test 1

    I compared two TMs with approx. 50k TUs in each.  I deleted 50 TUs from one.  I expected the result to show an Excel file with 50 TUs in it.

    • took a couple of minutes and eventually the Excel file appeared with 42 TUs.  I assumed that there could have been more duplicates so accepted this as working
    • The delete_TUs.ahk didn't seem to work at all

    Test 2

    I compared two TMs, one with only four TUs and another with one TU that matched one in the first, as I wanted to see what counts as a duplicate.  Same source only, same target only, or same source and same target?

    • this didn't work at all and the solution reported there was nothing to find
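The three possible meanings of "duplicate" raised in Test 2 can be made concrete with a small Python sketch. It does not claim to show which one the AutoHotkey scripts use; the function names and the sample segments are invented for illustration.

```python
def duplicate_key(source, target, mode):
    """Return the value two TUs must share to count as duplicates."""
    if mode == "source":
        return source
    if mode == "target":
        return target
    if mode == "both":
        return (source, target)
    raise ValueError(f"unknown mode: {mode}")

def find_duplicates(tm_a, tm_b, mode):
    """Return (index in tm_a, index in tm_b) pairs that match under mode."""
    keys_b = {}
    for j, (src, tgt) in enumerate(tm_b):
        keys_b.setdefault(duplicate_key(src, tgt, mode), []).append(j)
    return [
        (i, j)
        for i, (src, tgt) in enumerate(tm_a)
        for j in keys_b.get(duplicate_key(src, tgt, mode), [])
    ]

# Hypothetical TUs as (source, target) pairs: same source, different target.
tm_a = [("Save the file.", "Datei speichern."), ("Open the file.", "Datei öffnen.")]
tm_b = [("Save the file.", "Speichern Sie die Datei.")]

by_source = find_duplicates(tm_a, tm_b, "source")
by_target = find_duplicates(tm_a, tm_b, "target")
by_both = find_duplicates(tm_a, tm_b, "both")
```

With this pair of TMs, only the "source" reading finds a match, which is why a small test like Test 2 can report nothing if the script uses a different reading than the tester expects.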

    Test 3

    I repeated Test 1 and this time it took even longer and then produced an Excel file with all 50k TUs in it.  The 42 that were not duplicates were actually appended after a red line at the end of the 50k duplicates.  So something weird is happening here.

    Interesting code though (I like that it's in English again... almost), and I like the idea.  You can do the same thing in Studio if you use the upgrade TM route, and this will not only remove the duplicates but also create a single TM for you (if you want) with the results, and it will carry out more checks on the integrity of the TM than just duplicate removal.  But I guess this script has the potential to be faster in the end if it worked and could be relied upon.

    Paul Filkin | RWS Group


  • Former Member
    Former Member in reply to Paul

    1.
    Thanks for taking your precious time to meticulously test my humble code.
    Really appreciated.

    2.
    I forgot to mention that this post stems from the following post:

    Is there a best practice for importing new segments into a large TM?
    community.sdl.com/.../is-there-a-best-practice-for-importing-new-segments-into-a-large-tm

    Only for this.

    3.
    I guess (or rather, I hope) the problem is solved a little bit.

    At least, it worked as I intended/imagined.
    I have left all the evidence in the main post.
    That is good enough for me.

    4.
    I have deleted all the related files from my computer (this is not my problem).
    Nobody knows this, but recreating the situation on my side (it is called Reproduce) is a VERY, VERY and VERY painful, silly and meaningless set of steps.

    I do not want to do it again.

    5.
    Duplicates? Who cares?
    Forget about this code.
    Just use your TMs.
    There is absolutely no serious performance hit, such as taking an hour to find fuzzy matches or something.

    So..

    Good Bye.

  • Former Member
    Former Member in reply to Former Member

    On second thought,
    I have changed my mind,
    because you are the only one who knows what I am doing here (including all of the other posts too).

    So, let me be nice again....

    The key logic of this post is the same as the logic I used in "To extract TM [3/3] - inconsistencies".
    I changed just one character.

    So...

    If you have any doubt about what the heck "Common" means, run the code I mentioned above.
    Then you will know it 101% clearly (ah.. there, too, I failed to give you a clear idea of what I meant).
    Then that could prove that I am (not) wrong here..

    Right ?

    Good Luck To You 

  • Former Member
    Former Member in reply to Former Member

    Ah, you wrote a long note.
    But.. I am not very interested in it right now (most importantly, you are not a DamzelinDistress).
    If I change my mind, I will take care of the other sad stories you wrote.

    Regards
