Strange TM match numbers with 2 TMs

I am getting strange TM match numbers when I use 2 TMs.
When I add a second TM, the number of fuzzy matches increases and the number of 100% matches decreases.
I was under the impression that adding a TM could only improve my matches.
For example, here are the match figures for the job I am currently working on.
Here are the figures with one TM:
I then add the second TM (with a penalty of 3), and get the following match figures:
Note that there are 19 segments in the 95%-99% range with 1 TM and 171 segments in this range with 2 TMs.
If I update the project TMs (using both TMs) and then rerun the analysis with just one TM, I get the following:
Before the project TMs were updated, there were 19 segments in the 95%-99% range (1 TM) and there are 4 segments in this range after updating (1 TM).
If I add the second TM, I see the following analysis:
There were 171 segments in the 95%-99% range (2 TMs) before the project TMs were updated and there are 134 segments in this range (2 TMs) after the update.
In summary:
1 TM, before update: 19 segments in 95%-99% range
1 TM, after update: 4 segments
2 TMs, before update: 171 segments
2 TMs, after update: 134 segments
There are two mysteries:
1. Why does adding another TM increase the number of fuzzy matches (and decrease the number of 100% and context matches), both before and after the project TMs are updated?
2. Why does updating the project TMs change anything? I thought the project TMs were continuously updated during the translation.
Anyone have an idea about what is happening? I have lost a bit of my confidence in the matching algorithm being used.
Regards,
Bruce Campbell
A S A P Language Services
  • Hi Bruce,

    That can happen because of the penalty for multiple translations. If one TM contains a 100% match and only that TM is attached to your project, the segment is counted as a 100% match. If you have two TMs, each with a different 100% match for the same segment, then each of those is counted as a 99% match (assuming a 1% penalty for multiple translations) when both TMs are active in the project.
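
To make the arithmetic in this explanation concrete, here is a minimal Python sketch. It is not Studio's actual matching code: the function names, the band boundaries and the 1% penalty value are assumptions chosen only to mirror the example in the post.

```python
MULTIPLE_TRANSLATIONS_PENALTY = 1  # assumed value, taken from the example in the post


def effective_score(hits, multiple_penalty=MULTIPLE_TRANSLATIONS_PENALTY):
    """Return the score an analysis might report for one segment.

    hits: list of (score, target_text) pairs pooled from all active TMs.
    Hypothetical rule: if the best score is offered with more than one
    distinct translation, subtract the multiple-translations penalty.
    """
    if not hits:
        return 0  # no match at all
    top = max(score for score, _ in hits)
    top_targets = {target for score, target in hits if score == top}
    if len(top_targets) > 1:
        return top - multiple_penalty
    return top


def band(score):
    """Map a score to the analysis band it would be counted in."""
    if score >= 100:
        return "100%"
    if score >= 95:
        return "95%-99%"
    return "below 95%"


tm1_hits = [(100, "translation A")]
tm2_hits = [(100, "translation B")]

one_tm = effective_score(tm1_hits)
two_tms = effective_score(tm1_hits + tm2_hits)
print(one_tm, band(one_tm))    # 100 100%
print(two_tms, band(two_tms))  # 99 95%-99%
```

Under these assumptions, a segment that counts as a 100% match with one TM moves into the 95%-99% band as soon as a second TM offers a different 100% translation, which is the direction of the shift reported in the question.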

  • Hi Nora,

    That sounds like a good explanation -- except for the 3 point penalty assigned to the second TM.

    The 3 point penalty on the second TM should avoid the two 100% matches and the multiple translations situation. The segment from the second TM would be a 97% match after the TM penalty is applied.

    Unless the segments are being compared BEFORE the TM penalty is applied.

    But I cannot imagine this is what is happening, as it would make the TM penalty kind of useless.
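
The two possible orderings discussed here can be compared with a small Python sketch. Neither function is confirmed as Studio's behaviour; the names, the tuple layout and the penalty values are hypothetical and only illustrate how the order of operations changes the reported score.

```python
def tm_penalty_first(hits, multiple_penalty=1):
    """Apply each TM's penalty first, then check for multiple translations.

    hits: list of (raw_score, tm_penalty, target_text) tuples.
    """
    adjusted = [(raw - tm_pen, target) for raw, tm_pen, target in hits]
    top = max(score for score, _ in adjusted)
    top_targets = {target for score, target in adjusted if score == top}
    return top - multiple_penalty if len(top_targets) > 1 else top


def multiples_first(hits, multiple_penalty=1):
    """Check for multiple translations on the raw scores, then apply TM penalties."""
    top_raw = max(raw for raw, _, _ in hits)
    top_targets = {target for raw, _, target in hits if raw == top_raw}
    clash = len(top_targets) > 1
    return max(
        raw - tm_pen - (multiple_penalty if clash and raw == top_raw else 0)
        for raw, tm_pen, _ in hits
    )


# First TM: no penalty. Second TM: penalty of 3, different translation.
hits = [(100, 0, "translation A"), (100, 3, "translation B")]

print(tm_penalty_first(hits))  # 100
print(multiples_first(hits))   # 99
```

In this sketch only the second ordering turns the 100% match from the first TM into a 99% match, so the figures in the question would be consistent with the comparison happening before the TM penalty is applied. That is an inference from the sketch, not a statement about how Studio is implemented.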

    Still a mystery to me ...

    Bruce Campbell
    A S A P Language Services
  • Hi Nora,

    Here is another mystery.

    I thought that updating the project TMs would update BOTH project TMs with all of the translated segments in the file.

    After updating project TMs, here are the match figures if I analyse using only the first TM:

    Here are the match figures if I only use the second TM (using a penalty of zero for the second TM this time):

    If both project TMs were updated, then why are there 68 new segments when only the first TM is used, and 95 new segments when only the second TM is used (with zero penalty for the second TM this time)?

    The difference of 27 segments can only be due to some of the Context Match segments no longer being recognised -- but why not?

    The only explanation I can think of is that the "Update project TMs" function is not updating BOTH of the TMs with ALL of the translated segments in the file.

    Then one has to ask whether the "Update master TMs" function updates BOTH of the TMs with ALL of the translated segments. More mysteries ...

    Best,

    Bruce

  • Hi Bruce,

    I've just stumbled upon this issue today while investigating why a file-based memory and a TMServer memory with the same content/settings produce different stats. It surprised me somewhat. The intuitive way (for me) would be to apply the multiples penalty only after the memory penalty, so at least we'd see a 100% match from memory 1. Also, can anyone explain why the multiples penalty affects context matches? Setting this to 0 increases the CM count. Perhaps there are multiple segments with the same context? If so, I'd prefer to always see these as 100% - multiples penalty.

    Regards
    Alan
  • Hi Alan,

    I have to admit I find the analysis figures a bit non-intuitive. I still don't know what to think about the differences we have found, but I doubt SDL is going to do much about it.

    On a similar note, I often receive pre-translated files from agencies and find it annoying when the translated segments are lumped into the analysis, sometimes showing up as 100% matches, sometimes context matches, sometimes as fuzzy matches.

    Since the only segments that can be excluded from analysis are locked segments, I generally now lock all translated segments before doing an analysis.

    That way I don't have to switch back and forth between the File view and Reports view and try to figure out how to subtract the translated segments from the analysis figures.

    I also end up with an analysis that shows me exactly what work still has to be done (which is usually why a translator does an analysis, in my opinion).

    I also change the status of some segments to "Translation approved" while translating large files, so that they are not included in the figures shown for "Translated" segments in the File view and I can work on batches of segments.

    Here is what I do when translating large files:

    1. Use the "SDL Toolkit" add-in to change the status of pre-translated segments with a status of "Translated" to a status of "Translation approved" and lock them. (Trying to do this in the editor by selecting the "Translated" segments with a filter, then changing the status and locking them takes forever with a large file. It can be significantly faster to save the file, use SDL Toolkit and then reopen the file.)

    2. Perform an analysis (exclude locked segments). If the analysis indicates a lot of fuzzies, perform a pre-translation.
    3. Open the file and translate any pre-translated fuzzies.
    4. Translate a batch of segments. Save the file and use File view to check how many words/segments have been translated. (Only the batch of segments and any translated fuzzies will show up as "Translated", since the other segments were set to "Translation approved".)
    5. When a suitable number of segments have been translated, perform a spellcheck (exclude locked segments).
    6. Perform a verify (exclude locked segments).
    7. Save and close the file.
    8. Use SDL Toolkit to change the status of "Translated" segments (i.e. the batch of segments and any translated fuzzies that were just spellchecked and verified) to "Translation approved" and lock them.

    9. Repeat steps 2 to 7, analysing and pre-translating when appropriate.
    10. When finished, use SDL Toolkit to change the status of the segments that were set to "Translation approved" back to "Translated" and unlock them. (Once again, trying to do this in the Editor with a filter can take forever for a large file.)

    This allows me to use File view to keep track of how much I have accomplished in the latest batch of translation, and the analyses consistently just show how much work still has to be done (without including translated segments variously as 100% matches, context matches and fuzzy matches).

    There are still problems with analysis figures when using more than one TM, but I think a lot of the problems are due to the multiple matches Nora mentioned. In addition, if a date or number in a segment differs from the segment stored in the TM, you get a fuzzy match (if you have a penalty for auto-localisation).

    A box to exclude "Translated" segments (and boxes for "Translation approved" and "Signed off" segments) from the analysis would be nice, but I guess we have to live with what Studio offers.

    Locking the segments in batches as you go along thankfully takes translated segments out of the analysis (including the fuzzy segments due to auto-localisation and multiple matches).

    If you want to handle the multiple-match fuzzies due to multiple TMs and the auto-localisation fuzzies in one fell swoop, pre-translate down to 99% (or maybe 98%) and then quickly handle the fuzzies one by one, either with a filter that displays only draft segments or with the "Go To" command (Ctrl+G) to move to the next draft segment.

    Sorry I got a bit off topic there ...

    Regards,
    Bruce Campbell
    ASAP Language Services