How exactly does fuzzy match repair work?

Hi all,

I have a question about the mechanics behind fuzzy match repair in Studio 2017. I have been playing with this feature a bit, and it seems pretty inconsistent at times. Sometimes a fuzzy match repair is applied to a segment, sometimes not. It seems that if one word is different and needs to be repaired, I get a repaired fuzzy match, but if two words need to be repaired, I just get a plain fuzzy match and it is not inserted in the Editor window. In both cases, the words that need repairing are present in the TM.

Here is an example with two sentences I made up: "This road crossing had traffic lights installed last week." and "Traffic lights were installed on this pedestrian crossing last week." Somehow there is no fuzzy match, let alone a fuzzy match repair. However, this might be because the match is not high enough (I set the minimum match value for fuzzy matches at 70%). Maybe because I changed the word order in the source sentence, it isn't recognised as a high fuzzy match.
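
To illustrate why the changed word order alone could push the score below the threshold, here is a minimal sketch. It assumes a word-level similarity computed with Python's difflib, which is only a stand-in: how Studio actually scores fuzzy matches is not described in this thread, and the names `word_similarity`, `near_variant` and `MIN_MATCH` are made up for the illustration.

```python
from difflib import SequenceMatcher

def word_similarity(a: str, b: str) -> float:
    """Word-level similarity in [0, 1]. A stand-in, not Studio's scoring."""
    tokens_a = a.lower().rstrip(".").split()
    tokens_b = b.lower().rstrip(".").split()
    return SequenceMatcher(None, tokens_a, tokens_b).ratio()

MIN_MATCH = 0.70  # the 70% minimum match value from the post

tm_source = "This road crossing had traffic lights installed last week."
# The reordered test sentence from the post.
new_source = "Traffic lights were installed on this pedestrian crossing last week."
# Hypothetical variant: same word order, one word swapped.
near_variant = "This pedestrian crossing had traffic lights installed last week."

for candidate in (new_source, near_variant):
    score = word_similarity(tm_source, candidate)
    verdict = "fuzzy match" if score >= MIN_MATCH else "below threshold"
    print(f"{score:.0%}  {verdict}:  {candidate}")
```

Under this assumed measure, the reordered sentence lands well under 70% even though most of its words occur in the TM sentence, while the variant that keeps the word order and swaps one word lands well above it.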

Another question about fuzzy match repair concerns situations where a term is repaired. I made a test file for myself, linked a big TM to it (to have the minimum of 5000 TUs needed for upLIFT) and tried some sentences. The example sentence I made was: Amsterdam is the capital of the Netherlands. In the TM, I made sure there was a sentence that said: London is the capital of the UK. Both Amsterdam and the Netherlands were already present in the TM.


This is the screenshot with the fuzzy match. For some reason it did not repair VK into Nederland.

Of course, this might be due to the settings I use, but it would be nice to learn more about the processes in the background and the mechanics behind this feature, to understand it better and to be able to tweak it to my own liking.

  • Hi Mark,

    I'm definitely not qualified to give you a comprehensive response to this but I'll try and explain some guiding principles that hopefully illustrate the complexity of this.

    The fuzzy match repair logic generally has to go through two steps. First of all it has to be able to find an alignment, or partial alignments, between the fuzzy TU source content found in your TM and the corresponding target content. So in your example of "Amsterdam is the capital of the Netherlands" it has to be able to recognise that perhaps something like "London is the capital" and "Londen is de hoofdstad" are aligned as a fragment in the TM. It also has to be able to recognise that "of the UK" and "van het VK" are also aligned.

    The second step is that it has to be able to find a replacement translation for the content that is different. So this means it has to similarly know that "Amsterdam" and "Netherlands" translate to "Amsterdam" and "Nederland". If it knows this the task is easy.

    However, this only sounds easy because we are cleverer than a computer! To do this correctly it can probably convert "Londen is de hoofdstad" to "Amsterdam is de hoofdstad", but when it comes to the next part things have changed a little. Now we're looking at "van het VK", but not "van het Nederland", as it would be "van Nederland"... I think! So if this represents what was actually found and aligned in the TM, then it might not be able to repair the second part, as there may not be a suitable partial alignment pair in the TM that the repair logic can use to confidently fix this.

    Now, don't take this as gospel. I'm just trying to explain the complexity of the aligning process that is trying to give you a confident fuzzy repair. The process may not actually be aligning the fragments I suggested... this will depend on the content of your TM overall and whether it was able to figure this out. It is statistical and is not using MT for example to figure out what things should be... so there is plenty of room for things to not work as you think sometimes.

    Maybe you can play with some other options, so add some information to your Termbase, or use MT, and see how the results differ. Maybe if Kevin Flanagan gets a little time he can do a better job and probably put me right here, but all I can do is confirm that it's not easy to manufacture an expected result, particularly as you look for more complex repairs... even if our brains think it's easy.

    Regards

    Paul

    Paul Filkin | RWS Group

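
The two steps Paul describes above (locate the aligned fragment in the target, then find a replacement translation for what changed) can be sketched roughly as follows. This is a toy illustration only, not Studio's or upLIFT's actual logic: the `ALIGNED` table is hypothetical hand-written data standing in for whatever fragment alignments a real TM would yield, and `repair` is a made-up helper that handles only one-for-one replacements.

```python
from difflib import SequenceMatcher

# Hypothetical aligned fragments (source -> target), assumed for illustration.
ALIGNED = {
    "London": "Londen",
    "Amsterdam": "Amsterdam",
    "UK": "VK",
}

def repair(tm_source, tm_target, new_source, aligned):
    """Return (repaired_target, source_fragments_left_unrepaired)."""
    old_words, new_words = tm_source.split(), new_source.split()
    target = tm_target
    unrepaired = []
    matcher = SequenceMatcher(None, old_words, new_words)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "replace":  # insertions and deletions are ignored here
            continue
        old_frag = " ".join(old_words[i1:i2])
        new_frag = " ".join(new_words[j1:j2])
        old_tgt = aligned.get(old_frag)  # step 1: where does it sit in the target?
        new_tgt = aligned.get(new_frag)  # step 2: what should replace it?
        if old_tgt and new_tgt and old_tgt in target:
            target = target.replace(old_tgt, new_tgt, 1)
        else:
            unrepaired.append(new_frag)  # no confident repair available
    return target, unrepaired

tm_source = "London is the capital of the UK"
tm_target = "Londen is de hoofdstad van het VK"
new_source = "Amsterdam is the capital of the Netherlands"

print(repair(tm_source, tm_target, new_source, ALIGNED))
# Same call, but with a word-level pair for "Netherlands" added.
print(repair(tm_source, tm_target, new_source,
             {**ALIGNED, "Netherlands": "Nederland"}))
```

With the first table, only "Londen" is swapped for "Amsterdam" and "VK" stays as it is, because nothing tells the sketch how to translate "Netherlands". Adding a word-level pair does repair it, but produces "van het Nederland", which is exactly the leftover "het" discussed in this thread; a cleaner result would need a longer aligned fragment such as "of the Netherlands" paired with "van Nederland".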

  • Hi Paul,

    Thanks so much for your comprehensive explanation. I already expected it to work a bit like you explained. I should also have mentioned that I have the minimum match value at 70%, though I am not exactly sure how important this setting is for producing a fuzzy match repair. I can imagine that "of the UK" would have been repaired to Nederland if the threshold had been lower (e.g. 50%). However, I still think this would have resulted in a translation with "het Nederland", like you said, which of course would need additional editing to remove "het" in front of Nederland.

    I'll leave this question open for the time being then, to see if Kevin Flanagan can shed some more light on this.