What is the technology behind upLIFT?

I thought it would be helpful to kick off this thread, as we are seeing a few questions in different places about upLIFT and whether it is the same as Lift, which was the basis of this technology when it was first introduced.

So my first question would be: how many TUs are needed in your Translation Memory to be able to upgrade it for full upLIFT capability, with fragment matching and fuzzy match repair? I read that Lift could do this from a very small number of TUs, yet upLIFT does seem to require a bigger starting point.

What questions do you have?

Regards

Paul

Paul Filkin | RWS Group

________________________
Design your own training!

You've done the courses and still need to go a little further, or still not clear? 
Tell us what you need in our Community Solutions Hub

  • It's true that the differences between Lift and upLIFT merit some discussion. First of all, it's helpful to distinguish between the two types of fragment match that Studio exposes settings for: 'Whole TU' fragment match, and 'TU fragment' fragment match. The first kind is illustrated by Emma's "Electroforesis capilar" example at signsandsymptomsoftranslation.com/.../ (you have "Electroforesis capilar" in the TM as a single segment, and get a translation for it later as part of a larger segment), while the second kind is illustrated by her "acerca de los riesgos y beneficios" example (you have that in the TM embedded in a longer segment, but still get a translation for it later as part of another longer segment). I've previously described these as TM-TDB and DTA subsegment matches, respectively (e.g. www.kftrans.co.uk/.../FillingInTheGaps.pdf)

    With upLIFT, 'Whole TU' fragment matches will work regardless of the size of your TM. For 'TU fragment' matches, the translations are retrieved using fine-grained alignment of the TU content. The Lift prototype did perform fine-grained alignment of even a tiny TM (e.g. 1 TU), essentially by using external bilingual electronic dictionaries and lemmatizers for each language. That worked pretty well, but to get that kind of functionality in SDL software, we’ve built a better approach. It turns out alignment results are better if you build translation models from big parallel corpora, then make an aligner use those instead of electronic dictionaries. We’ve done that, and we’ve got an aligner that can work like Lift did, only for more language pairs (all the pairs for which we offer MT). To begin with, we plan to provide that alignment functionality as a service. If you connect a TM to it, then you get fine-grained alignment from the very first TU you add. It’s likely we’ll have that for cloud- or server-based TMs first. More information in the coming months ...

    So, why release upLIFT in its current form, using an aligner that builds a local translation model for alignment, so needs a TM of a certain size? Mainly because it's still a great leap forward (fast, in-context fragment recall from your 'live' TM) and provides the new functionality now, regardless of language pair (though we have improved Chinese and Japanese support to be released soon). Also, fine-grained alignment is only part of the story; fragment recall also requires considerable TM engineering, which is included in this release and paves the way for future features.

    For now, then, you get full upLIFT capability if you have enough data in the TM to build the translation model (a minimum of 5,000 TUs is recommended, though you can try it with as few as 1,000 TUs). I'm hoping Studio users will be pleased with the progress, and will also keep telling us what could be better ...
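To make the two kinds of fragment match and the size guidance above concrete, here is a minimal Python sketch. It is not Studio's implementation. It only looks at the source side: retrieving the translation of a 'TU fragment' is the job of the fine-grained aligner, which is not modelled here. All function names are hypothetical, the English targets in the toy TM are made up for illustration, and the thresholds are simply the figures quoted above (the thread does not say what Studio does below 1,000 TUs).

```python
from difflib import SequenceMatcher

# Figures quoted in the thread; the constant names are hypothetical.
RECOMMENDED_MIN_TUS = 5000
TRIAL_MIN_TUS = 1000

# Toy TM of (source, target) pairs. The English targets are invented.
TOY_TM = [
    ("Electroforesis capilar", "Capillary electrophoresis"),
    ("Hable con su médico acerca de los riesgos y beneficios del tratamiento",
     "Talk to your doctor about the risks and benefits of the treatment"),
]


def whole_tu_matches(tm, segment):
    """TUs whose entire source text occurs inside a longer new segment."""
    seg = segment.lower()
    return [(src, tgt) for src, tgt in tm
            if src.lower() in seg and src.lower() != seg]


def tu_fragment_matches(tm, segment, min_words=3):
    """Word runs shared by the new segment and only PART of a TM source.

    Returns (shared_fragment, tm_source) pairs. Finding the matching
    target-side words would need fine-grained alignment (not shown).
    """
    seg_words = segment.lower().split()
    hits = []
    for src, _tgt in tm:
        src_words = src.lower().split()
        matcher = SequenceMatcher(None, src_words, seg_words, autojunk=False)
        m = matcher.find_longest_match(0, len(src_words), 0, len(seg_words))
        if min_words <= m.size < len(src_words):
            hits.append((" ".join(src_words[m.a:m.a + m.size]), src))
    return hits


def local_model_status(tu_count):
    """Classify a TM size against the figures quoted in the thread."""
    if tu_count >= RECOMMENDED_MIN_TUS:
        return "recommended size"
    if tu_count >= TRIAL_MIN_TUS:
        return "worth trying"
    return "below quoted minimum"


if __name__ == "__main__":
    print(whole_tu_matches(TOY_TM, "Se realizó una electroforesis capilar"))
    print(tu_fragment_matches(
        TOY_TM, "Información acerca de los riesgos y beneficios de la cirugía"))
    print(local_model_status(3000))
```

Note that the 'Whole TU' check needs nothing beyond the TM content itself, which fits the statement that it works regardless of TM size, whereas the 'TU fragment' case is the one that depends on an alignment model.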

  • I'm still not entirely clear on this - which may be due to the limits of my own understanding of these processes, but I'll ask my question anyway. :)

    What does upgrading the TM do? It would seem to do more than adding new segments based on found fragments, because if that were all, then... well, if you have a TM with 3000 TUs, then you upgrade it, then you add another 2000 TUs, presumably an upgrade of the 5000-TU TM would give a better result. If upgrading the 3000-TU TM and adding 2000 TUs gives the same result as upgrading a 5000-TU TM, then there's something going on beyond the analysis of the TUs in the TM at the time of upgrade.

    I am presuming that adding TUs to an already-upgraded TM is as good as upgrading the whole, now-larger TM; otherwise, you'd give an option to repeatedly upgrade the TM again and again. And if that presumption is correct, then - how is it working? How are the added-post-upgrade TUs retrospectively incorporated into the analysis that provides (non-whole-TU) fragment results?

  • "otherwise, you'd give an option to repeatedly upgrade the TM again and again. And if that presumption is correct, then - how is it working?"

    You do have an option to do this. If you only ever work interactively, you don't need to upgrade again and again, as the upLIFT process works automatically as you keep adding TUs. But if you import another 2000 TUs, you do need to run the upgrade process again:

    You can see here that I don't need to run it again. But if I were to import a TMX to this TM, then it would tell me the number of unaligned TUs and I would run it again.
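The behaviour described above can be pictured with a small Python sketch. This is a toy model only: the class and method names are hypothetical and are not part of any Studio API. It assumes, as described, that TUs added interactively to an upgraded TM are aligned automatically, while imported TUs stay unaligned until the upgrade is run again.

```python
class ToyTM:
    """Toy bookkeeping model of an upgraded TM (hypothetical, not Studio's API)."""

    def __init__(self):
        self.aligned = 0    # TUs already processed for fragment recall
        self.unaligned = 0  # TUs waiting for the upgrade to be run

    def add_interactively(self, count=1):
        # Assumption from the thread: TUs added while translating are
        # aligned automatically, so no new upgrade is needed.
        self.aligned += count

    def import_tus(self, count):
        # Assumption from the thread: imported TUs (e.g. from a TMX)
        # are reported as unaligned until the upgrade is run again.
        self.unaligned += count

    def needs_upgrade(self):
        return self.unaligned > 0

    def upgrade(self):
        # Running the upgrade aligns everything that is still pending.
        self.aligned += self.unaligned
        self.unaligned = 0


if __name__ == "__main__":
    tm = ToyTM()
    tm.import_tus(3000)
    tm.upgrade()
    tm.add_interactively(5)
    print(tm.needs_upgrade())  # interactive work only
    tm.import_tus(2000)
    print(tm.needs_upgrade(), tm.unaligned)  # after an import
```

The point of the sketch is only the bookkeeping: whether a re-run is needed depends on how the TUs arrived, not on how many there are.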

    Paul Filkin | RWS Group

