SDL MultiTerm 2017 Extract

Hi,

 

I would like to know the methodology or technique that SDL MultiTerm Extract uses to extract terminology. Is it based on repetition only?

 

Best Regards,

Samar

  • Hi

    Source-language term extraction is essentially based on the frequency of words and word sequences. These frequencies are put into a formula that gives each word or sequence a score, and that score determines whether it is seen as a term candidate. Essentially it favors longer terms over shorter ones, and in addition there are some heuristics: stopwords are filtered out, acronyms are recognized, etc.
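
To make the idea concrete, here is a minimal sketch of frequency-based candidate scoring. It is not the formula MultiTerm Extract actually uses, which is not given here. The stopword list, the `frequency * length` score, the `min_freq` threshold and all function names are assumptions chosen purely for illustration.

```python
import re
from collections import Counter

# Hypothetical stopword list, for illustration only.
STOPWORDS = {"a", "an", "the", "of", "in", "on", "to", "and", "or",
             "is", "are", "for", "with"}


def is_acronym(token):
    """Crude acronym heuristic: two or more characters, all upper case."""
    return len(token) >= 2 and token.isupper()


def candidate_ngrams(tokens, max_len=3):
    """Yield word sequences that neither start nor end with a stopword."""
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if gram[0].lower() in STOPWORDS or gram[-1].lower() in STOPWORDS:
                continue
            # Keep acronyms in upper case, lower-case everything else.
            yield tuple(t if is_acronym(t) else t.lower() for t in gram)


def score_candidates(text, max_len=3, min_freq=2):
    """Score candidates as frequency * length (an assumed formula)."""
    counts = Counter()
    # Split on punctuation so candidates do not span clause boundaries.
    for segment in re.split(r"[.!?;:,\n]+", text):
        tokens = re.findall(r"[A-Za-z][A-Za-z0-9-]*", segment)
        counts.update(candidate_ngrams(tokens, max_len))

    scores = {}
    for gram, freq in counts.items():
        lone_acronym = len(gram) == 1 and is_acronym(gram[0])
        # Rare sequences are dropped, but acronyms are kept regardless.
        if freq < min_freq and not lone_acronym:
            continue
        # Multiplying by length favors longer terms over shorter ones.
        scores[" ".join(gram)] = freq * len(gram)
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    sample = ("The translation memory stores segments. "
              "A translation memory is reused. "
              "The TM is a translation memory.")
    for term, score in score_candidates(sample):
        print(score, term)
```

On this sample, "translation memory" outranks "translation" and "memory" even though all three occur equally often, which is the "longer terms are favored" effect described above.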

    For bilingual files the identification of the translation of term candidates is a twofold process:

    1. Computation of word alignment probabilities, essentially also based on frequency of co-occurrence (similar to what is done for upLIFT)
    2. Word alignment of the input corpus to identify the most likely translations of source terms, also similar to what subsegment matching does in upLIFT
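
The two steps above can be sketched roughly as follows. This is only an illustration: it uses the Dice coefficient over sentence-level co-occurrence as a simple stand-in for real alignment probabilities, and it is not the model that MultiTerm Extract or upLIFT actually uses. All function names and the toy corpus are hypothetical.

```python
from collections import Counter
from itertools import product


def cooccurrence_scores(sentence_pairs):
    """Step 1: score word pairs by how often they co-occur.

    Uses the Dice coefficient, 2 * c(s, t) / (c(s) + c(t)), where counts
    are numbers of sentence pairs containing the word(s).
    """
    src_count, tgt_count, pair_count = Counter(), Counter(), Counter()
    for src, tgt in sentence_pairs:
        src_words = set(src.lower().split())
        tgt_words = set(tgt.lower().split())
        src_count.update(src_words)
        tgt_count.update(tgt_words)
        pair_count.update(product(src_words, tgt_words))
    return {
        (s, t): 2 * c / (src_count[s] + tgt_count[t])
        for (s, t), c in pair_count.items()
    }


def align_pair(src, tgt, scores):
    """Step 2: link each source word to its best-scoring target word."""
    tgt_words = tgt.lower().split()
    if not tgt_words:
        return {}
    return {
        s: max(tgt_words, key=lambda t: scores.get((s, t), 0.0))
        for s in src.lower().split()
    }


def best_translation(source_word, scores):
    """Pick the most likely translation of a source word over the corpus."""
    candidates = {t: v for (s, t), v in scores.items() if s == source_word}
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


if __name__ == "__main__":
    corpus = [
        ("the house", "das haus"),
        ("the book", "das buch"),
        ("a house", "ein haus"),
    ]
    scores = cooccurrence_scores(corpus)
    print(best_translation("house", scores))
    print(align_pair("the house", "das haus", scores))
```

A real system would work on multi-word term candidates and a much larger corpus; the sketch only shows how co-occurrence frequency can drive the choice of translation.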

    Thanks to Erik for the explanation... and I hope this is what you were after?

    Paul Filkin | RWS Group
