insert termbase terms in NMT suggestion

Is there a setting that needs to be activated to have a matching term from a termbase inserted into an NMT suggestion? This is not happening by default. I am using the new NMT provider in Studio 2019.

Parents Reply
  • Hi,

    From a linguistic (and subscription) point of view this is developing beyond my level of understanding, but from what I remember learning at uni about neural networks (https://mitpress.mit.edu/books/parallel-distributed-processing-volume-1), it seems to me that an NMT approach to translation might generally not harmonize with RBMT-style glossaries. The main benefit of neural networks, as I remember, is that they work on the sub-symbolic level and have a "natural" way of learning, i.e. adjusting the weights of the individual "neurons". The trade-off is their opacity.

    Working with ModernMT I notice that while translating it will (to some extent) adapt to my use of terminology. So from segment n to segment n+1, my terminology will in some cases be taken on board. "To some extent" and "in some cases" because this is how a neural network functions.

    (When I used the SMT LC with a termbase, I used it as a glorified TermInjector: product listings where both source and target terms were in the nominative singular and not embedded in a sentence structure, e.g. products containing product components with measurements, prices etc. I was quite happy with that.)

    So rather than imposing explicit rules ("teacher" -> "Lehrer") on a system that works with implicit rules, why not focus on customization through bilingual training corpora, and maybe allow the user to temporarily change the "learning factor" for this purpose? (Enable normal learning through feedback, but heighten the learning factor either for designated projects or for segments containing specific "trigger" terms.)

    This would be a way of overcoming e.g. the gender problem, since NMT systems are usually quite good at getting the gender right if there is a cue such as a gender-specific pronoun.

    If there is a need for a glossary, why not build on the TermInjector model, where the user can define rules and replacement terms based on the TM output (or, in this case, the MT output)?
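    The idea can be sketched in a few lines. This is a minimal, hypothetical illustration of the principle, not TermInjector's actual rule syntax; the rule and example sentence are invented:

```python
import re

# Hypothetical rule: map the rendering the MT engine tends to produce
# to the validated glossary term. (Invented example, not TermInjector syntax.)
rules = {
    r"\bWandtafel\b": "Tafel",
}

def apply_term_rules(mt_output, rules):
    """Post-edit an MT suggestion by enforcing glossary terms via regex rules."""
    for pattern, replacement in rules.items():
        mt_output = re.sub(pattern, replacement, mt_output)
    return mt_output

print(apply_term_rules("Die neue Wandtafel hängt im Klassenzimmer.", rules))
```

    A real implementation would of course need to handle inflection and agreement, which is exactly where naive string replacement runs into the problems discussed above.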

    For what it's worth.

    Daniel

Children
  • Hi Paul, hi Daniel,

    The main benefit of neural networks, as I remember, is that they work on the sub-symbolic level,

    State of the art is, according to my knowledge, a mix of the sub-symbolic level (character-level or byte-pair encoding) and the word level where needed (e.g. for anonymization, but also for tackling the terminology challenges).
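    To make "byte-pair encoding" concrete: a toy sketch of the merge step, where the most frequent adjacent symbol pair is repeatedly fused into a new sub-word unit. The corpus below is invented for illustration; real NMT toolkits learn thousands of merges and guard symbol boundaries more carefully:

```python
from collections import Counter

def pair_counts(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs

def merge(pair, vocab):
    """Fuse the chosen pair into one symbol everywhere it occurs."""
    # str.replace is fine for this toy data; real BPE code protects
    # symbol boundaries with a regex.
    old, new = " ".join(pair), "".join(pair)
    return {word.replace(old, new): freq for word, freq in vocab.items()}

# Toy corpus: words split into characters, with an end-of-word marker.
vocab = {"l o w </w>": 5, "l o w e r </w>": 2, "n e w e s t </w>": 6}
for _ in range(3):
    counts = pair_counts(vocab)
    best = max(counts, key=counts.get)
    vocab = merge(best, vocab)
print(vocab)
```

    After a few merges, frequent character sequences become single units, which is why the model can operate below the word level without falling back to single characters everywhere.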

    Working with ModernMT I notice that while translating it will (to some extent) adapt to my use of terminology. So from segment n to segment n+1, my terminology will in some cases be taken on board.

    I think this is a slightly different scenario, as the MT learns from your translation: not only terminology, but also style etc., so this is customization of NMT in a more general sense. And there are quite a few strategies for customization, for example producing synthetic sentences in order to make the NMT learn the correct solution.

    And I think we have been discussing "insert termbase terms in NMT suggestion", i.e. integrating and re-using validated terminology from a "multi-purpose" termbase ;-) And this is still one of the remaining challenges in NMT, as many researchers admit.

    In my opinion this is a chicken-and-egg problem:

    From my times of converting termbases to RBMT dictionaries I remember preaching that gender and POS information should be added as mandatory fields to the termbase: it takes a terminologist only a little time to fill them, but they are vital for an MT dictionary. The same applies to proper definitions, context sentences, information about the subject field and other rather ontological information which might help to disambiguate a term / word in context. Such explicit human-coded information is just as relevant, or even more so, in times of data-driven approaches like NMT.
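    A minimal sketch of what "mandatory fields" buys you in the conversion step. The entry structure and field names are invented for illustration; the point is simply that entries without POS and gender cannot become usable MT dictionary records:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TermEntry:
    """A termbase entry (hypothetical, simplified structure)."""
    source: str
    target: str
    pos: Optional[str] = None     # part of speech, e.g. "noun"
    gender: Optional[str] = None  # e.g. "m", "f", "n"

def to_mt_dictionary(entries):
    """Split entries into usable MT dictionary records and incomplete ones."""
    usable, incomplete = [], []
    for entry in entries:
        (usable if entry.pos and entry.gender else incomplete).append(entry)
    return usable, incomplete

entries = [
    TermEntry("teacher", "Lehrer", pos="noun", gender="m"),
    TermEntry("blackboard", "Tafel"),  # POS and gender missing -> not usable
]
usable, incomplete = to_mt_dictionary(entries)
print(len(usable), len(incomplete))
```

    With mandatory fields enforced at entry time, the "incomplete" bucket stays empty and the conversion becomes a mechanical export instead of a manual clean-up job.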

    On the other hand, there is a lot of research around context-sensitive word embeddings for NMT (BERT etc.) which might be applied to, or combined with, such "rich" termbase exports. Otherwise, you would need to hire lots of students to perform the termbase-to-MT-dictionary conversion steps and manual additions.

    Kind regards

    Christine