Is it possible to use and train NMT with only your own TMs locally?

Hi,

I know and use Language Weaver, but is it possible to get an NMT engine, trained with your own TMs, that is NOT hosted in the cloud but can be installed and used locally or on your own servers, and that does not share data with remote servers or companies?

Does anyone have experience with such services? What are the costs for this?

Best regards,

Pascal

  • Thanks ,

    I hope to be contacted soon…

    I also got a private message (I wonder why this information was not posted here) suggesting I have a look at Opus-CAT MT, which also seems to work with Trados Studio. At first glance, based on the information on their homepage and without having to rely on a slow support request, this seems to be what I’m looking for.

    There are a few more questions about possibly missing information/features to discuss with the developers, but the initial information on their page (especially on how it is used and how it can be refined) seems quite good, and not as cryptic and scarce as on the LWE page.

    Br,

    Pascal

  •  

    I also got a private message (I wonder why this information was not posted here) to have a look at Opus-CAT MT

    Indeed... that is a great tool. I almost mentioned it here but thought you wanted a Language Weaver solution for this. If by "cloud" you mean anything not on your desktop, then Opus may be the best solution for you anyway. Language Weaver Edge doesn't use the cloud, but it is still server-based, even if you run it on-premise for your own use.

  •  ,

    I didn't ask for Language Weaver explicitly; I only said that I know and use LW, but I was looking for an NMT engine that can be trained with my own TMs AND is not hosted in a cloud but on my own server or a local server. I guess you misread that part. ;)

    I have my own server on which I can install whatever I want, so LWE might still be an option, but to decide I would need quite a lot of information from support or from someone using it. I’m also open to any other tool that offers NMT and can be customized and trained with my own TMs without using any external online resources other than my own server.

    LWE might also be interesting in another way: if, like LW, it can pair a language that only has small TMs with a related language, this could be combined with customizable "replace x with y" rules (if that feature is even available).

  • Hi  ,

    Language Weaver Edge is our machine translation solution, which can be deployed on-premise. It supports all features currently available in the Cloud, including the ability to train language pairs with your own translation memories. Language pairs that can be trained are called Adaptable Language Pairs. You train them manually, i.e. you control when training takes place, upload your training data as *.tmx files, define your own test data set for evaluation, and fully control deployment. This currently requires a GPU and Linux.
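    TMX is the standard exchange format for translation memories, so it can be useful to look at what a TMX export actually contains before uploading it, for example to count how many usable segment pairs a language pair has. The following is a minimal Python sketch using only the standard library; it is not part of LW Edge or OPUS-CAT, and the function names and language codes are illustrative assumptions. As a simplification, it drops everything inside inline elements (where TMX stores native formatting codes) and keeps the text around them.

```python
import xml.etree.ElementTree as ET

# TMX 1.4 stores the language in xml:lang; older TMX 1.1 files used plain "lang".
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def seg_text(seg: ET.Element) -> str:
    """Plain text of a <seg>, dropping the content of inline elements."""
    parts = [seg.text or ""]
    for child in seg:            # e.g. <bpt>, <ept>, <ph> inline elements
        parts.append(child.tail or "")  # keep only the text after each tag
    return "".join(parts).strip()


def tmx_pairs(root: ET.Element, src_lang: str, tgt_lang: str):
    """Yield (source, target) pairs from TUs that contain both languages."""
    for tu in root.iter("tu"):
        segs = {}
        for tuv in tu.iter("tuv"):
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                segs[lang] = seg_text(seg)
        src = segs.get(src_lang.lower())
        tgt = segs.get(tgt_lang.lower())
        if src and tgt:          # skip TUs with a missing or empty side
            yield src, tgt


def read_tmx_pairs(path: str, src_lang: str, tgt_lang: str):
    """Parse a TMX file from disk and return its segment pairs as a list."""
    return list(tmx_pairs(ET.parse(path).getroot(), src_lang, tgt_lang))
```

    Usage, with a hypothetical file name: `pairs = read_tmx_pairs("my_tm.tmx", "en-US", "lb-LU")`, then `len(pairs)` gives the number of complete segment pairs.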

    In our next release (8.6 - to be released in October 22), we will support:

    • Adaptation on CPU (a GPU will no longer be a requirement; training will take longer, but it will be possible). Training on CPU will be possible on both Windows and Linux.
    • Auto-adaptive language pairs: with auto-adaptive language pairs, the models are continuously trained with the available data, including uploaded translation memories, dictionaries, and user feedback (collected through the LW Edge UI). The training happens in the background, and there is no need to perform manual operations such as deploying trained models. In Studio, you only need to reference the auto-adaptive model once in the Language Pair Mapping and you are good to go.

    Feel free to join the Language Weaver Edge section of the community to get more information. We have a few videos highlighting the user feedback process, and there is also a recording of a session from Connect 2019 (a bit old but still relevant) on how best to adapt NMT models using our on-premise solution.

    I hope this is helpful and please let me know if you need any more information! 

  • Hi ,

    Do you know if Opus-CAT MT can be installed on a server, or does it need to be installed on the same PC as Trados? I use Trados from various locations (Professional network license) and need to be able to connect to the same dataset from everywhere, so installing it on my own server would be my first choice.

    Br,

    Pascal

  •  

    do you know if Opus-CAT MT can be installed on a server

    I don't know.  I run it locally on my own laptop.  The best person to answer this question would be  .

  • Hi Pascal,

    I'm the developer of OPUS-CAT, so I can fill in some details. First of all, OPUS-CAT can be used to fine-tune base OPUS models (which are available for most language pairs) with your own TMs; this corresponds to the Adaptation on CPU option in LW that Arnaud mentions in his reply. It's also possible to train models from scratch, but that would require a GPU and setting up the OPUS-MT training pipeline.

    You can set up OPUS-CAT on a server (although simply copying the fine-tuned model to any computer you use Trados on is easy): you just need to configure it to allow incoming connections (by changing a setting), and of course you'll need to add the required exceptions to your firewall, etc. However, I have only tested it over a local network. Once OPUS-CAT is running on the server, you can access it from your local Trados by changing the IP address in the Trados plugin settings.
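    Before changing the IP address in the Trados plugin, it may help to confirm from the client machine that the server port is actually reachable through the firewall. The following is a minimal, generic TCP check in Python (standard library only); it does not talk to OPUS-CAT itself, and the host and port you pass are assumptions you should take from your own OPUS-CAT and firewall settings.

```python
import socket


def port_is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable host names.
        return False
```

    Usage, with hypothetical values: `port_is_reachable("192.0.2.10", 1234)`. A `False` result suggests a firewall rule, a wrong port, or the service not accepting remote connections.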

    In any case, I suggest you give the fine-tuning functionality in OPUS-CAT a try with your data (the setup is very simple, and you can ask for support here or at github.com/.../issues), and see how the output looks. At the very least, it will give you a point of comparison when evaluating similar paid services.

    -Tommi

  • Hi ,

    Ok, I’ll have to check that.

    One of the target languages I would like to use it for is very rare (Luxembourgish), so I guess I’ll have to train it myself. Even if a trained model already exists, I fear that it relies on the old grammar and spelling from before the language reform and on the only available dictionary, which does not differentiate between slang/colloquial and official language (as seen in many other MT plugins on the market: Google, MemSource, …). It would therefore be of little use, as it would require too much rewriting and replacing. Most available Luxembourgish solutions are crap: they only seem to translate word by word, use too much French vocabulary instead of the real Luxembourgish terms, and mix in German terms when a word is unknown.

    Training on the server could be a problem as it does not have a GPU, but I’m checking with my hosting provider whether one can be added or whether I would need to migrate to another server. In the worst case, I guess I could do the training on my PC and then upload everything to the server.

    Luxembourgish grammar and terminology are quite similar to German, so I even have an idea for getting better results, but I don’t think any NMT currently supports that.

    I’ll need some more information about terminology and glossary management, though. I haven’t found much about this on the Opus-CAT MT page yet.

    BTW: what are the minimum requirements for the GPU?

    Pascal

  • Hi  ,

    just a quick question regarding the Auto Adaptive Language Pairs in release 8.6:

    Will those who use it in Trados Studio via the plugin also be able to provide feedback for the automatic retraining of the models?

    Thanks, 

    Natalie

  • Hi,

    There are OPUS models for eng-ltz and ltz-eng in the OPUS-CAT model repository (they can be installed via the OPUS-CAT UI), but they seem to be trained mostly on crawled data, so the quality is probably fairly poor. I doubt anyone else has much more ltz data, though, so unless you have a very large corpus yourself (at least a million segments), good-quality MT is probably not going to be possible.

    If German is quite similar, you might get the best results by fine-tuning a German model using the Luxembourgish data you have. It might also help to translate monolingual Luxembourgish data with a German model and then use that synthetic data to supplement your training data (using the correct Luxembourgish as the target text and the potentially faulty MT as the source text; this method is called backtranslation).
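    The pairing step of backtranslation can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: `translate` is a hypothetical stand-in for whatever MT model produces the synthetic source text (for example, a German model applied to Luxembourgish sentences); it is not an OPUS-CAT API. The point it shows is the direction of the pairs: the machine translation goes on the source side and the genuine Luxembourgish stays on the target side, so the model is only ever trained to produce correct Luxembourgish.

```python
from typing import Callable, Iterable, List, Tuple

Pair = Tuple[str, str]  # (source, target)


def backtranslate(
    monolingual_target: Iterable[str],
    translate: Callable[[str], str],
) -> List[Pair]:
    """Build synthetic pairs: (machine-translated source, original target)."""
    pairs: List[Pair] = []
    for sentence in monolingual_target:
        sentence = sentence.strip()
        if not sentence:                  # skip blank lines
            continue
        synthetic_source = translate(sentence)      # may contain MT errors
        pairs.append((synthetic_source, sentence))  # real text stays as target
    return pairs
```

    The resulting pairs would then be appended to the authentic TM pairs before fine-tuning; how much synthetic data to mix in is a judgment call best checked against a held-out test set.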

    I'm currently working on adding terminology management to OPUS-CAT, although it is already possible to use edit rules to perform string replacements on the machine translation output. Termbase support should be released this year.
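    OPUS-CAT's edit rules are configured in its own UI, and the sketch below does not reproduce them. It is only a rough Python illustration of the general idea of rule-based string replacement on MT output (the "replace x with y" approach discussed in this thread). The rule format (regex pattern and replacement, applied in order) and the example EN-UK to EN-US rules are hypothetical.

```python
import re
from typing import List, Pattern, Tuple

# Hypothetical rule format: (regex pattern, replacement), applied in order.
Rule = Tuple[str, str]


def compile_rules(rules: List[Rule]) -> List[Tuple[Pattern[str], str]]:
    """Precompile the patterns once so they can be reused for every segment."""
    return [(re.compile(pattern), repl) for pattern, repl in rules]


def apply_edit_rules(text: str, compiled: List[Tuple[Pattern[str], str]]) -> str:
    """Apply each replacement rule, in order, to one MT output segment."""
    for pattern, replacement in compiled:
        text = pattern.sub(replacement, text)
    return text


# Illustrative rules only; word boundaries (\b) avoid replacing inside words.
RULES = compile_rules([
    (r"\bcolour(s?)\b", r"color\1"),
    (r"\borganis(e|ed|es|ing)\b", r"organiz\1"),
    (r"\s+([,.;:!?])", r"\1"),  # remove stray spaces before punctuation
])
```

    Because rules run in sequence, a later rule sees the output of earlier ones, so ordering matters when patterns overlap.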

    As for the minimum GPU requirements: since there isn't enough data, there probably isn't much point in using a GPU, but if you did, you would need a relatively high-end NVIDIA card (I can't recommend a specific one offhand, since I train models on computing clusters rather than locally).

    -Tommi

  • Hi Tommi,

    if they are crawled, then please dump them; unfortunately, they would do more harm than good. You can count up to 10 or more errors in a 15-word sentence, and that’s still a charitable estimate. I’ve seen professional translators make even more errors in their translations, and they didn’t use MT at the time; I have QAed most of the available Luxembourgish translators so far. The worst I’ve seen was 15 errors in 10 words by a so-called professional translator, and most crawled texts are not even written by linguists. Of the roughly 40 available LB translators, only 3 or 4 others deliver quite good quality, but even they are still above the international standard of 3 errors per 1,000 words. The worst I’ve seen on the web so far is people who don’t even know how to spell their own language correctly, and I don’t mean one or two typos in a word, but really awkward spellings that are not even close to the correct word.

    I’ve got a "pretty big" TM for a Luxembourgish translator, but as I translate from 6 source languages and also into my other mother tongues, German and French, my biggest Luxembourgish TM after 18 years as a translator is still no bigger than 180k (EN-US; the next one is 150k for EN-UK). Still, I guess it’s one of the biggest available with correct grammar, spelling and terminology right now, as I’m one of the leading official linguists who also worked on the language reform, and I’m programming an official professional grammar and spell checker for Luxembourgish with the help of some of my official linguist colleagues.

    Well, "similar" does not mean as similar as EN-US vs. EN-UK. The spelling is quite different, but a lot of Luxembourgish terminology is based on German, about 80% of German grammar applies to Luxembourgish, and style and syntax are about 90% similar. So one could use some kind of word/term replacement model to get quite decent results.

    OK, so by high-end cards you mean roughly the top 5 RTX cards, which cost from 1,000 EUR up to 2,000+ EUR?

    Pascal
