Best termbase format to give to an MT provider

Hello

I am trying to figure out the best format to export from MultiTerm to send to a machine translation company, as they are setting up an engine for us to use.

I am thinking of sending the default XML and XDT files together.

Is there something different I should send to them?

Thanks

David

  • Probability score: 0.9  

    Question improvement suggestion: Specify if there are any particular requirements or preferences from the machine translation provider regarding the termbase format.  

    In the meantime, based on the information you have provided, here is a suggested answer to your question:  

    Hello David,  

    To provide the best format for your machine translation (MT) provider, you are on the right track considering the default XML and XDT files. Here’s a step-by-step guide to ensure you send the most compatible files:  

    **Step 1:** Use MultiTerm Convert to convert your terminology data into MultiTerm XML format. This will generate the necessary XML file containing your terminology data.  

    **Step 2:** Along with the XML file, ensure you also create the termbase definition file (XDT) using MultiTerm Convert. This file defines the structure of your termbase and is essential for the MT provider to understand how to utilize the terminology data.  

    **Step 3:** Confirm with your MT provider if they have any specific requirements or preferences regarding the format. While XML and XDT are standard, some providers may have additional specifications or may prefer other formats like TBX or CSV.  

    **Step 4:** Once you have both the XML and XDT files ready, you can send them to your MT provider for processing.  

    By following these steps, you should be well-prepared to provide the necessary files for your machine translation engine setup.  


    RWS Community AI

  • Unless the provider also uses Trados, I don't think XDT/XML is the best solution. It depends on the data you have and on the requirements of the MT provider. Some MT processing requires simple "1 source - 1 target" lists. In that case, CSV is the easiest and most portable format. If your data is more complex and has synonyms and homonyms, TBX would be the most common format. It is best to ask your provider for their preferences. You can export both formats from MultiTerm or use the Glossary Converter.
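
If you already have a MultiTerm XML export and the provider only wants a simple "1 source - 1 target" list, the flattening could be sketched roughly as below. This is only an illustration, not an official tool: the element names (`conceptGrp`, `languageGrp`, `language` with a `lang` attribute, `termGrp`, `term`) are assumed from the usual layout of a MultiTerm XML export, the sample data is made up, and `term_pairs` and `to_csv` are hypothetical helper names. Check your own export before relying on it.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Made-up sample; the element names are an assumption about the
# MultiTerm XML layout and should be checked against a real export.
SAMPLE = """<mtf>
  <conceptGrp>
    <concept>1</concept>
    <languageGrp>
      <language type="English" lang="EN"/>
      <termGrp><term>termbase</term></termGrp>
    </languageGrp>
    <languageGrp>
      <language type="German" lang="DE"/>
      <termGrp><term>Termdatenbank</term></termGrp>
      <termGrp><term>Terminologiedatenbank</term></termGrp>
    </languageGrp>
  </conceptGrp>
</mtf>"""


def term_pairs(xml_text, source_lang, target_lang):
    """Return (source, target) tuples, one per term combination in each concept."""
    root = ET.fromstring(xml_text)
    pairs = []
    for concept in root.iter("conceptGrp"):
        # Collect the terms of this concept, keyed by language code.
        terms = {}
        for group in concept.findall("languageGrp"):
            language = group.find("language")
            if language is None:
                continue
            code = language.get("lang", "").upper()
            terms.setdefault(code, []).extend(
                t.text.strip() for t in group.iter("term") if t.text
            )
        # Synonyms are expanded into one row per source/target combination.
        for src in terms.get(source_lang, []):
            for tgt in terms.get(target_lang, []):
                pairs.append((src, tgt))
    return pairs


def to_csv(pairs):
    """Render the pairs as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["source", "target"])
    writer.writerows(pairs)
    return buffer.getvalue()


pairs = term_pairs(SAMPLE, "EN", "DE")
csv_text = to_csv(pairs)
print(csv_text)
```

Note that synonyms are multiplied out into separate rows here, which loses the information that they belong to the same concept. If that matters to the provider, TBX is probably the safer choice, as mentioned above. For a real export you would read the file with `ET.parse(path)` instead of the inline sample.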
