Import MateCat tmx file into translation memory

Hi

I have a TMX file that was downloaded from MateCat. When I import it into an sdltm file with SDL Trados Studio 2021 SR1 - 16.1.3.4096, the tags in the resulting sdltm show up as plain text, like this:

<g id=. . > . .  .</g>

The imported result:

When I open this TMX in a text editor, it looks like this:

<tu tuid="1758612440" creationdate="20210130T140643Z" datatype="plaintext" srclang="fa-IR">
    <prop type="x-MateCAT-id_job">3581968</prop>
    <prop type="x-MateCAT-id_segment">1758612440</prop>
    <prop type="x-MateCAT-filename">tmp82D8.docx</prop>
    <prop type="x-MateCAT-status">TRANSLATED</prop>


    <tuv xml:lang="fa-IR">
        <seg>&lt;g id="1"&gt;آقای &lt;/g&gt;مجید احمدزاده&lt;g id="2"&gt; این رنگین کمان بزرگ را در رودخانه هراز صید کردند.&lt;/g&gt;</seg>
    </tuv>
    <tuv xml:lang="en-US">
        <seg>&lt;g id="1"&gt;Mr. Ahmadzadeh&lt;/g&gt;&lt;g id="2"&gt;has catched this big rainbow in Haraz river&lt;/g&gt;</seg>
    </tuv>
</tu>

As far as I can tell, this happens because Trados uses a different tag name and type for the same context. Is there a way to convert the MateCat tags into the equivalent SDL Trados tags?



[edited by: Trados AI at 2:51 PM (GMT 0) on 1 Mar 2024]
  • I believe the problem is that Matecat writes its inline tags into the TMX as plain characters, using a syntax TMX doesn't know.  "g" is unknown so it's treated as text.  The only allowable inline elements in TMX are these:

    <bpt>, <ept>, <it>, <ph>, and <hi> (plus the deprecated <ut>, and <sub> nested inside the others)

    It may be because Matecat also seems to be using version 1.4 as opposed to 1.4b... although I don't really know if "g" was allowed in 1.4 either.  Certainly TMX 1.4b is pretty old (2005 I think) but 1.4 was probably deprecated around 2000...ish (guessing!).

    So it's not an SDL tag issue; the problem is that the TMX does not conform to the TMX standard.
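A quick way to confirm this for a given file is to parse it and look for segments whose text still contains a literal g tag. The sketch below is only an illustration: `report_literal_g_tags` is a hypothetical helper name, and it assumes the file is well-formed XML with a `version` attribute on the root `<tmx>` element, as TMX files normally have.

```python
import re
import xml.etree.ElementTree as ET

# After the XML parser unescapes &lt; and &gt;, a MateCat-style tag
# shows up in the segment text as a literal <g ...> or </g>.
G_TAG = re.compile(r"</?g\b[^>]*>")


def report_literal_g_tags(tmx_text):
    """Return (TMX version string, number of <seg> elements with literal g tags).

    Hypothetical helper for illustration; not part of Trados or MateCat.
    """
    root = ET.fromstring(tmx_text)
    version = root.get("version")  # e.g. "1.4" or "1.4b"
    count = 0
    for seg in root.iter("seg"):
        # itertext() also picks up text around any real inline elements.
        text = "".join(seg.itertext())
        if G_TAG.search(text):
            count += 1
    return version, count
```

A non-zero count means the tags were stored as text rather than as TMX inline elements, which matches what Studio shows after import.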

    You have a couple of ways to tackle this.  You could search & replace the g tags with bpt and ept, or even ph tags in a decent text editor.  Or... a bit controversial... but if you have a copy of memoQ you could use this option:

    [Screenshot: memoQ TMX import settings with the 'Custom tags' option selected]

    This will create memoQ placeholder tags and now you can export from memoQ as a TMX and it'll have tags that will be recognised:

    [Screenshot: the TMX exported from memoQ]

    In Studio importing this new file will look like this:

    [Screenshot: the imported result in Trados Studio]

    Quite a nice feature... I wonder how often this is required by users?  Certainly seems a useful approach if you work with MateCat a lot.  Might steal it as an app, or as an option to one of the existing apps.
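For the search & replace route mentioned above, here is a minimal Python sketch that wraps each escaped g tag in bpt/ept instead of doing it by hand in a text editor. It rests on several assumptions: the tags appear escaped exactly as in the sample (`&lt;g id="1"&gt;` and `&lt;/g&gt;`), `<seg>` carries no attributes, and the function names `convert_segment` and `convert_tmx` are made up for this example. Whether Studio then imports the result as tags has not been tested here, so try it on a copy of the file first.

```python
import re

# Raw-text patterns: the TMX is edited as text, so the tags are still escaped.
SEG = re.compile(r"(<seg>)(.*?)(</seg>)", re.DOTALL)
G_TAG = re.compile(r"&lt;(/?)g\b.*?&gt;", re.DOTALL)


def convert_segment(seg_body):
    """Wrap escaped <g>/</g> tags in <bpt>/<ept> pairs within one segment."""
    open_pairs = []  # stack of i values for g tags that are still open
    next_i = 0

    def repl(match):
        nonlocal next_i
        if match.group(1):  # closing tag: &lt;/g&gt;
            if not open_pairs:
                return match.group(0)  # unbalanced close: leave untouched
            return f'<ept i="{open_pairs.pop()}">{match.group(0)}</ept>'
        next_i += 1  # opening tag: number the pairs 1, 2, 3... per segment
        open_pairs.append(next_i)
        return f'<bpt i="{next_i}">{match.group(0)}</bpt>'

    # Limitation: an opening tag with no matching close still becomes a <bpt>.
    return G_TAG.sub(repl, seg_body)


def convert_tmx(tmx_text):
    """Apply convert_segment to every <seg>...</seg> in the raw TMX text."""
    return SEG.sub(
        lambda m: m.group(1) + convert_segment(m.group(2)) + m.group(3),
        tmx_text,
    )
```

Read the file as UTF-8, pass the text through `convert_tmx`, and write it to a new file. The `i` values are generated per segment rather than copied from the g `id`, because `i` only has to pair a bpt with its ept inside one segment. Running the script twice on the same file would wrap the tags twice, so keep the original.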

    Paul Filkin | RWS Group

    [edited by: Trados AI at 12:54 AM (GMT 0) on 29 Feb 2024]