What is the best way to convert a memoQ TM to SDL?

Hi Folks,

I am new to SDL Studio. Have done all my work in memoQ so far. Now need to convert my memoQ TMs to .sdltm and am looking for the best way of doing it. Exporting from memoQ to TMX and then importing/upgrading TMX in Studio 2017 results in huge losses/multiple errors (20-40% of the units seem to be lost in the process).

Is there a workaround? E.g. some way of trimming a large TMX file down to a simple two-column bilingual table outside CAT tools (I don't care about formatting or any metadata in my TMs, I just need the bare units) and then importing this table into an .sdltm without any loss?

Thanks in advance for any suggestions or pointers to the relevant resources.

Yuri

 

  • Hi,

    I doubt they are losses. Studio doesn't need to keep duplicates while memoQ does. Duplicates could be this for example:

    1,234.56
    97,345.21
    5.23
    80.001

    If you imported these into a Studio TM you would only get one TU. If you translated a document with these four segments in it all four would be 100% matches even though there was only one TU in there.

    There will be other things like this, but it would be worth you seeing what sort of content is in your TMX as it's very likely it's just bloated with unnecessary data.
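To get a rough feel for how many units differ only in their numbers, a short script can help. This is a minimal sketch, not Studio's actual matching logic: the `NUMBER` pattern and the `<num>` placeholder are assumptions made purely for illustration.

```python
import re
from collections import Counter

# Assumption: a "number" is a run of digits with optional , or . separators.
NUMBER = re.compile(r"\d+(?:[.,]\d+)*")

def normalize_numbers(segment: str) -> str:
    """Replace every number with a placeholder so number-only variants collapse."""
    return NUMBER.sub("<num>", segment)

def count_collapsed(segments):
    """Return (original count, count after number-only variants are merged)."""
    keys = Counter(normalize_numbers(s) for s in segments)
    return len(segments), len(keys)

segments = ["1,234.56", "97,345.21", "5.23", "80.001"]
print(count_collapsed(segments))  # (4, 1)
```

Run over the source segments of a TMX, a large gap between the two counts would suggest that much of the apparent "loss" is this kind of duplicate.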

    Regards

    Paul

    Paul Filkin | RWS Group

    ________________________
    Design your own training!

    You've done the courses and still need to go a little further, or still not clear? 
    Tell us what you need in our Community Solutions Hub

  • Paul,

    Thank you for noting it. I will look into it. But SDL did generate an error log in importing/upgrading my memoQ TMX, told me that there were about 6000 errors (out of about 160K units), and produced a large error TMX file ("... en-US_ru-RU_error.tmx"). I suppose this means there were real errors in conversion/upgrading.

    I am playing with Olifant at the moment, to see if I can downgrade my TMX to a two-column Unicode text file. If I succeed, is there a reliable way of importing the resulting table into the SDL memory without any losses or errors? This may be an easy question, so perhaps you could just point me to the relevant resource for an answer?

    Thanks again for your prompt reply.

    Yuri
  • Unknown said:
    SDL did generate an error log in importing/upgrading my memoQ TMX, told me that there were about 6000 errors (out of about 160K units), and produced a large error TMX file ("... en-US_ru-RU_error.tmx")

    Did you review the log?  What sort of problems were mentioned?  The reason I ask is because even the things I mentioned above would cause an error log like this.

    Regards

    Paul


  • Paul and others, 

    Yes, and all the error reports in ‘_error.tmx’ look like this:

    <!--Error: TagAnchorNotOpen-->
        <tu creationdate="20110508T110453Z" creationid="Balashov" changedate="20110509T153156Z" changeid="Balashov">
          <prop type="x-OriginalFormat">Unknown</prop>
          <tuv xml:lang="en-US">
            <seg><bpt i="1" type="1" x="1" /><bpt i="2" type="2" x="2" />What is the proposed optional "translational research"?<ept i="1" /><ept i="2" /></seg>
          </tuv>
          <tuv xml:lang="ru-RU">
            <seg><bpt i="1" type="1" x="1" /><bpt i="2" type="2" x="2" />В чём заключаются предполагаемые необязательные «сопутствующие исследования» ("translational research")?<ept i="1" /><ept i="2" /></seg>
          </tuv>
        </tu>

    I looked back at the original memoQ TM and I think I have an idea of what may be creating these problems: the double formatting in these memoQ TM entries, such as bold+italic and bold+underline (please see the screenshot).

    Does this sound right?

    If so could you suggest a good way of getting rid of ALL the formatting tags in a memoQ-generated TMX? Deleting the external tags turned out to be easy: just a single click in the memoQ TM viewer. But the internal tags for bold/italic/underline are invisible there. I’m still playing with Olifant; and my learning curve there is slow. But perhaps there is another way, e.g. doing it manually in Notepad or Notepad++?

    Thanks again!

    Yuri

     

  • Hi Yuri,

    I had exactly the same problem when importing several TMX files generated by Transit into an .sdltm. For every TM I converted, I kept the invalid segments (which you obtain via General import options -> Export invalid translation units) and edited them in Notepad++ with a macro I created myself. This macro simply looks for the <bpt...> and <ept...> elements that Studio doesn't like and deletes them. Then you save the file and re-import it into the Studio TM.

    As an example (depending on the tags you have to delete): the macro searches for <ept(.*?)/> and replaces it with nothing (""), with regular expressions enabled.
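The same search-and-replace can be scripted outside Notepad++. The following is a minimal sketch, assuming the tags are always self-closing as in the error sample earlier in the thread; `INLINE_TAG` and `strip_inline_tags` are illustrative names, not part of any tool.

```python
import re

# Matches self-closing <bpt .../> and <ept .../> elements.
# Assumption: the tags carry no content; paired <bpt>...</bpt> elements
# would need a different pattern.
INLINE_TAG = re.compile(r"<(?:bpt|ept)\b[^>]*/>")

def strip_inline_tags(tmx_text: str) -> str:
    """Remove every self-closing bpt/ept element from a TMX string."""
    return INLINE_TAG.sub("", tmx_text)

sample = ('<seg><bpt i="1" type="1" x="1" /><bpt i="2" type="2" x="2" />'
          'What?<ept i="1" /><ept i="2" /></seg>')
print(strip_inline_tags(sample))  # <seg>What?</seg>
```

When applying this to a whole file, read and write it with the encoding declared in the TMX header, and keep a backup of the original.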

    Hope this helps,

    Almudena
  • It seems that the original source file loaded into memoQ was actually invalid: the tags overlap (bpt 1 opened, then bpt 2 opened, then bpt 1 closed and then bpt 2 closed, which is clearly incorrect).

    So Studio is correctly rejecting such a mess...
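A quick way to find segments with this open/close order is a stack check on the `i` attributes. This is a sketch under assumptions: it treats any closing tag that does not match the most recently opened one as an overlap, which may or may not be exactly the rule Studio applies for `TagAnchorNotOpen`.

```python
import re

# Captures the tag kind (bpt/ept) and its i="..." index.
TAG = re.compile(r'<(bpt|ept)\b[^>]*?\bi="(\d+)"[^>]*>')

def tags_properly_nested(seg: str) -> bool:
    """True if every ept closes the most recently opened bpt."""
    stack = []
    for kind, idx in TAG.findall(seg):
        if kind == "bpt":
            stack.append(idx)
        elif not stack or stack.pop() != idx:
            return False  # closes a tag that is not the innermost open one
    return not stack  # leftover open tags also count as invalid

overlapping = '<bpt i="1" /><bpt i="2" />text<ept i="1" /><ept i="2" />'
nested = '<bpt i="1" /><bpt i="2" />text<ept i="2" /><ept i="1" />'
print(tags_properly_nested(overlapping))  # False
print(tags_properly_nested(nested))       # True
```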

    Since such tagging is completely irrelevant (normally, if placed correctly, i.e. not overlapping, these tags would be taken outside of the segment), you can simply delete them from the segments and then import the fixed TMX into the Studio TM.
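For the two-column table asked about at the top of the thread, a TMX can also be reduced to bare source/target pairs with Python's standard library. This is a minimal sketch, not a tested converter: the function names and the `CODE_TAGS` set are assumptions, and it loads the whole file into memory. How the resulting table is then loaded into an .sdltm is outside its scope.

```python
import csv
import io
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
# Assumption: these TMX inline elements hold formatting codes, not translatable text.
CODE_TAGS = {"bpt", "ept", "ph", "it", "ut"}

def plain_text(elem):
    """Text of a <seg>, dropping inline code elements but keeping what follows them."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag not in CODE_TAGS:
            parts.append(plain_text(child))
        parts.append(child.tail or "")
    return "".join(parts)

def tmx_to_rows(tmx_source, src_lang, tgt_lang):
    """Return (source, target) pairs; tmx_source is a path or a binary file object."""
    rows = []
    for tu in ET.parse(tmx_source).iter("tu"):
        texts = {}
        for tuv in tu.iter("tuv"):
            lang = tuv.get(XML_LANG) or tuv.get("lang")
            seg = tuv.find("seg")
            if lang and seg is not None:
                texts[lang.lower()] = plain_text(seg).strip()
        src, tgt = texts.get(src_lang.lower()), texts.get(tgt_lang.lower())
        if src and tgt:
            rows.append((src, tgt))
    return rows

def write_tsv(rows, path):
    """Write the pairs as a UTF-8, tab-delimited two-column file."""
    with open(path, "w", encoding="utf-8", newline="") as fh:
        csv.writer(fh, delimiter="\t").writerows(rows)

demo = '''<tmx version="1.4"><body><tu>
<tuv xml:lang="en-US"><seg><bpt i="1" />Hello<ept i="1" /></seg></tuv>
<tuv xml:lang="ru-RU"><seg>Привет</seg></tuv>
</tu></body></tmx>'''.encode("utf-8")
print(tmx_to_rows(io.BytesIO(demo), "en-US", "ru-RU"))  # [('Hello', 'Привет')]
```

Units missing either language are skipped, so comparing the number of rows with the number of `<tu>` elements gives a simple check that nothing was silently dropped.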