Question regarding the import of tmx files into Studio TMs

Hello - we have several thousand tmx files to import, but the Studio TM doesn't display the tmx file name, whether it be the actual name of the file, or a property within the tmx file. Is there a way for this info to be kept and displayed? Many thanks!

  • Just to specify, as can be inferred from the number of files, we're particularly looking for a way to batch-import!
    Lucy-Jane

  • Yes.  You need to create a field in your TM to hold the name of the TMX, and then, when you import, enter the filename as the value for that field.

    However, if you want this to happen automatically because you have thousands of files, then I'm afraid there isn't a way to do this out of the box.  It could be done through the API... probably not too difficult.  We might have a look at this through the AppStore if nobody else does.

    A possible workaround for now, I guess, would be to edit the "Created by" values in the TMX files with a batch process (this would be outside of Studio) and then the system fields could be updated with the value you use (the filename).
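    A minimal sketch of such a batch process, run outside of Studio. It assumes that the "Created by" value corresponds to the standard TMX creationid attribute on each <tu> element, and that the full file name is the value wanted. The function names and folder layout are hypothetical, and the output drops any DOCTYPE line from the original file:

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def stamp_file(src: Path, dst: Path) -> int:
    """Copy src to dst with every <tu> creationid set to the source file name.

    Returns the number of translation units stamped.
    """
    tree = ET.parse(src)
    count = 0
    for tu in tree.getroot().iter("tu"):
        # Assumption: Studio reads creationid into its "Created by" system field.
        tu.set("creationid", src.name)
        count += 1
    # Written as UTF-8 with an XML declaration; a DOCTYPE is not preserved.
    tree.write(dst, encoding="utf-8", xml_declaration=True)
    return count


def stamp_folder(in_dir: Path, out_dir: Path) -> dict[str, int]:
    """Stamp every .tmx file in in_dir, writing the copies into out_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)
    return {
        path.name: stamp_file(path, out_dir / path.name)
        for path in sorted(in_dir.glob("*.tmx"))
    }
```

    Calling stamp_folder(Path("tmx_in"), Path("tmx_out")) would leave the originals untouched, so the stamped copies can be checked in a text editor before anything is imported.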

    Another possible workaround would be to use PowerShell... the author of STraSAK might have covered something like this with his PowerShell tools here:

    https://github.com/EvzenP/STraSAK/

    Unless of course someone has a better idea?

    Paul Filkin | RWS Group

    ________________________
    Design your own training!

    You've done the courses and still need to go a little further, or still not clear? 
    Tell us what you need in our Community Solutions Hub

  • Former Member

    First, open any two TMX files in a text editor and take your time looking at them carefully;
    you will then know how to handle them properly.

    Then add up all (several thousands ?) of them to make a single tmx

    Finally, import it.

    too easy...
    Regards 

  • Then add up all (several thousands ?) of them to make a single tmx

    Finally, import it.

    too easy...

    Except it doesn't solve the problem at all.  They want to stamp each TU that is imported with the filename as a TM field.

    Importing multiple TMs isn't a problem and there is absolutely no need to merge them in the first place.  This is about identifying where the material was imported from in the final TM.

    Paul Filkin | RWS Group


  • I'm afraid that STraSAK doesn't have a feature which could be used for this task.

    But the whole task sounds like an XY problem...
    It seems to me that the real starting point for the task is somewhere before the thousands of TMX files... simply because I can't think of a "sensible source" of thousands of TMX files...
    Like, why and from which source would someone create so many TMXs? That doesn't make much sense to me... it sounds as if someone thought that the only way to create bilingual content from thousands of bilingual files is to export each file to a separate TMX... or something like that.
    Which would mean that the ultimate goal could be probably achieved differently, not necessarily the worst possible way (via the separate TMXs).

    So, if you don't mind, can you provide a bit more context?

  • It looks like it could be solved programmatically using EditScript functionality during import to TM, but EditScript is NOT PROPERLY DOCUMENTED ANYWHERE :( :( :(

    There is a mention about it in the Studio 2015 (!!!) TM API documentation (http://producthelp.sdl.com/SDK/TranslationMemoryApi/4.0/html/024ab948-758e-4f14-a7c5-e7e8a058b433.htm - check the bottom right of the API schema) and one can find the EditScript class itself documented, but there is not a single word anywhere about how to use it - what the script actually is, how it works, etc.

    What's more, this part is completely missing from newer API version documentation, there is a LOT missing in the newer version documentation!

    Is SDL ever going to fix this?
    Paul, the other day you were wondering why there are so few developers doing something with the Studio API for the community... perhaps this is one of the reasons - with such poor support from SDL, why bother...

    [Screenshot: SDL Translation Memory API documentation from 2015, highlighting the EditScript class in the API schema.]
    [Screenshot: SDL Translation Memory API documentation from 2017 or later, showing the EditScript class and other components missing.]

  • First of all, my apologies for not responding sooner, and my thanks for all your replies and suggested avenues for exploration. 

    A little context: 

    At the OECD, we've been using Multitrans for the last decade, complemented by Deja Vu, which arrived about 5 years ago. The only way to share resources between the two tools is by .tmx. Our Multitrans corpora contain many thousands of file pairs, and so we've been aligning these using Align Factory for use with Deja Vu, or for our external translators to use with whatever CAT tool they have - which is how we have ended up here!

    We're switching our two tools for Studio in the near future, and of course need to transfer all our current resources. Some of the TMs are going to require importing several hundred tmx files, which is why we were hoping to be able to batch-import, preferably using the filename itself, to avoid a) modifying the metadata of each .tmx file, or b) entering the metadata for each .tmx individually at the moment of import. 

    I hope this sheds some light, although it is starting to look like we're not going to be able to avoid using the metadata fields and modifying the metadata for each one... any other ideas most gratefully received!

    Many thanks again,

    Lucy-Jane

  • Question is, WHY do you need/want to have each translation unit marked with the filename it comes from... because such a thing does NOT happen during the standard translation workflow anyway.
    When importing bilingual content at the end of the translation workflow into some master TM, you can mark the imported translation units by updating a user field in the TM, but this happens for the entire imported batch, not for individual files... unless the size of the batch is a single file, of course.

    So, are you saying that you did not build any TM during the past decade, so the only thing you have now are the TMXs for individual aligned file pairs? That sounds weird...

    In any case, modifying the metadata for each individual TMX can be of course done programmatically, so you don't need to do it manually.
    But you need someone with some programming/scripting skills to create such a tool for you... Or you can check whether one of the tools from the Okapi Framework has such a capability.

    I could add such functionality to STraSAK, but as mentioned above, fundamental SDL documentation is missing, so until SDL fills in the missing pieces of information, there will be no progress in this area.
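
    As a rough illustration of modifying the TMX metadata programmatically, the sketch below adds a <prop> element carrying the file name to every translation unit. The <prop> element itself is part of the TMX format, but the property type "x-SourceFile" and the function name are made up for this example, and whether Studio turns such a property into a TM field on import is an assumption that would need testing on a sample file first:

```python
import xml.etree.ElementTree as ET

PROP_TYPE = "x-SourceFile"  # hypothetical property name, not a Studio convention


def add_source_prop(root: ET.Element, file_name: str, prop_type: str = PROP_TYPE) -> int:
    """Add <prop type=prop_type>file_name</prop> to every <tu> under root.

    Units that already carry a prop of this type are skipped, so the
    function can be run twice on the same file without duplicating it.
    Returns the number of units changed.
    """
    added = 0
    for tu in root.iter("tu"):
        if any(p.get("type") == prop_type for p in tu.findall("prop")):
            continue
        prop = ET.Element("prop", {"type": prop_type})
        prop.text = file_name
        # In TMX, <prop> and <note> elements come before the <tuv> elements.
        tu.insert(0, prop)
        added += 1
    return added
```

    The root element would come from ET.parse(path).getroot(), and the tree would then be written back out with tree.write(...), one TMX file at a time.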

  • Weird as it may sound, that's how it is! Multitrans is basically an indexing tool, and so is document-based, not segment-based; it compiles static document repositories (corpora), rather than TMs. It is possible to export Multitrans content to tmx format, but unfortunately the export has too many bugs for us to use, and the required metadata isn't exported. The nature of translation at the OECD means it is vital for translators to know exactly where each segment is coming from. 

    Yes, writing a script for the modification of the metadata is pretty much the conclusion we've reached as well...we were very surprised that the file name isn't retained by Studio, given that it is retained when importing to DVX!  

  • I don't know what your timescales are, but we are going to start work on a small import plugin for Studio shortly.  This will support the import of SDLXLIFF files first, but we will then add TMX support.  The idea being it will optionally add the following metadata to fields in the TM:

    - TU number from the SDLXLIFF

    - filename

    This may help you if you are unable to get a sensible approach anywhere else in the meantime.

    Paul Filkin | RWS Group
