Creating a TM from a Single Bilingual File

Hi all! I'm new here. So stupid question time!

I need to translate a massive file. But I'm in luck! It's version 2.0, and I have the following: a TXT file for the 1.5 ST, a TXT file for the 1.5 TT, and a bilingual HTML file that has the entire thing with the English on top and the Japanese right underneath.

I tried to align the TXT files, but it's... just too big. It's around 20,000 segments, things go off in the middle, several parts aren't parsing right... it would take me two weeks to align it manually.

But wait, I already have an "aligned" version! The bilingual HTML file! It's already separated line by line. Is there any way to leverage this into a TM? I found a question like this in the archives, but it was around five years old and the answer seemed to be a big "nope". I've got my fingers crossed that there's a workaround here... otherwise I will just end up having to CTRL+F every single line of this huge thing from scratch.

  • Whoa! This is... exactly what I want to be able to do.

    It looks like magic to me though. I don't know what a "regular expression" is, or how I'd go about extracting them, or how to import whatever I extracted into a TM like that ^^;

    Time to plumb the depths of YouTube?

  • So I've been researching this for about two hours now, and my brain is totally fried.

    Is there any chance you can slowly walk me through what you did here? It looks absolutely glorious.

  • This is the best online source I know for learning regex: https://www.regular-expressions.info/characters.html

    It's a super useful skill to have because so many tools support it: Trados Studio, OpenOffice, advanced text editors like Notepad++ or EditPad Pro, special tools like PowerGREP, and basically all modern programming languages. The initial learning curve is steep, though.

    If you are new to regex, then EK's instructions are a bit like "how to fly to the moon":

    a) build spaceship

    b) fly to moon

    Yeah, sure.

    I attached my tmx file in case anybody wants to play around with it.

    2543.alignment.zip

    Daniel

  • The only keyword you need is "tmx",

    and it shows up right here.

    In a nutshell, you do not need anything else.

    [Screenshot: Trados Studio's Translation Memories section with 'From the scratch en-US->ko-KR' selected. The Import window is open with the 'Add Files...' button highlighted, and the file explorer shows 'This is an Exported file from Trados.tmx' selected.]

    The one definitive source on "tmx" is here:

    TMX 1.4b Specification
    https://www.gala-global.org/tmx-14b

    And the one definitive source on regular expressions as used in Trados is here:

    https://docs.microsoft.com/en-us/dotnet/standard/base-types/regular-expression-language-quick-reference

    So my last screenshot combines all of the above in a single mouse click.

    You do not have to learn anything at all; you already know how to click, I believe.

    [Screenshot: Notepad++ showing the XML content of 'This is an Exported file from Trados.tmx'. Trados Studio is in the background with 'New New en-US->ko-KR' selected in the Translation Memories section and 'Introduction' displayed in the main window.]

    I am not serving free lunch any more.

    Good Luck To You

  • I am attaching a sample SDL TM and its exported TMX file; that is the "tmx" for you.

    6557.sample.zip

  • Please note that you are not missing anything, and it's not surprising that you're confused by the "instructions" from Kelly. He doesn't do instructions; he just likes to show off how he was able to do something, but doesn't explain anything.

    He essentially did this:

    1. extracted the source and target text for each language using regular expressions
    2. put this extracted text into a delimited file and converted that into a TMX file (or some process like this)
    3. from there you can either import the TMX into a new Trados translation memory, or convert it directly (the Glossary Converter on the AppStore can do this for you... but importing is easy enough with out-of-the-box features)
    4. He also shows a UI for Studio full of all his little tools as he does know how to develop a little, so it won't look familiar to you and he hasn't explained any of that
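
To make step 1 a little less like "build spaceship", here is a rough Python sketch of the extraction. It is not what Kelly actually ran. It assumes, purely for illustration, that every segment in the bilingual HTML sits in its own `<p>` element with the English line first and the Japanese line directly underneath; the names `extract_pairs`, `PARAGRAPH` and `TAG` are made up for this example, and the pattern would need adjusting to the real markup.

```python
import html
import re

# Hypothetical layout: one segment per <p>...</p>, English first,
# Japanese directly underneath. Adjust the pattern to the real file.
PARAGRAPH = re.compile(r"<p\b[^>]*>(.*?)</p>", re.IGNORECASE | re.DOTALL)
TAG = re.compile(r"<[^>]+>")


def extract_pairs(page):
    """Return (source, target) tuples from alternating paragraphs."""
    cells = []
    for raw in PARAGRAPH.findall(page):
        # Strip inline tags, decode entities such as &amp;, collapse whitespace.
        text = html.unescape(TAG.sub("", raw))
        text = " ".join(text.split())
        if text:
            cells.append(text)
    # Pair item 0 with 1, 2 with 3, and so on; a trailing unpaired line is dropped.
    return list(zip(cells[0::2], cells[1::2]))


sample = """
<p>Open the file.</p>
<p>ファイルを開きます。</p>
<p>Save &amp; close.</p>
<p>保存して<b>閉じます</b>。</p>
"""
pairs = extract_pairs(sample)
```

Because the pairing is purely positional, a single missing or extra paragraph shifts everything after it, so it is worth spot-checking a few pairs from the middle and the end of a 20,000-segment file before importing anything.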

    So your task would be to learn the following:

    • how to use regular expressions to identify the translatable text from your file
    • how to use that expression in a tool (like Notepad++) so it's copied to a new file in a structured way
      • I said "new file" because it doesn't have to be a TMX
      • You could simply separate the results of your regular expressions by a tab and then use the Glossary Converter to convert the tab delimited file directly to an SDLTM (translation memory)
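
As a small illustration of the tab-delimited route, the sketch below turns a list of (source, target) pairs into one line per pair with a single tab between the two columns. The function names and the output file name are hypothetical, and whether the Glossary Converter expects a header row or a particular encoding is something to check in the tool itself; UTF-8 with no header is only an assumption here.

```python
from pathlib import Path


def to_tab_delimited(pairs):
    """Render (source, target) pairs as text: one pair per line, tab between columns."""
    lines = []
    for source, target in pairs:
        # A tab or line break inside a segment would break the two-column
        # layout, so collapse all whitespace runs to single spaces first.
        clean = [" ".join(cell.split()) for cell in (source, target)]
        lines.append("\t".join(clean))
    return "\n".join(lines) + "\n"


def save_tab_delimited(pairs, path):
    """Write the pairs to a UTF-8 text file (encoding is an assumption)."""
    Path(path).write_text(to_tab_delimited(pairs), encoding="utf-8")


pairs = [
    ("Open the file.", "ファイルを開きます。"),
    ("Save and\tclose.", "保存して閉じます。"),
]
text = to_tab_delimited(pairs)
# save_tab_delimited(pairs, "pairs.txt")  # hypothetical output file name
```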

    Kelly has essentially forgotten that everyone has to start somewhere and I can still remember the sort of questions we used to get from him... and if other users had been like Kelly today he would have learned nothing!  This is nothing to do with a free lunch as he hasn't even given you a biscuit!  I think it's fine to offer to do work for someone at a price as everyone's time is valuable.  So if you do have technical skills that go above those of the average user then contact them off forum through the chat and agree a price for the work.  But don't keep posting like this.  It's rude, grandiose and frankly unhelpful.

    We do have a forum here for regular expressions:

    https://community.rws.com/product-groups/trados-portfolio/trados-studio/f/regex_and_xpath

    You can feel safe having a go, sharing examples of what you are trying to achieve and getting some help along the way.  The example reference from Daniel is a good one and there are also some good tools out there that help to use regular expressions with explanation... some free and some at a cost.  So use the forum and ask as you go.  Once you get your head around them you'll never go back... they are very useful!

    Paul Filkin | RWS Group

    ________________________
    Design your own training!

    You've done the courses and still need to go a little further, or still not clear? 
    Tell us what you need in our Community Solutions Hub

  • My point is quite clear.

    Trados should recognize a simple "xlsx" or "txt" file format when creating a TM from scratch.

    Currently, it insists on the "tmx" format only, which "almost" NOBODY knows.

    And I have shown a quite clear and easy way to get there.

  • > Trados should recognize a simple "xlsx" or "txt" file format when creating a TM from scratch.

    Excel or TXT could be used, but then it would need to be very clear how the content should be structured.  Both files could contain absolutely anything and still be an Excel or TXT file.  Also how would you represent other properties in a TM?  More structure, and a lot more complex than a TMX.

    I think it is a possibility for the extremely simple case of just source and target text. But given how often this is done, and how easy it is to convert such files with existing tools, adding this to a tool that already has a lot of options would be a waste of valuable development resource.

    > Currently, it insists on the "tmx" format only, which "almost" NOBODY knows.

    You must be kidding! TMX is an industry-recognised standard for exchanging TM data between tools. That's the only sensible format to use.
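
For readers who have never looked inside one, a TMX file is just XML: a header followed by translation units, each holding one segment per language. The sketch below builds a minimal one with Python's standard library. The header attributes are the ones TMX 1.4b marks as required, but the values given to them here (tool name, `o-tmf`, language codes) are placeholders chosen for illustration, and `build_tmx` is a hypothetical helper rather than anything Trados provides.

```python
import xml.etree.ElementTree as ET

# ElementTree serialises this qualified name as the attribute xml:lang.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_tmx(pairs, source_lang="en-US", target_lang="ja-JP"):
    """Return a minimal TMX document as UTF-8 bytes."""
    tmx = ET.Element("tmx", {"version": "1.4"})
    ET.SubElement(tmx, "header", {
        "creationtool": "pairs-to-tmx",   # placeholder tool name
        "creationtoolversion": "0.1",     # placeholder version
        "segtype": "sentence",
        "o-tmf": "plain-text",            # placeholder original format
        "adminlang": "en-US",
        "srclang": source_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for source, target in pairs:
        tu = ET.SubElement(body, "tu")    # one translation unit per pair
        for lang, text in ((source_lang, source), (target_lang, target)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="utf-8", xml_declaration=True)


pairs = [
    ("Open the file.", "ファイルを開きます。"),
    ("Save & close.", "保存して閉じます。"),
]
tmx_bytes = build_tmx(pairs)
# Writing tmx_bytes to a .tmx file gives something that can be tried in the TM import.
```

The language codes should match the language pair of the TM the file is imported into.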

    Paul Filkin | RWS Group

  • This is all awesome! I have a lot more to go on now, thanks so much. I do apologize for taking up your time with my ignorance... hopefully after I get some of these skills under my belt I can pay it forward in the future.

    I am immensely grateful for all the support you all have provided. I have a long way to go, but I can't express strongly enough how cool it is to see that the exact thing I wanted to do is actually doable! You rock.

    Now to spend the rest of tomorrow going through all the new tech. It's going to be a good day!

    Cheers all around.

  • Hi Benjamin,

    Unfortunately, I'm swamped and don't have much time to help you create the regular expressions you need, which can be quite daunting when you're new to regex. But I have a series of articles that may help you learn regex a little faster, so I thought I'd share in case that helps. Here's one:

    https://noradiaz.blogspot.com/2019/10/regular-expressions-for-translators.html

    You can find the rest by searching for "regular expressions" on the blog.

    Good luck!
