Pre-translation does not work

In a localisation project we were given reference material, which I converted into a TM. At first we were given only the missing sentences to translate. After translating them I created a project TM.

When I then created a project with the complete file to localise, only a very small percentage (~0.1%) was pre-translated.

I had checked the TM contents beforehand (they are organised in such a way that we should normally get only Context Matches), and the TMs are activated.

What could be wrong?

  • Hi,

    How did you create the TM? Could it be that there was a penalty being applied which reduced your matches to 99% and therefore missed the default 100% for pre-translation?
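
To make the penalty point concrete, here is a minimal sketch of the arithmetic only. The function name and the simple "score minus penalty" model are assumptions for illustration, not Studio's actual implementation: a segment is pre-translated only if its score after penalties still reaches the minimum match value.

```python
# Simplified model (assumption): final score = raw score - penalty, and a
# segment is pre-translated only if that score reaches the minimum match value.
def is_pretranslated(raw_score: int, penalty: int, minimum_match: int = 100) -> bool:
    return raw_score - penalty >= minimum_match


print(is_pretranslated(100, 0))                    # True
print(is_pretranslated(100, 1))                    # False: 99 misses the 100 threshold
print(is_pretranslated(100, 1, minimum_match=99))  # True
```

Under this model a single point of penalty is enough to keep an otherwise exact match out of pre-translation when the threshold is 100.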

    Regards

    Paul

    Paul Filkin | RWS Group

    ________________________
    Design your own training!

    You've done the courses and still need to go a little further, or still not clear? 
    Tell us what you need in our Community Solutions Hub

  • Dear Paul,
    I found the main reason: I had not noticed that I was using the wrong file (target instead of source).
    However, the analysis is still not satisfactory.
    We received resx files as source, plus reference target files (in which untranslated strings were simply empty).
    I created a TM by converting the source and target files into XML and then building a bilingual Excel file, which I opened in Studio with a new TM that I then updated (with the "replace existing segments" option not activated).
    When creating the project with the complete resx, I added this TM and the project TM (which contains only the previously empty segments) to make sure everything would be in the right order.
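
For readers who want to reproduce a conversion like the one described above, here is a rough sketch, not the exact procedure used in this thread. It pairs source and target strings by the `name` attribute of each resx `<data>` element, skips strings whose target is empty, and writes the result as CSV (standing in for the bilingual Excel file). The sample strings and the helper names `read_resx_strings` and `pair_by_id` are invented for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Invented sample data; real resx files also contain resheader and schema nodes.
SOURCE_RESX = """<root>
  <data name="btnSave" xml:space="preserve"><value>Save</value></data>
  <data name="btnCancel" xml:space="preserve"><value>Cancel</value></data>
</root>"""

TARGET_RESX = """<root>
  <data name="btnSave" xml:space="preserve"><value>Speichern</value></data>
  <data name="btnCancel" xml:space="preserve"><value></value></data>
</root>"""


def read_resx_strings(xml_text):
    # Map the name attribute of each data element to the text of its value child.
    strings = {}
    for data in ET.fromstring(xml_text).iter("data"):
        name = data.get("name")
        if name is not None:
            strings[name] = data.findtext("value", default="") or ""
    return strings


def pair_by_id(source_xml, target_xml):
    # Return (id, source, target) rows, skipping untranslated (empty) targets.
    source = read_resx_strings(source_xml)
    target = read_resx_strings(target_xml)
    return [
        (name, text, target[name])
        for name, text in source.items()
        if target.get(name, "").strip()
    ]


pairs = pair_by_id(SOURCE_RESX, TARGET_RESX)

# Write the rows as CSV, one string per row, ready to open in a spreadsheet.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["id", "source", "target"])
writer.writerows(pairs)
print(buffer.getvalue())
```

Note that each resx string ends up as one row, so a TM built this way is segmented per string rather than per sentence.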
  • Dear Beate,

    It's going to be tough to give you a precise answer without seeing your files but I can see at least one potential problem with your workflow. If there are any tags in the resx file then converting the way you do could result in the tags becoming translatable text by the time they are in the Excel file, or not there at all. This would obviously result in a loss of leverage when running the resx files through Studio against your TM.

    Why didn't you just align the resx files so the correct filetypes were used for your reference TM?

    Regards

    Paul

  • Aligning turned out to be less effective and much more time-consuming than the conversion method (which I was able to check rather quickly).
    There are no tags at all since there is only plain text.
    What files would you need? How can I send them to you? (TMs and bilingual --> 14 MB)
  • No... just send me the following:

    1. source resx and target resx for a pair of files you know are not giving you the expected match
    2. your conversion of the same files to Excel

    That's it. You can email them to pfilkin@sdl.com

    Regards

    Paul

  • Thanks for the file Beate... I can see why you won't use Alignment with files like these, very painful indeed!  So this is what I did.

    First I did use Alignment, and I ran an analysis on the file using the aligned TM and the alignment penalty set to zero.  I get this:

    I then did the same thing after using the Glossary Converter to convert your Excel file to TMX and then upgrading it.  I get this:

    Slightly different, but still not the result you wanted, and the differences will be mostly accounted for by placeables.  There are also things that should be tags in these files, but I did not prepare them at all, so everything is translatable text and probably like for like with your workflow.

    So, why are these not all 100%?  Well, the first thing is easy... duplicate translation penalties.  For example:

    This would account for so many 99% matches throughout the file.  Then you have things that are linked to segmentation like this:

    So you only have a 69% match against the Excel-based TM because Studio has segmented the text in the resx differently from your Excel sheet.  You can see this because #71 is separate from #72, and yet the first match in the TM includes the whole thing.  If I use the aligned TM I see this:

    You might not have exactly the same result, as you used the Bilingual Excel filetype to create your TM.  If I check it using the Bilingual Excel filetype I see this, which is to be expected because it uses the cells for segmentation, and that is different to processing the resx natively:
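
The segmentation effect can be illustrated with a small sketch. Everything here is an assumption for illustration: the sample text is invented, the sentence splitter is a naive regular expression, and `difflib.SequenceMatcher` is only a stand-in similarity measure, not Studio's fuzzy-match algorithm. The point is simply that a TM entry holding a whole cell is an exact match for the whole cell, but only a partial match for each sentence taken out of it.

```python
import re
from difflib import SequenceMatcher


def sentence_segments(text):
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def best_score(segment, tm_sources):
    # Highest similarity ratio (0.0 to 1.0) between the segment and any TM source.
    return max(SequenceMatcher(None, segment, s).ratio() for s in tm_sources)


# Invented example: the TM was filled cell by cell, so it stores the whole cell.
cell_text = "Enter a code. Then press OK."
tm_sources = [cell_text]

# Cell-based segmentation looks up the whole cell; sentence-based segmentation
# looks up each sentence on its own.
cell_score = best_score(cell_text, tm_sources)
sentence_scores = [best_score(s, tm_sources) for s in sentence_segments(cell_text)]

print("whole cell:", cell_score)
print("per sentence:", sentence_scores)
```

The whole-cell lookup scores a perfect 1.0, while each sentence scores lower because the stored source contains extra text.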

    Anyway... I hope I gave you enough food for thought and you can see why you are not going to get the Context Matches all the way through the file?

    Regards

    Paul

  • Thank you so much, Paul.

    However, it will be difficult to explain this to a client who wants to make economies ;-)

    I kept duplicates because I thought that if the source text is in the same order as in the memory, it would more easily result in context matches (and because there are different translations depending on context). How does Studio decide which is the best match in which circumstances?
  • Hi Beate, when you updated your bilingual Excel file into the TM which setting here did you use?

    I'm showing the defaults... but would be useful to know which you used.

    Regards

    Paul

  • I kept duplicates, supposing that this way there would be more context matches, since the TM would be in the same order as the resx files.
  • Hi again,

    In the meantime I did this test again, and this time followed exactly the route you are following to ensure we don't start discussing something irrelevant.  Previously I just wanted to explain where some of the differences may lie, but I can see I confused things by not having any Context Matches in my example.  The analysis results I get doing it your way are these:

    So I just need to look at reasons for the less than CM matches.  So... first of all, the correct translation for "Nature" is used.  Clearly I used a bad example before:

    If you want a thorough explanation of how CM matches work then perhaps review this article:

    https://multifarious.filkin.com/2013/02/13/100percent/

    Now, we still have a duplicate translation issue for quite a bit because Studio is still unable to determine context for many of the translations.  To see why you need to dig a little deeper.  Take the source word "Code" for example.  This appears in a lot of places... like this for example:

    So the question is why is this a 99% when "Nature" above is a CM?  Take a look at this one:

    Exactly the same context (previous source and target result) but a different segment.  It is therefore a duplicate.  In fact this one word occurs 25 times in the document you provided and only two of them are CM.
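
One simplified way to picture the "Code" versus "Nature" behaviour is sketched below. This is a toy model and not Studio's lookup logic: the context is reduced to the previous source segment only, the sample translations are invented, and the labels "CM", "100%" and "99%" are assigned by rules assumed purely for illustration. A unique translation in a matching context counts as a CM, whereas several different translations for the same source in the same context cannot be told apart and are treated as penalised duplicates.

```python
# Each TM unit is (previous_source, source, target). Sample data is invented.
TM = [
    ("Type", "Nature", "Art"),
    ("Name", "Code", "Code"),
    ("Name", "Code", "Kennzahl"),  # same context, different translation
]


def classify(prev_source, source, tm):
    # Toy classification of a lookup under the assumptions stated above.
    same_source = [tu for tu in tm if tu[1] == source]
    if not same_source:
        return "no match"
    # Distinct translations stored with exactly this preceding segment.
    in_context = {tu[2] for tu in same_source if tu[0] == prev_source}
    if len(in_context) == 1:
        return "CM"    # context identifies a single translation
    if len({tu[2] for tu in same_source}) > 1:
        return "99%"   # competing translations: duplicate penalty
    return "100%"      # exact text match, but no matching context


print(classify("Type", "Nature", TM))   # CM
print(classify("Name", "Code", TM))     # 99%
print(classify("Other", "Nature", TM))  # 100%
```

In this model, keeping duplicates in the TM only helps when the surrounding context actually differs between them.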

    The rest are almost certainly down to segmentation as you used the Excel file to update the TM as explained previously.  I checked a few out and this did seem to be the case.

    I hope this all makes a little more sense for you now, and I'm sorry I didn't do this exactly the same way you did in the first place!!  Interestingly, I also looked at using Passolo for the alignment with Daniel Brockmann.  This tool aligned the resx files very rapidly and, because they are software strings and the alignment is ID-based, it was entirely accurate.  But the resultant TM actually provided worse leverage in Studio than any of the other three methods I've tried now.  This is probably down to segmentation differences again, so if you did this the benefit would be in translating the files in Passolo too.

    Regards

    Paul
