Application to extract an analysis from a standalone SDLXLIFF

The scenario is this. You are sent an SDLXLIFF to translate without a TM. It has already been pretranslated, and the payment terms were set on the basis of what's in the file. How do you get a summary of what the file contains, so you have a basis for questioning the payment if necessary? I ask because I had exactly this query this morning and couldn't solve it easily.

I could create a new TM from the SDLXLIFF and rerun the analysis against it, but the results would of course be wrong because we don't have a proper basis for comparison.

So I created a small spreadsheet from the report produced by the SDLXLIFF Converter and added a couple of columns to count the words and group the matches. The word count will clearly be off too, and the analysis still doesn't match exactly, but it gave me an idea for another application, so I'm adding this one for comments as well. A proper application that could use the proper analysis mechanisms would be quite cool, I think.

I've attached the spreadsheet so you can see what I mean. It was actually useful for the user, even with its obvious flaws.

Regards

Paul

Paul Filkin | RWS Group

________________________
Design your own training!

You've done the courses and still need to go a little further, or still not clear? 
Tell us what you need in our Community Solutions Hub

Analysis Spreadsheet.xlsx
  • Hi Paul,

    Paul Filkin said:
    A proper application that could use the proper analysis mechanisms would be quite cool I think.

    I think that for a proper analysis using the proper analysis mechanisms you need a TM, and you cannot fake a TM with fuzzy matches from the SDLXLIFF.

    On the other hand, the SDLXLIFF contains all the information needed for an approximation, i.e.:

    1. Count the words of the source text (ignoring any tags, numbers, variables, etc.) from the source element:

    <source>New unit.<x id="86" /></source>

    2. Add them to the appropriate group based on the percent attribute of the sdl:seg element

    <sdl:seg id="23" conf="Translated" origin="tm" percent="100">

    as long as they are not repetitions, i.e. check whether the sdl:rep element exists:

    <sdl:rep id="a0d5eecf-b918-4c72-b4f6-7aabd7fdcf73-20" />
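The three steps above could be sketched roughly as follows in Python. This is a minimal sketch, not a Studio-compatible counter. It assumes the usual SDLXLIFF layout (segments as `<mrk mtype="seg" mid="...">` inside `<seg-source>`, each linked by id to an `<sdl:seg>` under `<sdl:seg-defs>`), assumes `sdl:rep` sits inside the `sdl:seg` element, and uses hypothetical names (`analyse_sdlxliff`, `count_words`, `band_for`, `BANDS`), made-up match bands and a crude word rule, so the totals will not match Studio's own analysis.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Namespace URIs assumed for SDLXLIFF: XLIFF 1.2 plus the SDL extension.
NS = {
    "x": "urn:oasis:names:tc:xliff:document:1.2",
    "sdl": "http://sdl.com/FileTypes/SdlXliff/1.0",
}
XLIFF_MRK = f"{{{NS['x']}}}mrk"

# Hypothetical bands (lower bound, label); anything below 50% counts as "New".
BANDS = [(100, "100%"), (95, "95-99%"), (85, "85-94%"), (75, "75-84%"), (50, "50-74%")]


def count_words(text: str) -> int:
    """Crude rule: a word is any whitespace-separated token containing a letter,
    so bare numbers and punctuation are skipped. Not Studio's algorithm."""
    return sum(1 for token in text.split() if re.search(r"[^\W\d_]", token))


def band_for(percent):
    """Map a match percentage (or None for no match) to a band label."""
    if percent is not None:
        for lower, label in BANDS:
            if percent >= lower:
                return label
    return "New"


def analyse_sdlxliff(xml_text: str) -> Counter:
    """Approximate an analysis: words per match band, repetitions counted separately."""
    root = ET.fromstring(xml_text)
    totals = Counter()
    for tu in root.iter(f"{{{NS['x']}}}trans-unit"):
        # Segment id -> <sdl:seg> element carrying the match metadata.
        seg_defs = {s.get("id"): s for s in tu.iterfind("sdl:seg-defs/sdl:seg", NS)}
        for seg_source in tu.iterfind("x:seg-source", NS):
            for mrk in seg_source.iter(XLIFF_MRK):
                if mrk.get("mtype") != "seg":
                    continue
                # itertext() keeps text inside inline tags and drops the tags themselves.
                words = count_words("".join(mrk.itertext()))
                seg = seg_defs.get(mrk.get("mid"))
                # Assumption: a repetition is flagged by an <sdl:rep> child of <sdl:seg>.
                if seg is not None and seg.find("sdl:rep", NS) is not None:
                    totals["Repetitions"] += words
                    continue
                pct = seg.get("percent") if seg is not None else None
                totals[band_for(float(pct) if pct else None)] += words
    return totals
```

Passing it the contents of an .sdlxliff file read as text returns a `Counter` of words per band, which is enough to see whether a payment breakdown is in the right ballpark, though not to challenge the percentages themselves.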

    Do you think an application like that is really needed?

    Regards,
    Costas

  • Costas Nadalis said:

    Do you think an application like that is really needed?

    Hi Costas,

    You're picking up on all the things I was thinking (but didn't know what they're called ;-)). I don't think it would be an application that sees an enormous number of downloads, but I do think it's quite a useful tool to have access to... maybe we'd be surprised. I quite like the ability to extract information like this from an SDLXLIFF, and maybe there is more information I haven't thought of that would be useful here too?

    Regards

    Paul


  • To be able to reproduce the statistical analysis, you would ideally need to know the source of the translation associated with each segment in the SDLXLIFF file.


    I fear that to dispute the % calculated for a pre-translated segment you would need to know the source of that segment; otherwise you are simply taking the percentages as presented and counting words and characters.


    The only solid information in this scenario would be any 100% (or higher) matches included in the pre-translation; from those you could at least create a mock TM on which to base some sort of re-analysis.


    Source is the key :-P Sounds like something out of Star Wars.
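To illustrate the mock-TM idea, here is a minimal Python sketch that keeps only the segments whose `sdl:seg` carries `percent="100"` or higher and writes them out as TMX 1.4, which a fresh TM could then import. It is a sketch under stated assumptions, not a supported workflow: it assumes segments are `<mrk mtype="seg">` elements in `<seg-source>` and `<target>` linked by id to `<sdl:seg>`, it flattens inline tags to plain text (so the mock TM has no placeholders), and the function names and TMX header values such as `creationtool` are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET

# Namespace URIs assumed for SDLXLIFF (XLIFF 1.2 plus the SDL extension).
XLIFF = "urn:oasis:names:tc:xliff:document:1.2"
SDL = "http://sdl.com/FileTypes/SdlXliff/1.0"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def _segments(container):
    """Map mid -> plain text for every segment marker under <seg-source> or <target>.
    Inline tags are dropped, so the mock TM carries no placeholders."""
    if container is None:
        return {}
    return {
        m.get("mid"): "".join(m.itertext()).strip()
        for m in container.iter(f"{{{XLIFF}}}mrk")
        if m.get("mtype") == "seg"
    }


def perfect_matches(xml_text):
    """Return (source, target) pairs for segments whose sdl:seg percent is >= 100."""
    root = ET.fromstring(xml_text)
    pairs = []
    for tu in root.iter(f"{{{XLIFF}}}trans-unit"):
        src = _segments(tu.find(f"{{{XLIFF}}}seg-source"))
        tgt = _segments(tu.find(f"{{{XLIFF}}}target"))
        for seg in tu.iter(f"{{{SDL}}}seg"):
            pct, mid = seg.get("percent"), seg.get("id")
            if pct and float(pct) >= 100 and mid in src and tgt.get(mid):
                pairs.append((src[mid], tgt[mid]))
    return pairs


def build_tmx(pairs, srclang, tgtlang):
    """Serialise the pairs as a minimal TMX 1.4 document (header values are placeholders)."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "mock-tm-sketch", "creationtoolversion": "0.1",
        "segtype": "sentence", "o-tmf": "sdlxliff", "adminlang": "en-US",
        "srclang": srclang, "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for source, target in pairs:
        tu = ET.SubElement(body, "tu")
        for lang, text in ((srclang, source), (tgtlang, target)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")
```

Writing the returned string to a .tmx file and importing it gives a partial reference point for re-analysis; everything below 100% still has to be taken on trust, which is exactly the limitation described in this reply.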
