Application to extract an analysis from a standalone SDLXLIFF

The scenario is this.  You are sent an sdlxliff to translate without a TM; it has already been pretranslated and the payment terms have been stated on the basis of what's in the file.  How do you determine, in summary, what is in the file so you have a basis for questioning the payment if necessary?  I ask this because I had such a query this morning and couldn't solve it easily.

I could create a new TM from the sdlxliff and rerun the analysis from this, but the results are of course not correct because we don't have the basis for a proper comparison.

So I created a small spreadsheet on the basis of the report that comes from the SDLXLIFF Converter and then added a couple of columns to count the words and group the matches.  Clearly the wordcount will also be off and the analysis still doesn't match exactly, but it gave me an idea for another application.  So I'm adding this one too for comments.  A proper application that could use the proper analysis mechanisms would be quite cool I think.

I attached the spreadsheet too so you can see what I mean... it was actually useful for the user even with its obvious flaws.

Regards

Paul

Paul Filkin | RWS Group

________________________
Design your own training!

You've done the courses and still need to go a little further, or still not clear? 
Tell us what you need in our Community Solutions Hub

Analysis Spreadsheet.xlsx
  • Hi Paul,

    Unknown said:
    A proper application that could use the proper analysis mechanisms would be quite cool I think.

    I think that for a proper analysis using the proper analysis mechanisms you need a TM, and you cannot fake a TM with fuzzy matches from the SDLXLIFF.

    On the other hand, the SDLXLIFF contains all the information needed to do an approximation, i.e.:

    1. Count the words of the source text (removing any tags, numbers, variables etc.) from the source element

    <source>New unit.<x id="86" /></source>

    2. Add them to the appropriate group based on the percent attribute of the sdl:seg element

    <sdl:seg id="23" conf="Translated" origin="tm" percent="100">

    as long as they are not repetitions, i.e. check whether the sdl:rep element exists

    <sdl:rep id="a0d5eecf-b918-4c72-b4f6-7aabd7fdcf73-20" />
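The two steps above can be sketched in Python. This is only a rough approximation under stated assumptions, not how Studio counts words: it assumes one segment per trans-unit (real files may split a unit into several segments), matches elements by local name so the namespace URIs in the sample are purely illustrative, treats any token containing a letter as a word, and tallies segments carrying an sdl:rep child separately. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def local(tag):
    """Strip any '{namespace}' prefix from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def count_words(text):
    """Rough count: whitespace-separated tokens with at least one letter."""
    return sum(1 for tok in text.split() if any(ch.isalpha() for ch in tok))


def words_by_percent(xml_text):
    """Return (Counter: percent -> source words, words in repetitions)."""
    root = ET.fromstring(xml_text)
    totals = Counter()
    repeated = 0
    for unit in root.iter():
        if local(unit.tag) != "trans-unit":
            continue
        source = next((c for c in unit if local(c.tag) == "source"), None)
        # Assumption: the first sdl:seg found describes the whole unit.
        seg = next((e for e in unit.iter() if local(e.tag) == "seg"), None)
        if source is None or seg is None:
            continue
        # itertext() keeps text and drops inline tags such as <x id="86" />.
        words = count_words("".join(source.itertext()))
        if any(local(c.tag) == "rep" for c in seg):
            repeated += words  # repetition: keep out of the match groups
        else:
            # A missing percent attribute is treated as 0 (no match).
            totals[int(seg.get("percent", "0"))] += words
    return totals, repeated


# Tiny made-up sample; only the element and attribute names follow the post.
SAMPLE = """<xliff xmlns="urn:oasis:names:tc:xliff:document:1.2"
       xmlns:sdl="http://sdl.com/FileTypes/SdlXliff/1.0"><file><body>
<trans-unit id="1"><source>New unit.<x id="86" /></source>
<sdl:seg-defs><sdl:seg id="23" conf="Translated" origin="tm" percent="100"/></sdl:seg-defs></trans-unit>
<trans-unit id="2"><source>Press the red button 3 times.</source>
<sdl:seg-defs><sdl:seg id="24" conf="Draft" origin="tm" percent="85"/></sdl:seg-defs></trans-unit>
<trans-unit id="3"><source>New unit.</source>
<sdl:seg-defs><sdl:seg id="25" percent="100"><sdl:rep id="a0d5"/></sdl:seg></sdl:seg-defs></trans-unit>
<trans-unit id="4"><source>Completely new text here.</source>
<sdl:seg-defs><sdl:seg id="26"/></sdl:seg-defs></trans-unit>
</body></file></xliff>"""

if __name__ == "__main__":
    totals, repeated = words_by_percent(SAMPLE)
    for percent in sorted(totals, reverse=True):
        print(f"{percent}%: {totals[percent]} words")
    print(f"Repetitions: {repeated} words")
```

The word count will differ from Studio's own, so the totals are only good enough to show whether the stated payment terms are in the right region.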

    Do you think that an application like that is really needed?

    Regards,
    Costas

  • Costas Nadalis said:

    Do you think that an application like that is really needed?

    Hi Costas,

    You're picking out all the things I was thinking of (but don't know what they're called ;-)).  I don't think it would be an application that would see an enormous number of downloads, but I do think it's quite a useful tool to have access to... maybe we'd be surprised.  I quite like the ability to extract information like this from an sdlxliff, and maybe there is more information I haven't thought about that would be useful here too?

    Regards

    Paul

    Paul Filkin | RWS Group


  • To be able to reproduce the statistical analysis, ideally you would need to know the source of the translation associated with each segment present in the SDLXLIFF file.


    I fear that to dispute the % that was calculated and associated with a pre-translated segment you would need to know the source of the segment; otherwise you are simply taking the percentages as they are presented and counting words and characters.


    The only solid information in this scenario would be any 100% (or greater) matches included with the pre-translation; from those, at least, you could create a mock TM on which to base some sort of re-analysis.


    Source is the key :-P Sounds like something out of Star Wars.

  • Hi Patrick,

    More good points.  But the dispute here is that the analysis available to the user for their payment is not based on a Studio analysis... at least it may not be.  So as a starting point, if you could easily summarise the values already in the sdlxliff (you will "maybe" be ignoring CM and 100% matches and only working on the rest), then you at least have the basis of a dispute (if you actually want to continue working for someone who works this way in the first place).

    Without this you can only say you have a feeling this is wrong and you would like to have more information, like a proper analysis report or the TM used to prepare the file etc.  If it was me I would do something like the spreadsheet first so I could say "Look, I have an analysis of the file you sent me and it doesn't match what you are prepared to pay.  So I think there is something amiss.  Can we review this please."  This would be better than having no ammunition at all other than a feeling it is wrong.

    Maybe it's just me?

    Paul

    Paul Filkin | RWS Group


  • Yeah, good point. I guess this makes sense if you can do some basic association on the information already present. If the numbers don't match up at all to the analysis, then that would be the basis of an inquiry, as you suggested. Sounds good.

    However, whoever is put in a situation like this is not being treated very well by the client, in my opinion; but I guess that is not the topic for this thread.

  • Unknown said:

    However, whoever is put in a situation like this is not being treated very well by the client, in my opinion; but I guess that is not the topic for this thread.

     
    I totally agree.  It seems many small agencies, maybe using Freelance rather than Professional, just send out sdlxliff files on their own after leveraging on their own TMs.  So if you work with people like this I guess it's a good sanity check at least.
     
    But like I said... probably (hopefully) not an app for major download numbers, but a nice to have in your armoury.
     
    Cheers
     
    Paul

    Paul Filkin | RWS Group


  • So, in a nutshell, you need a tool that reads in an existing SDLXLIFF file and simply counts all existing translation units, like this:

    CM: 523 Units
    100%: 89 Units
    95-99%: 23 Units
    85-94%: 12 Units
    75-84%: 346 Units
    50-74%: 612 Units
    No Match: 2436 Units

    That should be easy.
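
A minimal sketch of that grouping step in Python, assuming the percent value of each unit has already been read from the file. The band boundaries are taken from the list above; how a context match is flagged in the file is not covered in this thread, so here it is simply a boolean supplied by the caller, and the function names are hypothetical.

```python
from collections import Counter

# Band labels from the list above; each lower bound is inclusive.
BANDS = [
    (100, "100%"),
    (95, "95-99%"),
    (85, "85-94%"),
    (75, "75-84%"),
    (50, "50-74%"),
]


def band_for(percent, context_match=False):
    """Map a match percentage to its band label."""
    if context_match:
        return "CM"
    for lower, label in BANDS:
        if percent >= lower:
            return label
    return "No Match"


def unit_report(segments):
    """segments: iterable of (percent, is_context_match) pairs.

    Returns one report line per band, in the order used in the post.
    """
    counts = Counter(band_for(p, cm) for p, cm in segments)
    order = ["CM"] + [label for _, label in BANDS] + ["No Match"]
    return [f"{label}: {counts[label]} Units" for label in order]


if __name__ == "__main__":
    # Made-up values purely for illustration.
    demo = [(100, True), (100, False), (97, False), (60, False), (0, False)]
    print("\n".join(unit_report(demo)))
```

Counting units rather than words is the easy part; a word count per band is what most payment terms are based on, so a real tool would probably report both.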
