Baffling word counts

I often get negative figures in my word counts. Why would that be? My document last week showed -160 in the fuzzy match figure even though the document was complete.

And now my latest document is 17,000 words, of which I know I've translated about 7,000 (40%).

Yet Studio says I've translated 52,910 words (not characters), or 57% (the character count is 370,000).

Totally baffled by this. Of all the many bugs in the software, this one irritates me most.

Can anyone cast any light on this?

Windows 7, Studio 2011 SP2.

David

  • Hi David,

    This is deliberate because this feature, which is a strongpoint of concordance in Studio or Trados, allows you to find potentially useful phrases even when they are misspelled. If you are searching for single words like this then of course, assuming they are spelt correctly, the result won't be much help. But the scoring does help to discern things, as the results are presented in score order.

    The way it works is that Studio "normalizes" a string character-by-character, which basically means that characters which have diacritical marks (accents, tremas, carons, and other "decorative elements") are mapped to the base character.  For example, your Swedish "a umlaut" is mapped to plain "a".  This is done so that even if you search for "a umlaut", you still get hits on good search results without the diacritical mark, or the other way around.  The purpose is to increase recall (leverage) and is something many users expect.
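
To make the normalization step concrete, here is a minimal Python sketch of the general technique of mapping accented characters to their base characters. It is only an illustration using Unicode decomposition from the standard library; it is not Studio's actual code, and the function names `strip_diacritics` and `normalized_equal` are made up for this example.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Return text with combining marks removed (illustrative only).

    Assumption: this mimics the idea of character-by-character
    normalization described above; it is not how Studio implements it.
    """
    # NFD splits "ä" into "a" followed by a combining diaeresis.
    decomposed = unicodedata.normalize("NFD", text)
    # Keep only characters that are not combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def normalized_equal(a: str, b: str) -> bool:
    """Compare two strings after dropping diacritics and case."""
    return strip_diacritics(a).casefold() == strip_diacritics(b).casefold()


print(strip_diacritics("här"))
print(normalized_equal("här", "har"))
```

With this kind of normalization, a search for "här" and a search for "har" are treated as the same string, which is why hits both with and without the diacritical mark come back.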

    The other point is that concordance often produces better, or more useful, results when you search for phrases, as this may put more context around what you are looking for. Certainly concordance is more than a simple word search, and it is designed to be this way.

    Regards

    Paul

    Paul Filkin | RWS Group

    ________________________
    Design your own training!

    You've done the courses and still need to go a little further, or still not clear? 
    Tell us what you need in our Community Solutions Hub

  • I'm sorry Paul, but in the case of Swedish it simply does not apply. The Swedish ä is a different letter altogether; a is not the base character just because it looks similar. If in English you were to search for 'hit' and get 'hat' or 'hot', how on earth does that help you in any context whatsoever? It is not a strongpoint, it is a failing, and extremely irritating.

    Regards

    David

  • Hi David,

    I'm stumped then.  I would have thought that had you searched for 'hit' and you had none in your TM then the fuzzy value would tell you immediately there were none.  So if you then didn't even want to see the fuzzies why not increase the percentage value used for concordance search and then you won't get them?
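
As a rough illustration of what raising the minimum match value does, the Python sketch below scores candidates against a query and drops anything under a threshold. The scoring here uses `difflib.SequenceMatcher` from the standard library purely as a stand-in; Studio's own concordance scoring is different, and `similarity` and `filter_hits` are hypothetical names for this example.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Stand-in similarity score between 0.0 and 1.0 (not Studio's formula)."""
    return SequenceMatcher(None, a, b).ratio()


def filter_hits(query: str, candidates: list[str], min_score: float):
    """Return (candidate, score) pairs at or above min_score, best first."""
    scored = [(c, similarity(query, c)) for c in candidates]
    kept = [(c, s) for c, s in scored if s >= min_score]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# A low threshold lets near misses through; a higher one removes them.
print(filter_hits("hit", ["hat", "hot", "hit"], 0.6))
print(filter_hits("hit", ["hat", "hot", "hit"], 0.7))
```

With the lower threshold all three words are returned, with the exact match ranked first; with the higher one only the exact match survives.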

    I understand they are of no value in this case, but even in your screenshot they are not presented as a 100% match.

    What am I missing here?

    Regards

    Paul


  • Hi again,

    On reflection, perhaps you are thinking the match value refers to the entire segment rather than the highlighted word itself?  This is not the case because concordance is not doing the same thing as TM lookup.

    Apologies if you're clear on that one... I was just wondering so thought I'd check just in case.

    Regards

    Paul
