Why does acronym recognition in the TM settings impact the word count of hyperlinks?

Hi,

Acronym recognition has been discussed in this forum a few times over the years, and the feature was most recently changed in Studio 2019, I believe.

Having upgraded from Studio 2017 SR2 to 2021 SR2 recently, I only now noticed this change in which words are being recognised as acronyms (e.g. segments containing only a single upper-case word are no longer regarded as acronyms). An improvement, but because segments like 'The view is set to SMALL.' and 'The view is set to LARGE.' still result in Repetitions in the analysis and in filters (Exclude first occurrences), we will leave acronym recognition deactivated in our TMs.

However, I've now noticed that this also has an effect on hyperlinks. A Word document containing only the hyperlink www.rws.com/product-groups/trados-studio contains 4 words with acronym recognition on and 7 words with acronym recognition off. 

Is there a specific reason for this behaviour? And would it not make more sense to make these into 2 separate recognizers (i.e. Acronyms and Hyperlinks)? This would allow us to turn off acronym recognition without this resulting in larger word counts for files containing hyperlinks.

Best,
Lieven

  • Excellent observations and questions... and yes, there is a specific reason for this behaviour.  It stems from decisions made some time ago, before we had all the options we have today to support different data types in the translation memory.  These data types are enabled through checkboxes in the user interface, and in 2011 a late change was introduced to support URL and IP address recognizers.  This was great as our users were asking for it, but to deliver the benefit quickly, both recognizers were tied to the RecogniseAcronyms flag rather than introducing more change in the user interface and delaying the release.  The developer commented in the code at the time that these are not really acronyms and that we should add the ability to manage these recognizers separately, i.e. give them their own checkboxes.
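
    As a rough illustration of that coupling (this is not the actual Studio code; apart from the flag name mentioned above, every name here is hypothetical), the behaviour amounts to one switch controlling three recognizers:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RecognizerSettings:
    # Hypothetical settings object; only the idea of a single
    # acronym flag comes from the explanation above.
    recognise_acronyms: bool = True


def active_recognizers(settings: RecognizerSettings) -> List[str]:
    """Return the recognizers that run under the coupling described above."""
    active: List[str] = []
    if settings.recognise_acronyms:
        # URL and IP address recognition have no flag of their own,
        # so they are switched on and off together with acronyms.
        active += ["acronym", "url", "ip_address"]
    return active
```

    Under this assumption, turning the acronym flag off leaves the list empty, so URLs are no longer treated as a single recognised token.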

    The effort to do this is a little more complicated than I have probably made it sound, because we would also need to make sure that existing data (indexing, Context Match hashes etc.) doesn't get broken, and that we define suitable behaviour when new TMs are created from a user interface that doesn't use those flag values.  We do have a work item planned for this (LCC-6584), but as with many things, once we got into planning mode with a huge list of things to tackle, this one never surfaced, probably because nobody noticed the lack of control, and most likely because the effect on overall leverage for most users was negligible.

    So it's really interesting that you picked this one up, and I hope I explained that well enough for you.

    This exercise has put this back on our radar but I can't confirm that means we will definitely do it for the next release.  It still needs to be handled carefully, like all changes relating to TM leverage, and this means we have to weigh up priorities with other plans.

    Perhaps now that we are discussing this openly you could raise an idea in the ideas site, and see whether others also view this as something important for us to address?

    Paul Filkin | RWS Group

  • Hi Paul,

    Thanks for this information, which explains what is otherwise a very odd link between two very distinct things.

    This is a minor issue in most cases, of course.

    Acronym recognition is already a difficult decision in itself. It can be very useful at times, but it also creates Repetitions that aren't in fact Repetitions (less so since Studio 2019), and acronyms can't always be auto-substituted ('the CIA' and 'the FBI' become 'la CIA' and 'le FBI' in French, for example). So we recognize that this feature cannot be turned into a one-size-fits-all solution, and we decided to just deactivate acronym recognition by default (as the option with the least annoying impact).

    It was only for a recent project with many hyperlinks that I was struck by the number of words in Signed Off after 'translating' these hyperlinks.

    I'll consider creating the idea, but I don't think it will get many votes. For most people, it probably suffices that they can choose not to extract hyperlinks in the file type settings. In countries like Belgium, however, with several official languages and with English also being a common website language, we often don't need to translate hyperlinks, but we need to include them in Studio so as to be able to replace them with their other language versions. That would mean that a discussion on the link with acronym recognition would also lead to a discussion on the number of words that a hyperlink should in fact be counted as (i.e. a single word for the entire hyperlink, or with everything between / / being counted as a single word).
    This would complicate matters even more and I don't feel strongly enough about this to start a discussion on it... :-)
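
    To make those counting options concrete, here is a minimal Python sketch. It does not claim to reproduce how Studio counts; the function names are made up for illustration, and the naive splitter is only an assumption about what "no recognition at all" might look like.

```python
import re

URL = "www.rws.com/product-groups/trados-studio"


def count_naive(url: str) -> int:
    """Count every run of letters or digits as a word (splits on '.', '/', '-')."""
    return len(re.findall(r"[A-Za-z0-9]+", url))


def count_per_segment(url: str) -> int:
    """Count everything between slashes as a single word."""
    return len([part for part in url.split("/") if part])


def count_whole(url: str) -> int:
    """Count the entire hyperlink as a single word."""
    return 1 if url.strip() else 0


print(count_naive(URL), count_per_segment(URL), count_whole(URL))  # prints: 7 3 1
```

    The naive count of 7 happens to equal the figure I reported with acronym recognition off, but that does not show Studio tokenises this way, and none of these policies yields the 4 words counted with recognition on.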

    But thanks for clarifying this!
    Lieven
