Issues with alphanumeric string pre-translation and recognition

Hi everyone,

we've recently come across a strange issue related to alphanumeric string recognition in Trados Studio.

Project settings/conditions:
Project files are created by the customer in Trados Enterprise and received/processed by the PMs or linguists in Trados Studio 2024 (SR1).
All of the TMs used have alphanumeric recognition enabled in the TM settings.
Auto-substitution is enabled for alphanumeric strings for each language pair.
There is no penalty applied to AT of alphanumeric strings.

Issue 1: Only a portion of the alphanumeric strings is marked as pre-translated.
In the translation editor, those segments are highlighted as CM, even though there is no exact match from the TM.
We assume that the pre-translations are indeed the result of AT, and we would expect Studio to mark them as AT, not CM.
AT, however, does not work the same way for all segments. The 99% segments basically have the same structure as the ones marked as CM,
but they don't have the same status.

So two questions here:
Why does Studio/Enterprise use a CM marker for something that is clearly an AT match with no entry in the TM?
And why are some segments treated as CM and others as 99% matches even though they should be treated equally?

Issue 2: For the 99% matches, Studio suggests various CM or 100% matches that clearly have nothing to do with the actual source text.
For example, the TM unit "PH2 > PH2" is suggested as a CM for "3601JK3200".
Translators obviously won't have a problem with that as they already have a pre-populated target segment with the correct string.
But this completely skews the analysis results, as the Studio analysis will show more CM/100% matches (which are usually not paid for) than are actually present in the project files.

So what would be your take on this? Ask the customer to disable alphanumeric string recognition and AT altogether,
which would result in fewer AT hits? Or is there a defect with the recognition pattern that can be fixed in a future update?

Thanks in advance for your support!

Best regards,
Julian

  •   

    I see in segment 1273 that the alphanumeric string has been recognized. I thought that you needed to have the auto-substitution enabled, but you said it was (maybe you can double check it):

    [Screenshot: Trados Studio Project Settings window showing Auto-substitution options with the 'Alphanumeric Strings' checkbox enabled and highlighted in red.]

    You can try upgrading the TMs.

    Finally, I think the issue is on your client's side.

    If you don't want to be stuck while your client decides what to do and/or RWS responds to this issue, I'd filter for these alphanumeric strings (a regex along the lines of ^\d+[A-Z]+\d+$ should do the trick), copy source to target in all of those segments, change their status to Translated, and lock them.
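
To see what that expression would and would not catch before applying it as a filter, here is a minimal sketch in Python. It only illustrates the pattern's behaviour outside Studio; the sample segments and the function name `is_alphanumeric_code` are hypothetical and not taken from the project.

```python
import re

# Pattern quoted from the reply above: digits, then uppercase letters, then digits.
ALNUM_CODE = re.compile(r"^\d+[A-Z]+\d+$")


def is_alphanumeric_code(source_text: str) -> bool:
    """Return True if the whole segment is a digits-letters-digits code."""
    return ALNUM_CODE.match(source_text.strip()) is not None


# Hypothetical sample source segments (assumption: exported as plain strings).
segments = ["3601JK3200", "PH2", "Replace the filter.", "12AB34"]

# Keep only the segments that the filter would select.
codes = [s for s in segments if is_alphanumeric_code(s)]
print(codes)
```

Note that a string such as "PH2" starts with letters and is not matched, so the expression may need widening depending on which code formats actually occur in the files.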

  • Hi  

    Thanks for your feedback! Yes, auto-substitution is enabled by default, so it does make sense that Studio copied the string to the target segment. What I don't understand is that it assigns different match attributes to these segments. I'd expect a CM to be an actual match from the TM, which clearly it is not. We have pre-translated segments with a CM marker that are the result of auto-substitution/recognition (and not a 1:1 match from the TM) and others that only have a 99% match (where Studio suggests a completely different TU as a CM).
    Thanks for the suggestion regarding the regex. It won't help us much specifically, as we usually deal with large multilingual projects. Even with automation through plugins, that's a lot of additional manual overhead. I also tried upgrading and re-indexing the TMs, but that did not make a difference.

    There are already a lot of open tickets regarding alphanumeric string recognition. The one that comes closest to my request is this one from over 6 years ago: "Issue with Alphanumeric Strings Recognition"

    I'd appreciate it if someone from the RWS support team could explain why some of the recognized/auto-substituted segments are treated as CM or 100% matches and others aren't.
    Without a proper 1:1 TM match, I'd basically consider all of these strings as AT segments, also putting them in a completely different match category (in terms of QA and payment).

    Best regards,
    Julian
