Automatic replacement of variables, numbers, dates, etc.

Dear Community:

I often have to deal with documents that contain lots of numbers and alphanumeric combinations, and Trados always changes something. I use regex to filter the segments containing ONLY these "non-translatables", and it works quite well (though some combinations are still not caught, which will be the topic of my next thread :). So, where a filter can be applied, the issue is easy to fix.

However, many text segments contain these alphanumeric combinations, for example: "Medir la polaridad de las baterías en las bornas +42.03-HA.03-Q100-XH5 (+) y +42.03-F111-XH6 (-)  y comprobar que es correcta." When I look through the pretranslated results, I find errors in 95% of cases: Trados changes one or more numbers or a symbol.

So to avoid this I usually deactivate the automatic replacement of all these numbers and variables in the memory settings of a particular project: 

Trados Studio project settings window showing options for automatic substitution including dates, numbers, alphanumeric chains, measurements, variables, and symbols.

However, this is only applicable to this particular project, and cannot be changed from the General settings of Studio. I mean, I did the same here (actually it led me to the same list of TMs):

Trados Studio general settings window with options to activate or deactivate automatic substitution for dates, times, alphanumeric chains, numbers, measurements, variables, and symbols.

But the result was this: if the variables are unticked in the project settings but remain activated in the general settings, Trados stops "recognizing" them; however, if they are deactivated in the general settings but activated in the settings of a particular project, Trados sees them.

Can anyone explain how it works? At the moment I switch off this automatic replacement at the project creation stage, and I would like to define this setting once for all my translation projects.

And then another question: what do "variables" mean in the list of variables?:)

To me, numbers and dates ARE variables...however, if I choose to deactivate the Variables option only, Trados still reads all this stuff.

Thanks in advance!!!



Generated Image Alt-Text
[edited by: Trados AI at 10:58 AM (GMT 0) on 29 Feb 2024]
  •   

    About the difference between Options and Project settings, this article can help you: https://multifarious.filkin.com/2014/01/24/those-project-settings/?highlight=options

    what do "variables" mean in the list of variables?

    Variables are strings that are not supposed to change from Source to Target. Typical examples are brand names (Apple and Android, for example). If Trados recognises a variable in Source, it will be presented as a placeable when you press CTRL+comma. The variable will also be inserted in Target if the match is 100% and the only difference is the variables. For example, if you have Apple and Android as variables in your TM, the string "I love Apple" is in your TM, and you need to translate the similar string "I love Android", the string will be translated as a 100% match.

    More information here: https://docs.rws.com/980998/344726/trados-studio-2022/variable-list
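
    As a rough illustration of the idea (a toy model only, not how Trados is implemented): if every entry in the variable list is treated as an interchangeable placeholder, two segments that differ only in a variable compare as identical, and the stored translation can be reused with the variable swapped. The names `VARIABLES`, `normalise` and `reuse_translation` and the Spanish target are all invented for this sketch.

```python
import re

# Hypothetical variable list, standing in for the one stored with the TM.
VARIABLES = ["Apple", "Android"]

# One alternation that matches any listed variable as a whole word.
VARIABLE_RE = re.compile(r"\b(?:" + "|".join(re.escape(v) for v in VARIABLES) + r")\b")


def normalise(segment):
    """Replace every variable with the same placeholder."""
    return VARIABLE_RE.sub("{VAR}", segment)


def reuse_translation(new_source, tm_source, tm_target):
    """Return tm_target with the new variables swapped in, or None.

    Assumes the variables appear in the same order in source and target.
    """
    if normalise(new_source) != normalise(tm_source):
        return None  # the segments differ in more than their variables
    new_vars = iter(VARIABLE_RE.findall(new_source))
    # Keep the original text if the target has more variables than the source.
    return VARIABLE_RE.sub(lambda m: next(new_vars, m.group(0)), tm_target)


# "Me encanta Apple" is an invented TM target, used only for illustration.
print(reuse_translation("I love Android", "I love Apple", "Me encanta Apple"))
```

    With the sample above, the stored translation comes back with "Android" in place of "Apple", which is the behaviour described for a 100% match that differs only in a variable.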

    If you ask me, I don’t like the name they chose, "variable".

    numbers and dates ARE variables

    Numbers and dates are not variables, as they may need to be translated (variables are not translated). Instead numbers and dates are placeables that Trados can recognise. More information here: https://docs.rws.com/980998/775644/trados-studio-2022/dates-and-times

  • So helpful, Jesús, now I understand the problem I was having! I was changing the configuration to try to see the difference in an already created project and... voilà... changes made in the global settings were not applied to the project. After reading the first article you mention, it is clearer to me than ever.

    The variables issue: clear as well! Besides, I see I can customize the list of variables to be recognized by Trados. Still another question, a bit stupid maybe... if I create this list of variables in a TM shared by several people (a TM we have in a common folder), will my colleagues also have it when they connect to this TM? Or will these variables/abbreviations/etc. be kept only in the general settings of my PC?

    Best regards!!!!

  • Reopening the thread...this topic is not closed yet:)

    Ok with the automatic replacement cancelling....but then, each memory has a tab called, correct me if my translation is wrong, "Language resources", which offers us the recognition of the same categories: symbols, abbreviations, numbers, variables (exactly the same list as in the Automatic Replacement window). 

    So, what is that for? As far as I can understand, if I remove all those categories from the "recognition list" of all my TMs, then there is no need to make changes in the Automatic Replacement (nothing recognized, nothing replaced :). The only thing, I think, it can affect, is the word counting (because if numbers are recognized, they are counted as words).

    Could you, please, explain the difference between the two, Jesús (or any other knowledgeable person out there;)?

  • ,

    explain the difference between the two

    One of them lets you enable/disable some recognizers, while the second one lets you adjust how these are recognised. Of course, if a placeable is disabled, its settings are irrelevant.

    "Language resources", which offers us the recognition of the same categories

    Not exactly. ACRONYMS don’t let you change any settings, as ACRONYMS are ACRONYMS. On the other hand, alphanumeric strings are hard coded (you can have a look at this post to understand how they are recognised: https://rpuschmann.jimdofree.com/2015/07/26/what-are-recognized-tokens-aka-placeables-in-sdl-trados-studio/). This post explains why there are 2 alphanumeric strings recognized:

    Screenshot of Trados Studio showing a highlighted alphanumeric string '+42.03-HA.03-Q100-XH5' in the translation segment.

    My approach would be different and I’d use 2 apps for the translation and review steps:

    I’d match all the alphanumeric strings via regex, to inject them directly in Target (whether coming from Source or from the TM).

    Then, in the QA step, which translators and reviewers can execute via Verify (F8 shortcut), I’d add as many regexes as needed to catch errors on those codes:

    Screenshot of Trados Studio project settings with 'Regular Expressions' selected, showing options for source and target regex verification.

  • Jesús, now I have even more questions than before.

    1. There is no problem with modifying the list of abbreviations: I have just added one, I can remove any of them from the list.....And this is what this article says: https://rpuschmann.jimdofree.com/2015/07/26/what-are-recognized-tokens-aka-placeables-in-sdl-trados-studio/

    Screenshot of Trados Studio showing the abbreviation list in the Language Resources settings with options to add, remove, and modify entries.

    2. Now I completely fail to understand the relation between, for example, "recognition of numbers" and automatic replacement. If the program recognizes a token (number or alphanumeric string), it places it (must place it) in the target text directly. What's the point of doing any "automatic replacement" then? And if Trados is programmed to copy these categories of tokens as they are, why do I have so many errors in my translations? The favourite replacement Trados makes is changing + for - and vice versa!!!!

    I have just done a test: I removed the recognition of numbers in the Language resources tab, but  left the default settings of the Automatic Replacement, which is supposed not to work at all, since the number is no longer a recognizable token. What I got is that Trados copied the source segment to the target field... So I do not really understand what happens when we delete the categories from the "recognition list" in the Language resources section....

    3. https://appstore.rws.com/Plugin/153 and https://appstore.rws.com/Plugin/75: I have read the descriptions of these several times and do not understand their utility at all... the second one is more or less clear, but if I have my AutoSuggest and TDB connected... then what's the point?

    4. Then this:

    Then, in the QA step, which translators and reviewers can execute via Verify (F8 shortcut), I’d add as many regexes as needed to catch errors on those codes:

    QA is performed during translation, and its settings can be changed to report any fault (additional punctuation marks, spaces, etc.). What it lacks is, for example, a comparison of these recognizable tokens (a warning sign if they are not identical in the source and the target segment).

    So I do not understand which regex should be added and what for. I use them to filter out the content that does not require translation; I leave it out of the translation process and that's it, zero faults where this information is concerned....

  •   

    This can be a pretty confusing area!  But let me try and explain.

    If the program recognizes a token (number or alphanumeric string), it places it (must place it) in the target text directly. What's the point of doing any "automatic replacement" then?

    Let's say I translate this first segment:

    The project settings I used were these with Auto-Substitution of numbers deactivated:

    I confirm it and check the next segment:

    1. The numbers are marked as changed because the auto-substitution is switched off
    2. the number is still recognised (blue underline) because the language resources are all activated
    3. the entire string is recognised, but not as one placeable (I'll come back to that in a bit)
    4. the match is 94% because of the lack of auto-substitution

    If I repeat this exercise but this time turn the "Numbers" on in the auto-substitution settings I'll get this:

    1. The numbers are all unmarked in the TM Results window
    2. the number is still recognised (blue underline) because the language resources are all activated
    3. the entire string is recognised, but not as one placeable (I'll come back to that in a bit)
    4. the match is now 100% because the number might have been different in the TM but it was auto-substituted from the source

    This is what Jesús was explaining here:

    One of them lets you enable/disable some recognizers,

    The settings in the TM (the second of the two you queried), the language resources, do allow you to determine what gets recognised (the pattern, for example), but they also allow you to decide whether these things are recognised at all.  The auto-substitution settings don't prevent the number from being recognised, only from being considered in the score for matching.

    So if I remove "Numbers" from the recognition here:

    And now look again at my examples I would see this:

    1. similar to the effect of disabling auto-substitution as this also doesn't take place
    2. the number is not underlined as there is no number recognition at all

    In addition, Jesús also mentioned this:

    while the second one lets you adjust how these are recognised.

    So you can tell the language resources, to some extent, what format the number should take, but still be recognised:

    I added some silly ones just to illustrate the point... and this brings me back to point 3. at the start about number recognition.  Your strings like these:

    +42.03-HA.03-Q100-XH5 (+)

    Are made up of:

    One number and two alphanumeric strings.  I'm not sure why the + inside the brackets isn't recognised... it probably should be, but this could be down to my lack of understanding.  Another way to handle them is to use the apps Jesús mentioned.

    3. https://appstore.rws.com/Plugin/153 and https://appstore.rws.com/Plugin/75: I have read the descriptions of these several times and do not understand their utility at all... the second one is more or less clear, but if I have my AutoSuggest and TDB connected... then what's the point?

    So let's take the first one then, the Regex AutoSuggest Provider.  This is an interactive tool whereas the Terminjector can be used in a pre-translation and is more automated.  To do it I could create a regular expression to match the entire pattern of these strings, like this for example:

    \d+\.\d+-\w+\.\d+-\w+-\w+\s\(\+\)

    Then add it here:

    And now when I type some numbers I will see something like this:

    This will allow me to place the whole string in one go with a single action, as opposed to having to deal with this:

    Pretty cool if you work interactively.  However, you may have also noticed I omitted the "+" symbol from the start.  This may be due to a bug in Studio, as I know the developer tried to fix this many years ago.  I think it's caused by the use of any special chars in regex when used as the first character.  But maybe it's one for us to investigate, as this has been a problem for a long time.
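
    To see what the expression above captures outside Studio, here is a minimal sketch using Python's `re` module, which, as far as I know, treats these particular constructs (`\d`, `\w`, `\s` and escaped characters) the same way. The sample sentence is the one from the original question. This only exercises the pattern; it says nothing about how the Regex AutoSuggest Provider itself behaves.

```python
import re

# The expression from the post, without the leading "+".
pattern = re.compile(r"\d+\.\d+-\w+\.\d+-\w+-\w+\s\(\+\)")

sentence = (
    "Medir la polaridad de las baterías en las bornas "
    "+42.03-HA.03-Q100-XH5 (+) y +42.03-F111-XH6 (-)  y comprobar que es correcta."
)

# Collect every non-overlapping match in the sentence.
matches = pattern.findall(sentence)
print(matches)
```

    Only the first code is found. The second one, +42.03-F111-XH6 (-), has a different shape (one block fewer, and a minus in the brackets), so it would need its own expression or a more general one.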

     fyi.

    but if I have my AutoSuggest and TDB connected... then what's the point?

    I hope you now see the point?

    What it lacks is, for example, a comparison of these recognizable tokens (a warning sign if they are not identical in the source and the target segment).

    This is not lacking at all.  You just need to set them up in the Verification options.

    Then you should be able to carry out the custom checks you want.

    Paul Filkin | RWS Group

    ________________________
    Design your own training!

    You've done the courses and still need to go a little further, or still not clear? 
    Tell us what you need in our Community Solutions Hub

  •   

    It seems that I wrote my post, but it didn’t get through! I’ve deleted most of it, as I couldn’t have said it better than Paul:

    There is no problem with modifying the list of abbreviations: I have just added one, I can remove any of them from the list.....And this is what this article says: https://rpuschmann.jimdofree.com/2015/07/26/what-are-recognized-tokens-aka-placeables-in-sdl-trados-studio/

    The link was there for you to read what an alphanumeric string is. Nothing to do with abbreviations.

    If the program recognizes a token (number or alphanumeric string), it places it (must place it) in the target text directly. What's the point of doing any "automatic replacement" then? And if Trados is programmed to copy these categories of tokens as they are, why do I have so many errors in my translations?

    What did you do so Trados copied the source to Target?

    3. https://appstore.rws.com/Plugin/153 and https://appstore.rws.com/Plugin/75: I have read the descriptions of these several times and do not understand their utility at all... the second one is more or less clear, but if I have my AutoSuggest and TDB connected... then what's the point?

    There is a nice video about the 1st plugin on the developer's website. The 2nd one takes into account the content of the TM (I think there must be videos on YouTube as well). AutoSuggest can have several sources: AutoSuggest dictionaries, MultiTerm, and you can add even more AutoSuggest sources.

    So I do not understand which regex should be added and what for.

    Finding the right regex can be tricky. Maybe a regex matching strings with at least 2 hyphens will do the trick, something like this (not tested thoroughly, and I don’t know the other alphanumerics you’ve got):

    [^- ]+-([^- ]+-)+\w+\b

    You can add regexes like this one to your QA, as Paul mentioned, to catch changed alphanumeric strings. For example, running the QA will raise an error/warning if the string:
    +42.03-HA.03-Q100-XH5
    is translated as:
    +42.03-HL.03-QM10-XH5

    You can also use this regex with the plugins mentioned, or to filter and check alphanumeric strings manually.
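
    The kind of check described above can be sketched outside Studio as follows. This is only an illustration in Python of the general idea (extract the codes on both sides and compare them), not a description of what Studio's Verification settings do internally. The function names and the English target sentence are invented for the example.

```python
import re
from collections import Counter

# Expression suggested above: chunks joined by at least two hyphens.
CODE_RE = re.compile(r"[^- ]+-([^- ]+-)+\w+\b")


def codes(segment):
    """Count every alphanumeric code found in a segment."""
    # group(0) is the whole match; the parentheses only repeat the middle part.
    return Counter(m.group(0) for m in CODE_RE.finditer(segment))


def check_segment(source, target):
    """Return warnings for codes that are not identical in source and target."""
    src, tgt = codes(source), codes(target)
    warnings = [f"missing in target: {c}" for c in (src - tgt).elements()]
    warnings += [f"unexpected in target: {c}" for c in (tgt - src).elements()]
    return warnings


source = "Medir la polaridad en la borna +42.03-HA.03-Q100-XH5 (+)"
target = "Measure the polarity at terminal +42.03-HL.03-QM10-XH5 (+)"  # invented

for warning in check_segment(source, target):
    print(warning)
```

    Note that both segments contain something that fits the pattern, so a test that only asks whether the pattern occurs on each side would pass here. It is the comparison of the matched text that exposes the changed code.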

  • A superb masterclass, Paul, thank you so much for taking your time and explaining all this in detail. This is definitely for rereading... 

    Now I know the meaning of the blue line under all these combinations :)

    So in our particular case, I will leave all the recognizable tokens as they are but will definitely disable the automatic replacement: after pretranslation all those 100% matches are confusing since there may be a + or a bracket missing or added for whatever reason..... I beg your pardon for a veeeery stupid question: when recognizables are disabled and are not recognized anymore, what are they for Trados? Words?

    As for the Regex AutoSuggest Provider, I will not translate interactively: when phrases containing this type of info give no result after pretranslation, I usually just copy the source to the target and modify the text part.

    This is not lacking at all.  You just need to set them up in the Verification options.

    Trados Studio project settings window showing the Regular Expressions section under Verification with various conditions for regex matching.

    However, this looks interesting. So the idea is to put the same regex in the Regex source and regex target fields and choose the option you highlight.... but then, multiple combinations will fall within the conditions described by this regex.... both the target and the source will respond to such condition but may differ from each other....

    Either the subject is difficult, or my ignorance boundless.... :(

  • If the program recognizes a token (number or alphanumeric string), it places it (must place it) in the target text directly. What's the point of doing any "automatic replacement" then? And if Trados is programmed to copy these categories of tokens as they are, why do I have so many errors in my translations?

    What did you do so Trados copied the source to Target?

    No, I did nothing, but, if I understand it correctly, a token of this kind is something that does not need translation, so, if recognized as such, it must just be copied and inserted in the target segment as it is....

  • \d+\.\d+-\w+\.\d+-\w+-\w+\s\(\+\)

    This issue is hopeless; it does not even recognize the strings I have in the text:

    Screenshot of Trados Studio showing translation segment errors with mismatched strings 'd18, b18 y z18' and 'd16, b16 y z16'.

    Screenshot of Trados Studio with a search bar containing a regular expression and translation segments below.

    Will read all these posts once again, of course, and for the time being: Automatic replacement disconnected and manual correction....

    Thank you very much, Paul and Jesús, for your help!

  • Why would you even expect that expression to match what you have shown? I think you need to learn a little about regular expressions so you can use them properly. If you don’t, you’ll forever have problems and will always be guessing, with no idea what you’re using.  They look complicated, but actually they are not, and if you take the time to break down the one I created and understand why it matched the examples I provided, you’ll definitely understand why it won’t match the text you have shown here.

    Paul Filkin | RWS Group

  •  

    ok - apologies for that rather abrupt response... was on my phone and didn't think I would find time this evening.  Let me try and explain that expression.

    First of all, it was an expression designed for use in the Regex AutoSuggest Provider, so I omitted the "+" character in the string I was matching for reasons I explained above.  If I wanted to just match this:

    +42.03-HA.03-Q100-XH5 (+)

    Then there are many ways to do it.  One way might be this

    \+\d+\.\d+-\w+\.\d+-\w+-\w+\s\(\+\)

    If I break this down:

    \+ (this matches the + character. A + has a special meaning in regex, so I have to escape it to match it as a character, which I did with the backslash)

    \d+ (\d matches a digit, and the + in this case tells the regex engine to match one or more until you don't find any more.  So this would match the 42)

    \. (a dot (.) also is a special character so to match the dot I have to escape it, again with the backslash)

    - (this just matches the - character.  You can also be literal with regex, so this would also match the exact string with regex: \+42\.03-HA\.03-Q100-XH5)

    \w+ (\w matches a “word character” like numbers and letters for example, and the + in this case tells the regex engine to match one or more until you don't find any more.)

    \s (this matches a single whitespace character, here the space)

    \( (round brackets also have a special meaning so to match the brackets I have to escape them)

    \) (as above)
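
    The same breakdown can be written directly into the expression. The sketch below uses Python's `re.VERBOSE` flag, which lets each piece sit on its own line with a comment; this is a Python convenience used here purely for illustration and is not something the thread says about Studio. The test string is a fragment of the sentence from the original question.

```python
import re

PATTERN = re.compile(
    r"""
    \+      # a literal + (escaped, because + is otherwise a quantifier)
    \d+     # one or more digits: 42
    \.      # a literal dot
    \d+     # one or more digits: 03
    -       # a literal hyphen
    \w+     # one or more word characters: HA
    \.      # a literal dot
    \d+     # one or more digits: 03
    -       # a literal hyphen
    \w+     # one or more word characters: Q100
    -       # a literal hyphen
    \w+     # one or more word characters: XH5
    \s      # one whitespace character
    \(\+\)  # the literal text (+), with every character escaped
    """,
    re.VERBOSE,
)

match = PATTERN.search("bornas +42.03-HA.03-Q100-XH5 (+) y +42.03-F111-XH6 (-)")
print(match.group(0))
```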

    Using these definitions you can probably see how the expression works, and also hopefully why this expression would not match this:

    +42-A208-S10-X1

    To match this I could use something like this:

    \+\d+-\w+-\w+-\w+

    Just work your way through the sequence and try to understand how this would match it:

    \+ (matches the +)

    \d+ (matches 42)

    - (matches the -)

    \w+ (matches A208)

    - (matches the -)

    \w+ (matches S10)

    - (matches the -)

    \w+ (matches X1)

    So you cannot just take an expression created for something else and expect it to work, unless the pattern you are matching is the same.
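
    A quick way to convince yourself of this is to run both expressions against both strings. The sketch below does that in Python; the variable names `dotted` and `plain` are just labels chosen for this example.

```python
import re

# The two expressions from the explanation above.
dotted = re.compile(r"\+\d+\.\d+-\w+\.\d+-\w+-\w+\s\(\+\)")
plain = re.compile(r"\+\d+-\w+-\w+-\w+")

samples = ["+42.03-HA.03-Q100-XH5 (+)", "+42-A208-S10-X1"]

# For each sample, record whether each expression matches the whole string.
results = {
    text: (bool(dotted.fullmatch(text)), bool(plain.fullmatch(text)))
    for text in samples
}

for text, (dotted_ok, plain_ok) in results.items():
    print(f"{text!r}: dotted={dotted_ok}, plain={plain_ok}")
```

    Each expression accepts only the string it was written for: the first one insists on the dots and the trailing (+), and the second one stops at the first dot it meets.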

    You can also write this several ways depending on how strict you think you need to be.  For example this would also do it:

    \+\d{2}-[A-Z]\d{3}-[A-Z]\d{2}-[A-Z]\d

    And there are many other ways to do it as well.  Hopefully that makes some sense for you and you can see that matching simple strings like this isn't too hard.
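
    To make "how strict you need to be" concrete, here is a small Python sketch comparing the looser and the stricter expression. The second and third candidate strings are invented for the comparison and do not come from the document being translated.

```python
import re

loose = re.compile(r"\+\d+-\w+-\w+-\w+")
strict = re.compile(r"\+\d{2}-[A-Z]\d{3}-[A-Z]\d{2}-[A-Z]\d")

candidates = [
    "+42-A208-S10-X1",   # the real example
    "+42-foo-bar-baz",   # invented: right shape, wrong kind of content
    "+4-A208-S10-X1",    # invented: only one digit after the +
]

# For each candidate, record (accepted by loose, accepted by strict).
accepted = {
    text: (bool(loose.fullmatch(text)), bool(strict.fullmatch(text)))
    for text in candidates
}

for text, (loose_ok, strict_ok) in accepted.items():
    print(f"{text}: loose={loose_ok}, strict={strict_ok}")
```

    The looser expression accepts all three, while the stricter one accepts only the real example. Which is preferable depends on whether you would rather catch every code-like string or reject anything that deviates from the exact format.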

    Paul Filkin | RWS Group
