Automatic replacement of variables, numbers, dates, etc.

Question

Dear Community: 
 I often have to deal with documents that contain lots of numbers and alphanumeric combinations, and Trados always changes something. I use regex to filter the segments containing ONLY these "non-translatables", and it works quite well (though some combinations are still not included, which is the topic for my next thread :). So, where the filter can be applied, the issue is easy to fix. 
 However, many text segments contain these alphanumeric combinations, for example: "Medir la polaridad de las baterías en las bornas +42.03-HA.03-Q100-XH5 (+) y +42.03-F111-XH6 (-) y comprobar que es correcta." When looking through the pretranslated results, I find errors in 95% of cases: Trados changes one or more numbers, or a symbol. 
 So to avoid this I usually deactivate the automatic replacement of all these numbers and variables in the memory settings of a particular project: 
 
 However, this is only applicable to this particular project, and cannot be changed from the General settings of Studio. I mean, I did the same here (actually it led me to the same list of TMs): 
 
 But the result was: if the variables are unticked in the project settings but remain activated in the general settings, Trados stops "recognizing" them; however, if they are deactivated in the general settings but activated in those of a particular project, Trados sees them. 
 Can anyone explain how it works? At the moment I switch off this automatic replacement at the project creation stage, and I would like to define this once for all my translation projects. 
 And then another question: what does "variables" mean in the list of variables? :) 
 To me, numbers and dates ARE variables...however, if I choose to deactivate the Variables option only, Trados still reads all this stuff. 
 Thanks in advance!!!

Paul Filkin · Accepted Answer

Why would you even expect that expression to match what you have shown? I think you need to learn a little about regular expressions so you can use them properly. If you don’t, you’ll forever have problems and will always be guessing, with no idea what you’re using. They look complicated, but actually they are not, and if you take the time to break down the one I created and understand why it matched the examples I provided, you’ll definitely understand why it won’t match the text you have shown here.

Paul Filkin | RWS


Jesús Prieto · Answer

About the difference between Options and Project settings, this article can help you: https://multifarious.filkin.com/2014/01/24/those-project-settings/?highlight=options 
 
 Unknown said: what do "variables" mean in the list of variables? 
 Variables are strings that are not supposed to change from Source to Target. Typical examples are brand names (Apple and Android, for example). If Trados recognises a variable in Source, it is presented as a placeable when you press CTRL+comma. The variable is also inserted in Target if the only difference between the segment and the TM match is the variable, in which case the match counts as 100%. For example, if you have Apple and Android as variables in your TM, the string "I love Apple" is in your TM, and you need to translate the similar string "I love Android", the string will be translated as a 100% match. 
 More information here: https://docs.rws.com/980998/344726/trados-studio-2022/variable-list 
 If you ask me, I don't like the name given, "variable". 
 Unknown said: numbers and dates ARE variables 
 Numbers and dates are not variables, as they may need to be translated (variables are not translated). Instead numbers and dates are placeables that Trados can recognise. More information here: https://docs.rws.com/980998/775644/trados-studio-2022/dates-and-times

Paul Filkin · Answer

This can be a pretty confusing area! But let me try and explain. 
 Unknown said: If the program recognizes a token (number or alphanumeric string), it places it (must place it) in the target text directly. What´s the point of doing any "automatic replacement" then? 
 Let's say I translate this first segment:

The project settings I used were these with Auto-Substitution of numbers deactivated:

I confirm it and check the next segment:

The numbers are marked as changed because the auto-substitution is switched off 
 the number is still recognised (blue underline) because the language resources are all activated 
 the entire string is recognised, but not as one placeable (I'll come back to that in a bit) 
 the match is 94% because of the lack of auto-substitution

If I repeat this exercise but this time turn the "Numbers" on in the auto-substitution settings I'll get this:

The numbers are all unmarked in the TM Results window 
 the number is still recognised (blue underline) because the language resources are all activated 
 the entire string is recognised, but not as one placeable (I'll come back to that in a bit) 
 the match is now 100% because the number might have been different in the TM but it was auto-substituted from the source 
 
 This is what Jesús Prieto was explaining here: 
 Jesús Prieto said: One of them let’s enable/disable some recognizers, 
 The settings in the TM (the second of the two you queried), the language resources, do allow you to determine what gets recognised (the pattern, for example), but they also allow you to decide whether they are recognised at all. The auto-substitution settings don't prevent the number from being recognised, only from being considered in the score for matching. 
 So if I remove "Numbers" from the recognition here:

And now look again at my examples I would see this:

similar to the effect of disabling auto-substitution as this also doesn't take place 
 the number is not underlined as there is no number recognition at all 
 
 In addition Jesús Prieto also mentioned this: 
 Jesús Prieto said: while the second one let’s you adjust how these are recognise. 
 So you can tell the language resources, to some extent, what format the number should take, but still be recognised: 
 
 I added some silly ones just to illustrate the point... and this brings me back to point 3. at the start about number recognition. Your strings like these: 
 +42.03-HA.03-Q100-XH5 (+) 
 Are made up of: 
 
 One number and two alphanumeric strings. I'm not sure why the + inside the brackets isn't recognised... it probably should be, but this could be down to my lack of understanding. Another way to handle them is to use the apps Jesús Prieto mentioned. 
 Unknown said: 3. https://appstore.rws.com/Plugin/153 and https://appstore.rws.com/Plugin/75 have read several times the description of these and do not understand their utility, at all...the second one is more or less clear, but if I have my Autosuggest and TDB connected...then what´s the point? 
 So let's take the first one then, the Regex AutoSuggest Provider. This is an interactive tool whereas the Terminjector can be used in a pre-translation and is more automated. To do it I could create a regular expression to match the entire pattern of these strings, like this for example: 
 \d+\.\d+-\w+\.\d+-\w+-\w+\s\(\+\) 
 Then add it here: 
 
 And now when I type some numbers I will see something like this: 
 
 This will allow me to place the whole string in one go with a single action, as opposed to having to deal with this: 
 
 Pretty cool if you work interactively. However, you may have also noticed I omitted the "+" symbol from the start. This may be due to a bug in Studio, as I know the developer tried to fix this many years ago. I think it's caused by the use of any special chars in regex when used as the first character. But maybe one for us to investigate, as this has been a problem for a long time. 
 Oana Nagy fyi. 
 Unknown said: but if I have my Autosuggest and TDB connected...then what´s the point? 
 I hope you now see the point? 
 Unknown said: What it lacks is , for example, the comparison of these recognizable tokens (a warning sign if they are not identical in the source and the target segment). 
 This is not lacking at all. You just need to set them up in the Verification options.

Then you should be able to carry out the custom checks you want.
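For anyone who wants to see what such a source/target token check boils down to, here is a minimal sketch in Python. It is not how Studio's verification works internally; it only illustrates the idea of comparing the recognised tokens on both sides. The pattern `TOKEN`, the function `mismatched_tokens` and the English target sentences are all made up for this illustration.

```python
import re
from collections import Counter

# Hypothetical pattern for codes shaped like +42.03-HA.03-Q100-XH5 or +42.03-F111-XH6.
TOKEN = re.compile(r"\+\d+\.\d+(?:-\w+(?:\.\d+)?)+")


def mismatched_tokens(source: str, target: str) -> tuple[list[str], list[str]]:
    """Return (tokens missing from target, tokens in target that are not in source)."""
    src = Counter(TOKEN.findall(source))
    tgt = Counter(TOKEN.findall(target))
    return list((src - tgt).elements()), list((tgt - src).elements())


source = ("Medir la polaridad de las baterías en las bornas "
          "+42.03-HA.03-Q100-XH5 (+) y +42.03-F111-XH6 (-) y comprobar que es correcta.")
# Invented translations: one with the codes intact, one with a changed digit.
good_target = ("Measure the battery polarity at terminals "
               "+42.03-HA.03-Q100-XH5 (+) and +42.03-F111-XH6 (-) and check that it is correct.")
bad_target = good_target.replace("XH6", "XH8")

good_result = mismatched_tokens(source, good_target)
bad_result = mismatched_tokens(source, bad_target)
print(good_result)
print(bad_result)
```

A non-empty result for either list is the kind of condition a custom check in the Verification options would flag with a warning.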

Paul Filkin · Answer

ok - apologies for that rather abrupt response... was on my phone and didn't think I would find time this evening. Let me try and explain that expression. 
 First of all, it was an expression designed for use in the RegexAutoSuggest Provider, so I omitted the "+" character in the string I was matching for reasons I explained above. If I wanted to just match this: 
 +42.03-HA.03-Q100-XH5 
 
 Then there are many ways to do it. One way might be this 
 \+\d+\.\d+-\w+\.\d+-\w+-\w+\s\(\+\) 
 If I break this down: 
 \+ (this matches the + character. A + has a special meaning in regex so I have to escape it to match it as a character, which I did with the backslash) 
 \d+ (\d matches a digit, and the + in this case tells the regex engine to match one or more until it doesn't find any more. So this would match the 42) 
 \. (a dot (.) also is a special character so to match the dot I have to escape it, again with the backslash) 
 - (this just matches the - character. You can also be literal with regex, so this would also match the exact string with regex: \+42\.03-HA\.03-Q100-XH5 ) 
 \w+ (\w matches a "word character" like numbers and letters for example, and the + in this case tells the regex engine to match one or more until it doesn't find any more.) 
 \s (this matches a single whitespace character, here the space) 
 \( (round brackets also have a special meaning so to match the brackets I have to escape them) 
 \) (as above) 
 Using these definitions you can probably see how the expression works, and also hopefully why this expression would not match this: 
 +42-A208-S10-X1 
 To match this I could use something like this: 
 \+\d+-\w+-\w+-\w+ 
 Just work your way through the sequence and try to understand how this would match it: 
 \+ (matches the +) 
 \d+ (matches 42) 
 - (matches the -) 
 \w+ (matches A208) 
 - (matches the -) 
 \w+ (matches S10) 
 - (matches the -) 
 \w+ (matches X1) 
 So you cannot just take an expression created for something else and expect it to work, unless the pattern you are matching is the same. 
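If you want to try this outside Studio, a quick test along these lines can help. This is only a sketch using Python's `re` module as a stand-in for the regex engine in Studio; for simple patterns like these the syntax is the same, but that is an assumption worth checking against your own setup. The first expression is written with the round brackets escaped as `\(` and `\)`, as described in the breakdown above, and the variable names are made up.

```python
import re

# Expression for strings shaped like "+42.03-HA.03-Q100-XH5 (+)".
LONG_PATTERN = re.compile(r"\+\d+\.\d+-\w+\.\d+-\w+-\w+\s\(\+\)")
# Expression for strings shaped like "+42-A208-S10-X1".
SHORT_PATTERN = re.compile(r"\+\d+-\w+-\w+-\w+")

long_sample = "+42.03-HA.03-Q100-XH5 (+)"
short_sample = "+42-A208-S10-X1"

# The long expression matches the string it was written for...
long_on_long = LONG_PATTERN.fullmatch(long_sample) is not None
# ...but finds nothing in the shorter string, because it insists on the dots.
long_on_short = LONG_PATTERN.search(short_sample) is not None
# The shorter expression matches the shorter string.
short_on_short = SHORT_PATTERN.fullmatch(short_sample) is not None

print(long_on_long, long_on_short, short_on_short)
```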
 You can also write this several ways depending on how strict you think you need to be. For example this would also do it: 
 \+\d{2}-[A-Z]\d{3}-[A-Z]\d{2}-[A-Z]\d 
 And there are many other ways to do it as well. Hopefully that makes some sense for you and you can see that matching simple strings like this isn't too hard.
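To see what "strict" buys you, the two variants can be compared on a few strings. Again this is just a sketch in Python rather than anything run inside Studio; the second and third sample strings are invented to show the difference and do not come from a real document.

```python
import re

# Permissive variant: any run of word characters between the hyphens.
LOOSE = re.compile(r"\+\d+-\w+-\w+-\w+")
# Strict variant: fixed digit counts and a capital letter at the start of each group.
STRICT = re.compile(r"\+\d{2}-[A-Z]\d{3}-[A-Z]\d{2}-[A-Z]\d")

samples = [
    "+42-A208-S10-X1",  # the string from the thread
    "+42-abc-x-y",      # invented: lowercase letters, no digits in the groups
    "+7-A208-S10-X1",   # invented: only one digit after the +
]

# For each sample, record (matched by LOOSE, matched by STRICT).
results = {
    s: (LOOSE.fullmatch(s) is not None, STRICT.fullmatch(s) is not None)
    for s in samples
}

for sample, (loose_ok, strict_ok) in results.items():
    print(f"{sample}: loose={loose_ok} strict={strict_ok}")
```

The permissive expression accepts all three strings, while the strict one accepts only the first. Which one you want depends on how varied the codes in your documents are.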
