Looking for 'special' numbers in the TM

Question

In 2016, a long-standing client introduced a style guide for numbers/measures used in their manuals. Before then, they used the AmE number format, i.e. 34,000 for 34 thousand and 12.15 for 12 units and 15 hundredths. In 2016 they decided to follow a technical standard under which the thousands separator is a non-breaking space (i.e. 34 000) and the decimal separator remains the point (12.15). They also decided that the translated manuals should follow this rule regardless of local custom: in my country, for example, we use the comma as the decimal separator, but I have to stick to the point (no pun intended) for this client. 
 After 5 years, my TM contains a mix of bad sources and bad targets due to these style changes. I would like to fix my TM so that anything I pre-translate follows the new style guide. 
 
 1) How do I look for numbers such as 34,000 or 16,700 in the TM? I have tried [0-9],[0-9] to no avail. I'd rather avoid exporting the TM to *.tmx and checking it in XBench, as it is quite large. I am sure Studio can handle this, but how? 
 2) How do I implement a QA check (in either Studio or XBench) to verify that the number formatting in the target matches the source? I.e. if the source reads 12.15, the target should read 12.15 as well, and not 12,15 as it did before. 
 
 Thanks!

Paul · Accepted Answer

Hi Paola, 
 
It's late and my brain is getting tired... but would this work? 
 
(?!\d,\s)(?=\d)([0-9.,\s]+) 
$1 
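 
To see what that pattern actually captures, here is a minimal sketch in Python. Python's `re` module stands in for Studio's .NET engine here, which is an assumption; the sample sentences and the helper name `captures` are made up for illustration.

```python
import re

# Pattern from the answer above: a run of digits, points, commas and
# whitespace that starts at a digit, skipping a start such as "5, ".
ACCEPTED = re.compile(r"(?!\d,\s)(?=\d)([0-9.,\s]+)")

def captures(text: str) -> list[str]:
    """Return the text of group 1 for every match in a segment."""
    return ACCEPTED.findall(text)

# \u00a0 is the non-breaking space used as thousands separator.
print(captures("Load: 34\u00a0000 kg"))
print(captures("Price 12.15 EUR"))
```

Because whitespace is part of the character class, a number grouped with a non-breaking space is captured as one unit, and the space that follows the number is captured with it. A target that uses a different kind of space inside or right after the number would therefore be reported as a mismatch.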
 
Regards

Paul · Answer

Hi Paola 
 Unknown said: 1) How do I look for numbers such as 34,000 or 16,700 in the TM? I have tried [0-9],[0-9] to no avail. 
 Unfortunately you can't search a Studio TM very efficiently. Regex is not supported so you only have wildcards and these are pretty useless really. 
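 Where a regex engine is available, for instance when working on exported text rather than in the Studio TM search, a pattern along these lines could pick out the old comma-grouped numbers. This is only a sketch in Python: the pattern and the function name `find_legacy_numbers` are my own assumptions, not something Studio provides.

```python
import re

# Assumed pattern: 1-3 digits followed by one or more ",ddd" groups,
# not embedded in a longer run of digits, points or commas.
LEGACY_GROUPED = re.compile(r"(?<![\d.,])\d{1,3}(?:,\d{3})+(?![\d,])")

def find_legacy_numbers(segment: str) -> list[str]:
    """Return every comma-grouped number (old style) found in a segment."""
    return LEGACY_GROUPED.findall(segment)

print(find_legacy_numbers("Max load 34,000 kg at 16,700 rpm"))
print(find_legacy_numbers("Torque 12,15 Nm, length 34 000 mm"))
```

 Note that a decimal comma followed by exactly three digits (e.g. 1,250 meaning one and a quarter) looks identical to a thousands group, so hits in target languages that use the decimal comma still need a human eye.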
 Unknown said: 2) How do I implement a QA (in either Studio or XBench) to check that the same number formatting in the source should match? I.e. if the source reads 12.15 the target should read 12.15 as well, and not 12,15 as it did before. 
 This question is sort of linked to the second part of your first question because the solution is the same for both: 
 Unknown said: I'd rather avoid to export the TM in *.tmx in XBench as it is quite large -- I am sure Studio can handle this. But how? 
 First of all, if you want to QA the TM you need to break it into bite-sized chunks. The best way to do this is via TMX, and there is a very handy little app on the AppStore called SDLTmConvert that can do this for you. You can find an article here all about the exact process you need to follow and how to use QA with your TM to improve it: 
 https://multifarious.filkin.com/2013/03/15/memory_wisdom/ 
 Then, to add the QA check you wanted, you need to use a "Grouped Search Expression - report if source matches but not target". You can use this to search for the source: 
 (?=\d)([0-9.,]+) 
 Then this in the target: 
 $1 
 Keep in mind that this doesn't do any verification on whether the source or the target is correct. It just finds any pattern containing numbers, commas and periods in the source and checks in the target to make sure that they are exactly the same. So it should achieve what you wanted. I added the lookahead just to avoid finding commas or periods on their own.
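 
 The logic of that check can be approximated outside Studio as follows. This is a rough Python sketch, not Studio's implementation: it assumes the `$1` comparison behaves like a plain substring test on the target, and `missing_in_target` is a hypothetical helper name.

```python
import re

# Source pattern from the answer: digits, commas and periods,
# starting at a digit.
NUMBER_LIKE = re.compile(r"(?=\d)([0-9.,]+)")

def missing_in_target(source: str, target: str) -> list[str]:
    """Return source captures whose exact text does not occur in the target."""
    return [m.group(1) for m in NUMBER_LIKE.finditer(source)
            if m.group(1) not in target]

# Old-style target with a decimal comma: the source number is reported.
print(missing_in_target("Set pressure to 12.15 bar.",
                        "Impostare la pressione a 12,15 bar."))
# New-style target: nothing to report.
print(missing_in_target("Set pressure to 12.15 bar.",
                        "Impostare la pressione a 12.15 bar."))
```

 One side effect worth knowing: a number immediately followed by sentence punctuation is captured together with it (for example `12.15.` at the end of a sentence), so a target that ends differently would be flagged even if the number itself is fine.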

Paul · Answer

Hi Paola 
 Unknown said: I have read your post and since I would be losing context, that's not an option. These manuals are 90% repeated as my client adds some paragraphs here and there whenever they issue a new version. 
 Maybe read it again as you only lose context if you use the editor to correct the TU. If you use verification to find the offending segments and then edit the TU in the TM Results window then you won't lose context. So uncheck update TM in your settings and only use the process to find and correct the TUs that need correcting. 
 Unknown said: 1) In xBench, what RegEx expression should I use? 
 I have no idea since I don't know how to use Xbench. I've only played with it for the odd thing here and there and have never tried to QA a TM with it. Will add it to my looong list of things to learn about. Maybe Jerzy Czopik can help, I think he uses Xbench or maybe Josep Condal who I'm sure knows the right answer ;-) 
 Unknown said: 2) Why a RegEx expression in Studio Editor is not Working as a RegEx expression in Xbench? 
 There are dozens of different flavours of regex. Studio uses .NET and I believe Xbench uses POSIX. So perhaps there is a difference between the syntax here.
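 
 If the other tool's flavour turns out not to support lookahead or the `\d` shorthand (POSIX-style syntax has neither), the source pattern can be rewritten using only basic character classes. The sketch below uses Python just to show that the two forms find the same matches on a few made-up samples; whether Xbench accepts either form is not something I have verified.

```python
import re

# Original form, relying on a lookahead and \d.
WITH_LOOKAHEAD = re.compile(r"(?=\d)([0-9.,]+)")
# Assumed portable rewrite: one digit, then any digits, points or commas.
PORTABLE = re.compile(r"([0-9][0-9.,]*)")

def same_matches(text: str) -> bool:
    """True when both patterns capture exactly the same strings."""
    return WITH_LOOKAHEAD.findall(text) == PORTABLE.findall(text)

for sample in ["34,000 units", "Step 2, then 12.15 bar.", "No digits, here."]:
    print(sample, same_matches(sample))
```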

Nora Díaz · Answer

Unknown said: 
 I have tried this RegEx string in the Editor (in Studio, on the bilingual files): [0-9]+[.,]?[0-9]* 
 I found it on the internet, but I am not sure it does what I am looking for. It finds some numbers, but of course there is no flagging if the number format in the source does not match the one in the target. 
 
 Hi Paola, 
 If I can chime in: in order for an error message to be triggered when the number format in the target doesn't match the source, you need to add the expression suggested by Paul in your Verification settings, not in the Editor. 
 So you need to go to Project Settings - Verification - QA Checker - Regular Expressions, enter a description, paste Paul's regex in the Regex source box and $1 in the Regex target box, select the Condition dropdown and choose Grouped search expression - report if source matches but not target, then select Add Item in the Action dropdown, and you should be all set. 
 After doing this, whenever you enter the wrong number format in the target and confirm the segment, you will see an error symbol displayed next to the segment status.
