Looking for 'special' numbers in the TM

In 2016, a long-standing client introduced a style guide for numbers/measures used in their manuals. Before then, they used the AmE number format, i.e. 34,000 for thirty-four thousand and 12.15 for twelve point one five. In 2016 they decided to follow a technical standard under which the thousands separator is a non-breaking space (i.e. 34 000) and the decimal separator remains the point (12.15). They also decided that the translated manuals should follow this rule, regardless of the local custom -- for example, in my country we use the comma as the decimal separator, but I should stick to the point (no pun intended) for this client.

After 5 years, my TM has a mix of bad sources and bad targets due to these style changes. I would like to fix my TM so that anything I pre-translate follows the new style guide.

 

1) How do I look for numbers such as 34,000 or 16,700 in the TM? I have tried [0-9],[0-9] to no avail. I'd rather avoid exporting the TM to *.tmx for XBench, as it is quite large -- I am sure Studio can handle this. But how?

2) How do I implement a QA check (in either Studio or XBench) to verify that the number formatting in the target matches the source? I.e. if the source reads 12.15 the target should read 12.15 as well, and not 12,15 as it did before.

 

Thanks!

  • Hi  

    Unknown said:
    1) How do I look for numbers such as 34,000 or 16,700 in the TM? I have tried [0-9],[0-9] to no avail.

    Unfortunately you can't search a Studio TM very efficiently.  Regex is not supported so you only have wildcards and these are pretty useless really.

    Unknown said:
    2) How do I implement a QA check (in either Studio or XBench) to verify that the number formatting in the target matches the source? I.e. if the source reads 12.15 the target should read 12.15 as well, and not 12,15 as it did before.

    This question is sort of linked to the second part of your first question because the solution is the same for both:

    Unknown said:
    I'd rather avoid exporting the TM to *.tmx for XBench, as it is quite large -- I am sure Studio can handle this. But how?

    First of all, if you want to QA the TM you need to break it into bite-sized chunks. The best way to do this is via TMX, and there is a very handy little app on the appstore called SDLTmConvert that can do this for you. You can find an article here covering the exact process and how to use QA with your TM to improve it:

    https://multifarious.filkin.com/2013/03/15/memory_wisdom/
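
Once the TM is exported to TMX, the old-style numbers can also be located outside Studio. What follows is only a minimal sketch in Python, assuming the usual TMX layout (`tu` > `tuv` > `seg`). The function name `find_comma_thousands`, the pattern, and the sample data are illustrative and are not part of Studio or Xbench:

```python
import re
import xml.etree.ElementTree as ET

# Digits grouped in threes with commas, e.g. 34,000 or 1,250,000 (the old style).
# Caveat: a decimal-comma number with exactly three decimals (1,234) also matches.
COMMA_THOUSANDS = re.compile(r"(?<![\d.,])\d{1,3}(?:,\d{3})+(?![\d,])")

# ElementTree exposes the xml:lang attribute under the XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def find_comma_thousands(tmx_root):
    """Yield (language, segment text, matches) for segments with old-style numbers."""
    for tuv in tmx_root.iter("tuv"):
        seg = tuv.find("seg")
        if seg is None:
            continue
        text = "".join(seg.itertext())  # flatten inline tags inside the segment
        hits = COMMA_THOUSANDS.findall(text)
        if hits:
            yield tuv.get(XML_LANG), text, hits

# Hypothetical two-language translation unit, for illustration only.
sample = """<tmx version="1.4"><header/><body>
<tu><tuv xml:lang="en-US"><seg>Load up to 34,000 kg.</seg></tuv>
<tuv xml:lang="it-IT"><seg>Caricare fino a 34&#160;000 kg.</seg></tuv></tu>
</body></tmx>"""

for lang, text, hits in find_comma_thousands(ET.fromstring(sample)):
    print(lang, hits, text)
```

For a real export, `ET.parse("export.tmx").getroot()` would take the place of `ET.fromstring(sample)`; the file name is a placeholder.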

    Then, to add the QA check you wanted, you need to use a "Grouped Search Expression - report if source matches but not target".  You can use this to search for the source:

    (?=\d)([0-9.,]+)

    Then this in the target:

    $1

    Keep in mind that this doesn't do any verification on whether the source or the target is correct.  It just finds any pattern containing numbers, commas, periods in the source and checks in the target to make sure that they are exactly the same.  So should achieve what you wanted.  I added the lookahead just to avoid finding commas, or periods on their own.
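
To make the logic of that check concrete, here is a rough Python sketch of the same idea. It is not how Studio or Xbench implement it internally; the function name `mismatched_numbers` is hypothetical, and the target comparison is a plain substring test chosen for brevity:

```python
import re

# Same pattern as above: a run of digits, periods and commas that starts with a digit.
# Note that trailing sentence punctuation ("12.15.") becomes part of the token.
NUMBER = re.compile(r"(?=\d)([0-9.,]+)")

def mismatched_numbers(source: str, target: str) -> list[str]:
    """Return number-like tokens from source that do not occur verbatim in target."""
    return [tok for tok in NUMBER.findall(source) if tok not in target]

# Old-style target with a decimal comma: the source token is reported.
print(mismatched_numbers("Pressure: 12.15 bar", "Pressione: 12,15 bar"))
# New-style target: nothing is reported.
print(mismatched_numbers("Pressure: 12.15 bar", "Pressione: 12.15 bar"))
```

As with the expression above, this only checks that the target repeats the source string; it does not judge whether either side follows the style guide.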

    Paul Filkin | RWS Group


  • Thank you for your answer. I have read your post, but since I would be losing context, that's not an option. These manuals are 90% repeated, as my client adds some paragraphs here and there whenever they issue a new version.

    I have tried exporting the TM (which has 25,000 TUs) to *.tmx and loading it in Xbench. I tried several RegEx expressions, but yours returns an error ("incomplete expression", it reads). I cannot find a working RegEx string for Xbench.

    I have tried this RegEx string in the Editor (in Studio, on the bilingual files): [0-9]+[.,]?[0-9]*

    I found it on the internet, but I am not sure it does what I am looking for. It finds some numbers, but of course nothing is flagged when the number format in the source does not match the one in the target.

     

    1) In Xbench, what RegEx expression should I use?

    2) Why does a RegEx expression that works in the Studio Editor not work in Xbench?

     

    I really need to have this sorted before continuing my translation. Please help! 
