In 2022 SR1, Regex's Negative Lookbehind does not work

Hi dears,

When translating a patent specification, I use a regular expression with a negative lookbehind in the "Find and Replace" window of the Editor view in Trados Studio to find reference numerals and enclose them in parentheses. A simplified example of the regex I'm using is "(?<!step|\s도)\s([0-9]{2.4})" as the search pattern and "($1)" as the replacement.

However, after updating to 2022 SR1, the above pattern still finds matches, but the replacement is not applied.

Please check this.

  •  

    I'm not sure whether your regex is correct, as you didn't provide a sample text.  It looks as though the quantifier in the find expression, [0-9]{2.4}, is intended to specify a range, i.e. {2,4}... so a comma between the numbers and not a dot.  That would match any run of 2 to 4 digits.  It's also tricky to imagine what you're trying to find given the mix of English and Korean characters.  So I created this text to test with and just copied source to target:

    이 신규 화합물의 제조는 여러 단계 과정을 포함합니다. Step 1 involves the reaction of substance A with substance B at a temperature of 500도 for 12시간. 두 번째 단계는 물질 C를 추가하고 혼합물을 300도로 24시간 동안 유지하는 것입니다. Finally, step 3 involves cooling the mixture down to 100도 over a period of 48 hours. The purity of the resulting compound is typically 99 percent.
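
    On the {2.4} point, here is a minimal sketch in Python (used here only as a stand-in for the .NET engine in Studio, which is an assumption) of why the dot matters. With a dot, the braces are not a valid quantifier, so Python at least matches them as literal characters:

```python
import re

# Typo version: "{2.4}" is not a quantifier, so it is matched literally
# as "{2", any one character, then "4}".
typo = re.compile(r"[0-9]{2.4}")

# Intended version: a run of 2 to 4 digits.
fixed = re.compile(r"[0-9]{2,4}")

print(typo.search("100"))            # None: there is no literal "{" in the text
print(fixed.search("100").group())   # the three digits are matched
```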

    Using Find and Replace I can repro your problem.  I also think this may be a known issue with lookarounds.  So I tested in the SDLXLIFF Toolkit as a workaround... a preferred solution for me in fact!:

    Screenshot showing the sdlxliff toolkit in a search replace operation.

    That correctly shows the operation having worked in the toolkit, but it then fails to update the sdlxliff!!

    So, I'll make sure this is logged under the search & replace in Studio, and I'll also get the AppStore Team to fix this in the toolkit... this may be quicker!

    In the meantime perhaps you can clarify what you were actually trying to achieve, so we can make sure we have the right expression, and even provide a proper sample text?

      fyi.

    Paul Filkin | RWS Group

    ________________________
    Design your own training!

    You've done the courses and still need to go a little further, or still not clear? 
    Tell us what you need in our Community Solutions Hub

  • Sorry, there was a typo "{2.4}" in the regex pattern.
    The regex find pattern is "(?<!step|\s도)\s([0-9]{2,4})".

    The problem is with the following Find and Replace window in the Editor view of Trados Studio 2022 SR1.

    Find and Replace window in Trados Studio 2022 SR1 showing a regex pattern with a typo '2.4' instead of '2,4' in the 'Find what' field.

    Just in case, I checked that the above pattern is correct using the ".NET" flavor at https://regex101.com/.
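
    For reference, a minimal sketch of what this find and replace is expected to do, written in Python rather than run in Studio. Python's re module only accepts fixed-width lookbehinds, so the single lookbehind (?<!step|\s도) is split into two consecutive ones, which should be equivalent. The function name parenthesize and the test strings are made up for the illustration:

```python
import re

# .NET pattern from the post: (?<!step|\s도)\s([0-9]{2,4})
# Python needs fixed-width lookbehinds, so the alternation is split in two.
PATTERN = re.compile(r"(?<!step)(?<!\s도)\s([0-9]{2,4})")

def parenthesize(text: str) -> str:
    """Replace 'whitespace + 2-4 digits' with '(digits)', like "($1)"."""
    return PATTERN.sub(r"(\1)", text)

print(parenthesize("메모리 10 및 step 20"))  # the numeral after "step" is left alone
print(parenthesize("참조 도 12"))            # the numeral after " 도" is left alone
```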

    [edited by: Trados AI at 11:55 AM (GMT 0) on 29 Feb 2024]
  •  

    "The problem is with the following Find and Replace window in the Editor view of Trados Studio 2022 SR1."

    I know.  I said I could reproduce that.

    I then tried to test the same expression in the SDLXLIFF Toolkit so you had a workaround, but that fails too.

    Paul Filkin | RWS Group

  • I've got the exact same problem with negative lookbehind. My regex worked perfectly before I updated to Studio 2022. Now it doesn't work anymore. Here it is:

    (?<!-\d{3})(\d) (\w+)

    I use it to find regular (breaking) spaces between a digit and a word. The negative lookbehind is there to exclude phone numbers. It doesn't work anymore in Studio 2022. I get a match, but when I try to replace the breaking space with a non-breaking one using $1<non-breaking space>$2, I get $1<breaking space>$2, i.e. the match is left unchanged.
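
    To show the behaviour I expect, here is a minimal sketch in Python, not in Studio. The lookbehind is fixed-width, so the pattern can be used unchanged; the function name protect_spaces and the sample sentence are made up for the illustration:

```python
import re

NBSP = "\u00a0"  # non-breaking space

# Same pattern as in the post: a digit, a regular space, a word,
# unless the digit is preceded by a hyphen and three digits.
PATTERN = re.compile(r"(?<!-\d{3})(\d) (\w+)")

def protect_spaces(text: str) -> str:
    """Swap the regular space between a digit and a word for a non-breaking one."""
    return PATTERN.sub(lambda m: m.group(1) + NBSP + m.group(2), text)

result = protect_spaces("Call 555-1234 today, buy 5 apples")
print(repr(result))  # repr makes the non-breaking space visible as \xa0
```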

  •  

    Thanks for letting us know.

    Paul Filkin | RWS Group

  •  

    "I'll make sure this is logged under the search & replace in Studio, and I'll also get the AppStore Team to fix this in the toolkit... this may be quicker!"

    Your issue is exactly the same as the one in this thread.  Lookarounds don't work in the find/replace dialogue.  I already explained this.

    Paul Filkin | RWS Group

  •  

    I also added your example to the dev notes so we'll be double certain we fix it for you too.  In the meantime I also created a workaround you may find useful if it works for your sample text... it's helpful when you provide one!

    (\b\d\b) (\w+)|(\d{3}-\d{3}-\d{4})([,.])

    $1 $2$3$4

    Doesn't use a lookaround and might solve the issue for you with this particular problem.

    Paul Filkin | RWS Group

  •  

    In the meantime, there might be a workaround for you depending on what your source text looks like.  For example:

    (step\s|\s도\s)|(\s([0-9]{2,4}))

    $1($3)

    This will capture all instances of 'step ' and ' 도 ' along with the spaces and replace them with themselves, effectively ignoring them. Meanwhile, instances of a space followed by 2 to 4 digits are wrapped in parentheses.  However, this only works with my sample.  It won't work if the phrases to avoid (step or 도) can appear elsewhere in the text and not just in front of the number patterns you're interested in.  The original regular expression with lookbehinds would be more precise for this case.
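
    The "match what you want to skip, then put it back" idea can be sketched outside Studio in Python. This is only an illustration under assumptions: it uses a replacement function instead of a template string, so the skipped branch is returned verbatim, and whether a template such as $1($3) behaves identically inside Studio is not verified here. The name wrap_numbers and the test strings are hypothetical:

```python
import re

# First alternative: phrases to skip. Second alternative: the numbers to wrap.
PATTERN = re.compile(r"(step\s|\s도\s)|\s([0-9]{2,4})")

def wrap_numbers(text: str) -> str:
    def repl(m: re.Match) -> str:
        if m.group(1) is not None:
            # A "skip" phrase matched: put it back unchanged. The number that
            # follows is no longer preceded by unconsumed whitespace.
            return m.group(0)
        # A space followed by 2-4 digits: drop the space, wrap the digits.
        return f"({m.group(2)})"
    return PATTERN.sub(repl, text)

print(wrap_numbers("step 30 and part 100"))
print(wrap_numbers("참조 도 12 및 메모리 10"))
```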

    One workaround for a more extensive use of the phrases we want to avoid might be to perform multiple passes and a more complex process. For example:

    • Replace all instances of step or 도 followed by a space and a number with a unique placeholder, such as @@@.
    • Perform your existing operation to replace all instances of a space followed by 2 to 4 digits.
    • Finally, replace all instances of your unique placeholder @@@ back to the original value.

    This would be less efficient and more error-prone than using lookarounds, but it could potentially achieve the desired result.
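
    A minimal sketch of the three passes, in Python rather than Studio. It assumes one possible reading of the steps above: the placeholder stands in for the space between the keyword and the number, so the keyword and number themselves are never lost. The names multi_pass and PLACEHOLDER are hypothetical, and the last pass always restores a plain space:

```python
import re

PLACEHOLDER = "@@@"  # must not occur anywhere in the real text

def multi_pass(text: str) -> str:
    # Pass 1: protect "step <number>" and "도 <number>" by replacing the
    # space between keyword and number with the placeholder.
    text = re.sub(r"(step|도)\s([0-9])", rf"\1{PLACEHOLDER}\2", text)
    # Pass 2: wrap every remaining "space + 2-4 digits" in parentheses.
    text = re.sub(r"\s([0-9]{2,4})", r"(\1)", text)
    # Pass 3: turn the placeholder back into a space.
    return text.replace(PLACEHOLDER, " ")

print(multi_pass("도 12 의 메모리 10 및 step 20"))
```

    None of the three passes needs a lookaround, which is the point of the workaround, but the placeholder has to be chosen so that it cannot collide with real content.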

    In the meantime we'll plan in some time to look at the SDLXLIFF Toolkit until a more permanent solution is implemented into the core product.

    Paul Filkin | RWS Group

  • In Korean patent documents, when the names of members or elements appear in the specification together with reference numerals (or reference symbols), and the same reference numerals appear on the drawings, the reference numerals are parenthesized in the specification, except for drawing numbers, step numbers, etc.

    An example in Korean sentences is as follows:
    이하에서는 도 1 내지 도 10B를 참조하여 본 발명의 실시예들이 설명된다. 도 1단계 10단계 20을 포함하는 방법의 흐름도(100)를 도시한다. 그 방법에서 수신기는 먼저 신호를 수신한다(단계 10). 도 2A는 메모리(10), CPU(20), 버스(100), 및 주변기기(1000)를 포함하는 장치(200)를 도시한다.

    As far as I know, Studio's regex is not a new implementation of its own; it just uses what is supported by the underlying platform or programming language.
    So I don't understand why the feature was removed without notice. If it has been removed completely, I will look for another regex pattern.
