Is there a Regex pattern to search backward (or UP)?

Good morning,

If I am not mistaken, and as far as I understand from several Regex documents, it is not possible to use a Regex pattern to search backward (or UP). Is that right, or did I miss something?

Suppose I have a simple text (*.txt) document similar to the one below, with hundreds of [token] blocks, in which I want to insert the Italian translation only where the line reads [it-it]"MISSING", ignoring the Italian strings that are already translated and all the strings in the other languages.

[token]sm083
[en-us]"Existing ORIGINAL SOURCE text first token block"
[de-de]"Existing German translated text"
[nl-nl]"Existing Dutch translated text"
[fr-fr]"Existing French translated text"
[es-es]"Existing Spanish translated text"
[ca-es]"Existing Catalan translated text"
[pt-pt]"Existing Portuguese PT translated text"
[pt-br]"Existing Portuguese BR translated text"
[it-it]"Existing Italian translated text"

[token]sm001
[en-us]"Existing ORIGINAL SOURCE text second token block"
[de-de]"MISSING"
[nl-nl]"MISSING"
[fr-fr]"MISSING"
[es-es]"Existing Spanish translated text"
[ca-es]"MISSING"
[pt-pt]"MISSING"
[pt-br]"MISSING"
[it-it]"MISSING"

[token]sm055
[en-us]"Existing ORIGINAL SOURCE text third token block"
[de-de]"MISSING"
[nl-nl]"Existing Dutch translated text"
[fr-fr]"MISSING"
[es-es]"Existing Spanish translated text"
[ca-es]"MISSING"
[pt-pt]"MISSING"
[pt-br]"MISSING"
[it-it]"MISSING"

The final goal is to replace the “MISSING” text of each [it-it]"MISSING" line with the word MISSING followed by the corresponding [en-us] source text of the same block (for example, "Existing ORIGINAL SOURCE text second token block"), so that I can import into SDL Studio Editor only the Italian “MISSING” lines, each carrying the source text to be translated.

In fact, this is not a problem if I work manually on each single [token] block using the proper search Regex pattern:

(\[en-us\].*?")(.*?)(".*?)(\[it-it\]"MISSING")

and the replace pattern:

$1$2$3\[it-it\]"MISSING - $2"

If I apply these patterns starting from the beginning of the file, however, they do match, but the selection runs from the start of the document up to the first occurrence of the [it-it]"MISSING" line, so the replacement does not produce the intended result.
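
The overshoot can be reproduced outside the editor. The following is a minimal Python sketch, assuming that the editor's "dot matches newline" option behaves like `re.DOTALL`; the shortened two-block sample and the variable names are illustrative only.

```python
import re

# Shortened sample: the first block is already translated, the second is not.
sample = (
    '[token]sm083\n'
    '[en-us]"First source text"\n'
    '[it-it]"Existing Italian translated text"\n'
    '\n'
    '[token]sm001\n'
    '[en-us]"Second source text"\n'
    '[it-it]"MISSING"\n'
)

# The original search pattern; re.DOTALL stands in for "dot matches newline".
pattern = re.compile(
    r'(\[en-us\].*?")(.*?)(".*?)(\[it-it\]"MISSING")', re.DOTALL
)

match = pattern.search(sample)

# The match begins at the first [en-us] line it meets, which belongs to the
# already translated block, and group 2 holds that block's source text.
first_line_of_match = match.group(0).splitlines()[0]
captured_source = match.group(2)
print(first_line_of_match)
print(captured_source)
```

In this sample the captured text comes from block sm083 rather than from sm001, which is why the replacement inserts the wrong source string.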

At this point I searched for the [it-it]"MISSING" line with (\[it-it\]"MISSING"), but I did not find a way to extend the selection backward (or UP) to the [en-us]"Existing ORIGINAL SOURCE text" line of the same block.

Is there a pattern to do this? In the documents I have read I was not able to find a solution.

Or do you have an alternative pattern that, starting from the beginning of the document, would work in one pass on all the blocks containing the [it-it]"MISSING" line?

Thank you.

Claudio

  • I have not tested this, but maybe a slightly different approach would help?
    Instead of searching and replacing MISSING in Italian, I would try to declare en-us as source and it-it as target, then change the extension to XLF and try that way.
    To do that I replaced (\[en-us\]")(.*?)(") with <source>\2</source> and (\[it-it\]")(.*?)(") with <target>\2</target>, where "." (dot) does NOT match new line.
    Unfortunately I am not smart enough to produce a proper XLF file from there, as it will need a bit more preparation, but maybe this will help you to see the problem from another perspective.
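
As an illustration of the two replacements described above, here is a minimal Python sketch. It only tags the two lines; it does not produce a valid XLF file and does not escape characters such as `&` or `<`, which a real XML file would require. The sample block and the variable names are assumptions made for the example.

```python
import re

block = (
    '[token]sm001\n'
    '[en-us]"Existing ORIGINAL SOURCE text second token block"\n'
    '[de-de]"MISSING"\n'
    '[it-it]"MISSING"\n'
)

# re.DOTALL is deliberately not set, so "." does not match a newline.
tagged = re.sub(r'(\[en-us\]")(.*?)(")', r'<source>\2</source>', block)
tagged = re.sub(r'(\[it-it\]")(.*?)(")', r'<target>\2</target>', tagged)
print(tagged)
```

The other language lines are left as they were, so they would still have to be removed or wrapped before the file could be treated as XLF.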

    Kind regards, Jerzy


  • Hi Jerzy,

    thank you for suggesting a different approach. I will give it due consideration, even though it requires some work to regenerate the final translated file in its original format.

    As a matter of fact, I was able to solve the original problem with a "traditional macro" in Notepad++, which allowed me to search backward (or UP) after finding the MISSING strings in the [it-it] lines. I am still trying to find out whether there is a Regex pattern to search backward (or UP).
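
For readers without the macro, the same idea can be sketched procedurally in Python. This is not the Notepad++ macro itself: instead of searching upward from each MISSING line, it scans forward and remembers the most recent [en-us] text of the current block. The function name `fill_missing_italian` is hypothetical, and the sketch assumes plain `\n` line endings.

```python
def fill_missing_italian(text):
    """Hypothetical helper: append each block's [en-us] text to [it-it]"MISSING"."""
    source = None  # [en-us] text of the block currently being read
    out = []
    for line in text.split('\n'):
        if line.startswith('[token]'):
            source = None  # a new block starts, forget the previous source
        elif line.startswith('[en-us]"') and line.endswith('"'):
            source = line[len('[en-us]"'):-1]
        elif line == '[it-it]"MISSING"' and source is not None:
            line = '[it-it]"MISSING - ' + source + '"'
        out.append(line)
    return '\n'.join(out)


sample = (
    '[token]sm083\n'
    '[en-us]"First source text"\n'
    '[it-it]"Existing Italian translated text"\n'
    '\n'
    '[token]sm001\n'
    '[en-us]"Second source text"\n'
    '[es-es]"Existing Spanish translated text"\n'
    '[it-it]"MISSING"\n'
)

result = fill_missing_italian(sample)
print(result)
```

Only the untranslated Italian line changes; the already translated block and the other languages pass through untouched.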

    Thank you for your suggestion.

    Claudio

  • Hi Claudio,

    I don't think you can force the regex to search both ways, as it doesn't work that way, so you need a different approach.  I struggled with this one but got some great help from Jan Goyvaerts (second to none when it comes to regex!), the author of this site:

    So use this:

    Search:

    (\[en-us\]([^\r\n]+)(?:\r?\n\[[^\r\n]+)*\r?\n\[it-it\])"MISSING"

    Replace:

    $1$2

    Make sure you also use dot matches newline.  Very clever solution.  This is how he explained it to me as you may find this helpful too:

    Unknown said:

    In the attached regex I used ([^\r\n]+) to match the text after [en-us], ensuring it does not span across lines.  I used (?:\r?\n\[[^\r\n]+)* to require all the lines that are skipped over to start with an opening bracket.  This ensures the regex match does not run across the blank line that delimits each block.  This part will actually match the [it-it] line too.  The remainder of the regex will force the * to backtrack to give up the [it-it] line which the remainder of the regex can then match.
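
The behaviour described in this explanation can be checked with a short Python sketch. It assumes Python's `re` module treats this pattern the same way as the editor's regex engine, writes the replacement `$1$2` in Python's `\g<1>\g<2>` syntax, and uses a shortened sample; the variable names are illustrative.

```python
import re

sample = (
    '[token]sm083\n'
    '[en-us]"First source text"\n'
    '[it-it]"Existing Italian translated text"\n'
    '\n'
    '[token]sm001\n'
    '[en-us]"Second source text"\n'
    '[de-de]"MISSING"\n'
    '[es-es]"Existing Spanish translated text"\n'
    '[it-it]"MISSING"\n'
)

# Search pattern exactly as posted above.
pattern = re.compile(
    r'(\[en-us\]([^\r\n]+)(?:\r?\n\[[^\r\n]+)*\r?\n\[it-it\])"MISSING"'
)

# Group 1 is everything up to and including "[it-it]";
# group 2 is the quoted [en-us] text, quotation marks included.
result, replacements = pattern.subn(r'\g<1>\g<2>', sample)
print(result)
print(replacements)
```

Because group 2 includes the quotation marks, the Italian line becomes `[it-it]"Second source text"`. The blank line between blocks stops the skipped-lines group, so the already translated block sm083 is never matched, and the `[de-de]"MISSING"` line is left alone. The pattern contains no `.`, so the sketch does not set `re.DOTALL`.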

    Regards

    Paul

    Paul Filkin | RWS Group


  • Hi Paul,

    thank you for your reply, and thanks to Jan Goyvaerts too, whose site I regularly visit to learn more.

    The solution you both suggested works perfectly, is completely logical, and solves the problem I raised.

    Regards,

    Claudio

  • Hi Claudio,

    Glad to hear you got the right regex pattern!

    That said, I would have chosen a different route: paste the whole text into Excel, one line per cell in the first column, and then use Excel's text functions to extract and concatenate the strings you need.

    Although it can take 10 minutes to get exactly what you want, I think the Excel way is faster than working out the regex pattern, though I reckon it might not be as challenging or rewarding.

    ... Jesús Prieto ...