Issues with regex in Find/Replace operations

A discussion started today on ProZ (http://www.proz.com/forum/sdl_trados_support/309565-regex_f_r_question_trados_2015.html) has brought to light several issues with regex in Find/Replace operations that, if I remember correctly, are not new but are still present even in Studio 2017.

I thought I'd start a thread here to get the discussion going with people who may be monitoring the SDL Community but not the ProZ Trados forum.

Some of the issues discussed in the ProZ thread include:

- Some segments are skipped during the Find operation for no apparent reason

- Some segments throw an error saying that the "Segment start/end cannot be deleted" when attempting to execute a replacement

- Tags are ignored in the Find/Replace operation 

- [ cannot be used in the Replace field (although including both the opening and closing brackets works); using \[ will naturally result in \[ being inserted in the replacement, instead of the bracket being escaped

- ^ is ignored as a start of segment/string anchor during the Find operation

- Find/Replace stops responding after several of these operations

 

 

Parents
  • Hi Nora,

    Thanks for sharing this... good way to kill a long train journey!  Although every hotel and every train I have to use reminds me how far off we are from being able to work online completely!

    Nora Díaz said:
    - Some segments are skipped during the Find operation for no apparent reason

    I can reproduce this, and I think I can see a pattern in what's skipped.  If a paragraph unit contains several segments, it looks as though only the last segment in that paragraph unit is dealt with.

    In my test, the paragraph units before and after, which each contain only one segment, are OK, and so is the segment that is last in its paragraph unit (only 2 segments in this case, but it would still be the last one).

    I don't have problems with the [[ and ]] though.

    I also get better results with the SDLXLIFF Toolkit as this doesn't skip any segments.  But it does have problems with tags, and with segments containing code... like <texttag>text in here</texttag> for example.  So we do need to address a few things.  I'm going to start with the toolkit as I can do something about this faster.

    Nora Díaz said:
    - Some segments throw an error saying that the "Segment start/end cannot be deleted" when attempting to execute a replacement

    How do I reproduce this?

    Nora Díaz said:
    - Tags are ignored in the Find/Replace operation 

    Yep... got it.

    Nora Díaz said:
    - [ cannot be used in the Replace field (although including both the opening and closing brackets works); using \[ will naturally result in \[ being inserted in the replacement, instead of the bracket being escaped

    Works for me.  Would be good to have some examples of the failing files and steps to reproduce.

    Nora Díaz said:
    - ^ is ignored as a start of segment/string anchor during the Find operation

    Again... examples please.  Seems to be fine for me.

    Nora Díaz said:
    - Find/Replace stops responding after several of these operations

    I could not reproduce this specifically... but I could crash it altogether if this is what you mean?

    Regards

    Paul

Children
  • Hi Paul,

    I tested on a file that I can't share publicly, but will create a similar one to show what happens here when I have a little time later. A couple of things in the meantime:

    Paul Filkin said:
    I don't have problems with the [[ and ]] though.

    Me neither, when both [[ and ]] are used in the replacement field, but when attempting, for example, to Find ^(.) and Replace with [[$1, Studio won't accept [[ in the replacement field.
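
For comparison, here is a minimal sketch of the same Find/Replace in Python's `re` module rather than in Studio. Python writes the backreference as `\1` where Studio's Replace field uses `$1`, and the sample segments are made up for illustration. It only shows that a general-purpose regex engine treats `[[` in a replacement as plain text.

```python
import re

# Hypothetical sample segments, not taken from the files discussed here.
segments = ["Hello world", "Second segment"]

# Find: ^(.)   Replace: [[$1 in Studio notation, written as [[\1 in Python.
pattern = re.compile(r"^(.)")
replacement = r"[[\1"

# Apply the replacement to each segment as a separate string.
results = [pattern.sub(replacement, seg) for seg in segments]

for line in results:
    print(line)
```

Each segment should come back with `[[` prepended to its first character and nothing else changed.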

     

    Paul Filkin said:

    - Some segments throw an error saying that the "Segment start/end cannot be deleted" when attempting to execute a replacement

    I could be wrong, but it seems to me that this happens when a segment has formatting applied, bold for example, with no tags showing.

    Paul Filkin said:
     
    - [ cannot be used in the Replace field (although including both the opening and closing brackets works); using \[ will naturally result in \[ being inserted in the replacement, instead of the bracket being escaped
     
    Works for me.  Would be good to have some examples of the failing files and steps to reproduce.

    Does it work if you use only the opening bracket in the replacement field? For example: [[$1

    Paul Filkin said:
    Nora Díaz said:
    - ^ is ignored as a start of segment/string anchor during the Find operation

    For this, try searching for ^(.) and see if Find Next takes you to the first character of the next segment or simply to the next character in the same segment.
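
The expected behaviour of that test can be sketched outside Studio. The snippet below uses Python's `re` module and made-up segments, and it assumes each segment is searched as a separate string, which may not be how Studio works internally. With the anchor honoured, `^(.)` should stop once per segment; if the anchor is ignored, the search behaves like `(.)` and stops at every character.

```python
import re

# Hypothetical segments standing in for rows in the editor.
segments = ["First segment.", "Second one.", "Third."]

anchored = re.compile(r"^(.)")    # anchor honoured
unanchored = re.compile(r"(.)")   # what you get if ^ is ignored

# Count how many stops a "Find Next" loop would make in each segment.
anchored_hits = [len(anchored.findall(seg)) for seg in segments]
unanchored_hits = [len(unanchored.findall(seg)) for seg in segments]

print("anchored:  ", anchored_hits)
print("unanchored:", unanchored_hits)
```

If Find Next in Studio lands on consecutive characters of the same segment, it is behaving like the unanchored case.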

     
    Paul Filkin said:
              Nora Díaz
    - Find/Replace stops responding after several of these operations
  • Hi Nora,

    ok - my bad.  Should have read your post properly!

    Nora Díaz said:
    Does it work if you use only the opening bracket in the replacement field? For example: [[$1 

    No, it doesn't.  I get the error you described, which is a little silly, as the Replace field only takes text and backreferences anyway, so we should not be validating it against complete regex rules.
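
One way to picture why an unpaired bracket would be rejected is to treat the Replace text as if it were a search pattern. The sketch below does that in Python's `re` module; it is only a guess at the kind of check involved, not a description of Studio's actual code, and `is_valid_pattern` is a hypothetical helper.

```python
import re
import warnings


def is_valid_pattern(text: str) -> bool:
    """Return True if `text` compiles as a regex search pattern."""
    with warnings.catch_warnings():
        # Python warns about "[[" as a possible nested set; silence that here.
        warnings.simplefilter("ignore")
        try:
            re.compile(text)
            return True
        except re.error:
            return False


# Opening brackets alone form an unterminated character class.
as_pattern_open = is_valid_pattern("[[")
# With the closing brackets present, the same text compiles.
as_pattern_pair = is_valid_pattern("[[x]]")
# Used purely as replacement text, the opening brackets are harmless.
as_replacement = re.sub(r"^(.)", "[[" + r"\1", "Hello")

print(as_pattern_open, as_pattern_pair, as_replacement)
```

That matches what was reported: the opening bracket on its own is refused, while the opening and closing brackets together are accepted.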

    I have a feeling this one might be logged... but will check.

    The toolkit handles this one ok (if you ignore the tag problems etc.)

    Regards

    Paul

  • I vaguely remember this coming up some time ago.
  • Hi again, Paul, I emailed you some files with examples and a stack trace of the error.
  • Hi Nora,

    We resolved this and a few other things in the SDLXLIFF Toolkit. This may be helpful should you come across these problems again:

    multifarious.filkin.com/.../

    Regards

    Paul