Trailing spaces bug still present in Studio 2017

Hi Paul,

Are there any plans to fix the bug where trailing spaces in segments are not respected? I was a bit disappointed to see that nothing has changed in Studio 2017 in this respect. It's really crucial for WS projects that trailing spaces are kept. Otherwise, auto-propagation is entirely useless.

In general, I keep wondering why a TM tool does not save the translations the way I confirm them. Isn't this the key purpose of a TM tool?

Addressing this bug in the near future would be much appreciated.

Thanks a lot and best regards,
Stefan

  • Hi @BruceCampbell, @dbrockmann, @StefanKeller or @pfilkin,

    Would anyone be able to help me with this?

    I just received another huge Memsource project, and I'm not looking forward to having to remove all the extra spaces which Trados just adds of its own accord. See below:

    In Memsource it looks like this:

    Once I import it into Trados, it looks like this after running the source doc through an initial pre-translation:

    If I export the project back to Memsource without manually removing all the extra spaces that have been added in Trados, it looks like this:

    So, if anyone can help me understand how I can "automatically" remove all the extra spaces added to tags, as well as those added after full stops at the end of a segment, I'd be very grateful.

  • Hi,

    Which MT engine are you using to see this?

    Paul Filkin | RWS Group

    ________________________
    Design your own training!

    You've done the courses and still need to go a little further, or still not clear? 
    Tell us what you need in our Community Solutions Hub

  • Google Cloud Translation API (using a paid version of Google NMT: Neural Translation Model Online Predictions)

  • MT Enhanced or the out-of-the-box Google MT provider?

    Paul Filkin | RWS Group


  • Hi Nathanael,

    I think you have a problem. It looks like Studio has a few bugs that make what you want to do impossible.

    Leading and trailing spaces are easy.

    Do a Find and Replace (Alt-H followed by E). Make sure you are using regular expressions for the find. Find: "^ " (i.e. without the quotes) in the target segment, then replace with "" (i.e. empty string) to get rid of a space at start of the segment. Find " $", then replace with "" to get rid of a space at end of the segment.
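
    As a rough illustration of what those two passes do to a target segment, here is a minimal Python sketch. It uses Python's `re` module as a stand-in, not Studio's own engine, and the function name `strip_edge_space` is made up for this example. The two patterns are the same ones given above.

```python
import re

def strip_edge_space(target: str) -> str:
    """Mimic the two Find and Replace passes on a plain-text segment."""
    target = re.sub(r"^ ", "", target)  # pass 1: one space at the start of the segment
    target = re.sub(r" $", "", target)  # pass 2: one space at the end of the segment
    return target

print(repr(strip_edge_space(" Hello world. ")))  # 'Hello world.'
```

    Each pass removes a single space only, so a segment with several leading or trailing spaces would need the pass repeated, or a pattern such as "^ +" instead.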

    But there is a problem with double spaces. If you simply find two spaces in a row and replace them with a single space, i.e. find "  " and replace with " ", then any tags between the spaces are removed. So you can't do that.

    So try a regex lookahead, i.e. find a space that is followed by another space: " (?= )". Well, the regex find function works, i.e. the first space is selected.

    But you can't replace it with an empty string, i.e. replace with "" doesn't work.

    You also can't replace it with anything else.

    In fact, it looks like no matter what you find using a regex lookahead, you won't be able to replace it. It is not just a problem with spaces. It looks like a bug in Studio.

    And it gets worse. If the segment contains tags, it looks like lookahead can't even find anything. So both the find and replace functions are buggy and useless when it comes to lookahead.
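
    For comparison, this is what the lookahead replace is expected to do on plain text with no tags. The sketch below is Python, used only as a stand-in for a working regex engine. It says nothing about how Studio implements Find and Replace, and `collapse_double_spaces` is a hypothetical helper name.

```python
import re

def collapse_double_spaces(text: str) -> str:
    # Delete every space that is immediately followed by another space,
    # so each run of spaces shrinks to its last space.
    return re.sub(r" (?= )", "", text)

print(collapse_double_spaces("Hi  there"))  # Hi there
```

    Because the lookahead does not consume the second space, only the first one is selected and removed, which is exactly the behaviour the replace step in Studio fails to deliver.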

    So I did a Google and found the following forum discussion from January 2017: https://community.sdl.com/product-groups/translationproductivity/f/171/t/9998

    Looking at the discussion, it appears that regex lookahead and lookbehind didn't work at that time in Studio. Paul said they didn't work in the SDLXLIFF Toolkit either.

    A few days later, Paul posted that the problem had been fixed in the SDLXLIFF Toolkit (but not in Studio).

    So, let's try the SDLXLIFF Toolkit.

    I have a segment "Hi  there", with 2 spaces between "Hi" and "there". The two spaces can be found and replaced without any problem if there are no tags.

    I can also use lookahead to replace the "H" in "Hi" with a "P", using "H(?=i)" as the find string and "P" as the replace string. So lookahead find and replace works.

    However, if I make the "H" in "Hi" bold, so that there are tags before and after the "H", then I can no longer find the segment using the regex find string "Hi  there" (with 2 spaces).

    But I CAN find it using "Hi there" (with 1 space between the words) -- which should, of course, not match, since there are actually 2 spaces between the words, not one.

    This is clearly a bug, and it gets even worse.

    I can find and replace text that does not include tags, i.e. "there" or "H".

    If I search for "Hi", the find function works and displays the segment. But the replace function fails, presumably because of the tag between "H" and "i".

    So if you do a batch find and replace, you could have a surprise if any of the text you thought you replaced had a tag in it. The replace would not have worked in those segments and there would be no message indicating the failure.

    Well, maybe a lookahead would work, since you could make sure you only select the text before the tag.

    But that does not work either. I can find the "H" at the beginning of "Hi" using a lookahead, namely "H(?=i)". But the replace function still fails, even though the "i" part of the string is not selected by the regex string. I presume it fails because of the tag between the "H" and the "i".

    So there are two problems with using SDLXLIFF Toolkit to get rid of double spaces.

    First, if there are tags in the segment, then it looks like you can't even find two spaces in a row, even if the tags aren't anywhere close to the spaces. SDLXLIFF Toolkit seems to treat two spaces as a single space for regex searches if there is a tag in the segment.

    Second, even if you could find two spaces in a row, if they had a tag between them you would not be able to use lookahead to find the first one and replace it with the empty string. The presence of the tag makes the replace function fail (without a message).

    But that is precisely the situation we want to handle, two spaces with a tag between them.

    You also can't get around this by using the white-space character if there are tags in the segment. You can find "\s", but not "\s\s", even if there are two spaces in a row. Trying to find "\s{2,2}" also fails.
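
    On a plain string, all of these patterns should find the two spaces. A small Python check (again only a stand-in for a regex engine, not the SDLXLIFF Toolkit itself) shows the expected result, and also that "\s{2,2}" is just a longer way of writing "\s{2}":

```python
import re

segment = "Hi  there"  # two spaces between the words, no tags

# Record where each pattern matches (start, end), or None if it does not match.
spans = {}
for pattern in (r"\s\s", r"\s{2,2}", r"\s{2}"):
    match = re.search(pattern, segment)
    spans[pattern] = match.span() if match else None

print(spans)
```

    All three patterns match the same two characters at positions 2 to 4, so the failure described above comes from the tags, not from the patterns.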

    So, in summary, it doesn't look like you can handle double spaces with Studio Find and Replace or the SDLXLIFF Toolkit if you have tags in a segment.
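
    To make the goal concrete, here is a sketch of the replacement one would want, done outside Studio on a plain string. It assumes, purely for illustration, that inline tags appear as angle-bracket markup like <b> and </b>. That notation, the `TAG` pattern and the function name are all assumptions for this example and not how Studio or the SDLXLIFF Toolkit represent tags internally.

```python
import re

# Assumption for this sketch: an inline tag is anything of the form <...>.
TAG = r"<[^<>]+>"

# A space that is followed by zero or more tags and then another space.
SPACE_BEFORE_SPACE = re.compile(rf" (?=(?:{TAG})* )")

def collapse_spaces_across_tags(segment: str) -> str:
    # Remove the earlier space; tags and the later space are left untouched.
    return SPACE_BEFORE_SPACE.sub("", segment)

print(collapse_spaces_across_tags("<b>H</b>i  there"))  # <b>H</b>i there
print(collapse_spaces_across_tags("Hi <b> </b>there"))  # Hi<b> </b>there
```

    The lookahead skips over any tags between the two spaces without selecting them, so the tags survive the replace. Whether the space before or after the tag should be the one that is kept is a judgment call; this sketch keeps the later one.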

    I might have done something wrong here, as I don't use Studio's regex that much, or lookahead, so hopefully Paul will be able to help you out with a solution.

    But it looks like the bug with lookahead that was found in the Find and Replace function in January 2017 is still unfixed a year and a half later. And the patch done to the SDLXLIFF Toolkit a year and a half ago is also buggy.

    This is a bit surprising, as you would think that making sure find and replace worked properly would be kind of important.

    Best regards,
    Bruce Campbell
    ASAP Language Services

  • Hi,

    Can you do two things please:

    1. Tell me which provider caused the problem, as I want to look at why the spaces are there in the first place. So is this Studio or the plugin?
    2. Can you give me a source file to play with containing these tags?

    Thank you

    Paul Filkin | RWS Group


  • Paul (pfilkin), I presume it's MT Enhanced, as they're sending me an invoice every month, as below:

  • You'll get an invoice from Google no matter which one you use! My question is: have you installed the MT Enhanced plugin, and do you select MT Enhanced as your MT provider when working with Google? Or do you use the built-in Google Translate provider in Studio (which also requires you to provide an API key, and you'll still get an invoice from Google every month)?

    The reason I want to know is that if it's MT Enhanced, then you should test with the new plugin released yesterday, and if it's still a problem we can look at this quickly. If it's the out-of-the-box Google Translate in Studio causing the problem, then it will need to be logged with the core development team so they can resolve it in a future update.

    But please send me a test file so I can work with the team to investigate this, and also look at finding a way to search and replace in the meantime.

    Paul Filkin | RWS Group


  • Thanks

    I pre-translated the huge project with the MT Enhanced plugin, using my Google Translate API key and it worked 100% perfectly, i.e. no leading/trailing spaces whatsoever.
    Thank you very, very much for this super duper time-saver!!!

    Consequently, it would seem that the default Studio plugin called "Google Cloud Translation API" was what was causing me all the headaches.

  • Thanks for your lengthy, detailed feedback session.
    I hope that it gets noticed by those in the position to be able to do something about it.