Is there a more automatic way to remove the extra space between English text/numbers and a Chinese character?

Dear Studio 2017 users,

I just completed a big editing job in which the client no longer wants a space between English text/numbers and Chinese characters, which was the style used previously.  I ended up spending 20-25% of the editing time manually removing these extra spaces from the matches coming from the project TM.

Is there a way or an app that can automatically remove these spaces for me?  Also, in some cases, the text/Chinese character is surrounded by a formatting tag pair, and the extra space is either before the opening tag or after it.  I wonder if there is a way/expression that can ignore the tags and find/delete the extra space before the tag.

Thank you in advance for any suggestions!

Chunyi 

  • Hi Chunyi,

    Did you try this with the SDLXLIFF Toolkit? We made some fixes to this tool a few months ago to handle cases like this, where Studio's out-of-the-box search and replace fails. It would be really helpful if you created some small examples of the text in a file when you ask questions like this, because then we won't spend so much time going backwards and forwards suggesting things that don't work for you. Just create a small Word file containing the examples in a few segments and attach it to your post.

    Regards

    Paul

    Paul Filkin | RWS Group

    ________________________
    Design your own training!

    You've done the courses and still need to go a little further, or still not clear? 
    Tell us what you need in our Community Solutions Hub

  • Hi Paul,

    I pasted some sample sentences below. I will send the files to your email address, as this reply box is in simple mode and I can't attach them using the image icon.
    I forgot to include a sample sentence that contains a proper name, say, Aefer Bee. I do want to keep the space between English words:)
    Thanks a lot for testing. I will also check out SDLXLIFF Toolkit later.

    This is a test. For emergencies, please call 911.
    Call toll free at 1 800 123 4567 to learn more.
    It is effective from July 1, 2017 to June 30, 2018.

    Chunyi
  • Hi Chunyi,

    You could click on "Advanced Editing Options" and then you can load your files there too.

    It is quite tricky to find rules for all cases, and some of them, around the tags for example, I can only solve by working directly on the SDLXLIFF in a decent text editor.  So if I search for this:

    ([\u4E00-\u9FA5])\s(<[^/]+>\d)|([\u4E00-\u9FA5]<[^/]+>)\s(\d)

    And replace with this:

    $1$2$3$4

    Then this will resolve the ones with spaces on either side of the tags.  I presume you have no problem with the ones without tags at all, as these are easily handled in the Studio Editor or the Toolkit, for example.

    Paul Filkin | RWS Group
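
For anyone who wants to try the expression outside a text editor first, here is a minimal Python sketch of the same search and replace. The `TAGGED` pattern is copied unchanged from the reply above; note that, as written, it only covers a Chinese character followed by a digit. Everything else is an assumption for illustration: the function names, the `<g id="1">` stand-in tags, the made-up sample strings, and the `PLAIN` pattern for untagged text, which is not from the thread. It is best tried on a copy of the file rather than the original.

```python
import re

# Search pattern from the reply above, copied unchanged.
# Branch 1: Chinese character, space, opening tag, digit.
# Branch 2: Chinese character, opening tag, space, digit.
TAGGED = re.compile(
    r"([\u4E00-\u9FA5])\s(<[^/]+>\d)|([\u4E00-\u9FA5]<[^/]+>)\s(\d)"
)

# Assumption, not from the thread: whitespace between a Chinese character and
# an ASCII letter or digit, in either order, for text without tags.
# Spaces between two English words do not match and are kept.
PLAIN = re.compile(
    r"(?<=[\u4E00-\u9FA5])\s+(?=[A-Za-z0-9])"
    r"|(?<=[A-Za-z0-9])\s+(?=[\u4E00-\u9FA5])"
)


def remove_tagged_spaces(text: str) -> str:
    """Python equivalent of replacing with $1$2$3$4.

    Groups from the branch that did not match are None, so they are
    joined as empty strings.
    """
    return TAGGED.sub(lambda m: "".join(g or "" for g in m.groups()), text)


def remove_plain_spaces(text: str) -> str:
    """Delete the spaces matched by the hypothetical PLAIN pattern."""
    return PLAIN.sub("", text)


if __name__ == "__main__":
    # Made-up sample strings; real SDLXLIFF tags may look different.
    print(remove_tagged_spaces('请拨打 <g id="1">911</g>'))
    print(remove_tagged_spaces('请拨打<g id="1"> 911</g>'))
    print(remove_plain_spaces("请致电 Aefer Bee 了解详情"))
```

The replacement is written as a function rather than a template string so that the unmatched branch contributes nothing, which mirrors how `$1$2$3$4` behaves in the editor.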

