Parsing comma separated lists without spaces

I get a lot of keyword lists for translation which are comma-separated but have no spaces, i.e. I get

roses,daffodils,anemones,dandelions,crabgrass,banyans

Studio doesn't like it when there is no space after punctuation (the same applies if they use ; instead of , to separate the items) and treats the whole list as ONE word. I end up having to create an intermediate document where I replace , with \n and then do the reverse in the output, i.e.

roses,daffodils,anemones,dandelions,crabgrass,banyans

»

roses\ndaffodils\nanemones\ndandelions\ncrabgrass\nbanyans

which Studio parses into

roses

\n

daffodils

\n

etc

and then I do the reverse with the translation. The problem is that these keyword lists are often interspersed with "normal" strings containing commas, and that gets messy.

Is there a way of getting round this? I had considered parsing, but I don't really want to swap every , for \n or something similar because that will break other sentences. Similarly, if I tell it to put a space after every comma (not that I'm sure how I'd "say" that in the segmentation rules), then a , space will turn into a , space space, which isn't great either.
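For what it's worth, the "space space" problem can be avoided if the replacement is done outside Studio with a regular expression that only matches a separator with no whitespace after it. The following is a minimal Python sketch of that idea, not a Studio feature; the names `TIGHT_SEP` and `add_missing_spaces` are made up for illustration.

```python
import re

# A comma or semicolon that is immediately followed by a non-space character.
# The lookahead (?=\S) checks the next character without consuming it, so
# separators that already have a space after them are left untouched.
TIGHT_SEP = re.compile(r"([,;])(?=\S)")


def add_missing_spaces(text):
    """Insert one space after each comma/semicolon that lacks one."""
    return TIGHT_SEP.sub(r"\1 ", text)


if __name__ == "__main__":
    print(add_missing_spaces("roses,daffodils,anemones"))
    print(add_missing_spaces("A normal sentence, with a comma."))
```

This still has the drawbacks of any pre-processing step: it would also change strings such as 1,000, and the inserted spaces have to be removed again from the translated output.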

Is there a better way?

  • Hi Michael Bauer (MichaelBauer)

    You could add two segmentation rules to make it look like this:

    Screenshot of Trados Studio showing a text file with a list of words such as roses, daffodils, anemones, followed by commas, indicating segmentation points.

    Then just filter on the commas, copy source to target and lock them, and you're good to go.  The basic idea is that segmentation rules are applied sequentially, so first of all create a rule to break before the comma:

    Rule: break before comma
    Before break: \w[\w]+
    After break: ,

    Then create a rule for after the comma:

    Rule: break after comma
    Before break: [,]+
    After break: [\w\p{P}]

    That should be it.  I have attached my TM here so you can test it and inspect it if you have problems.

    comma segmentation.zip
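To see roughly what the two rules do together, here is a small Python simulation. It is only a sketch under stated assumptions, not how Studio implements segmentation: a break is placed at every position where the "before break" pattern matches the text ending there and the "after break" pattern matches the text starting there. Python's `re` module has no `\p{P}`, so ASCII punctuation from `string.punctuation` stands in for it, and `segment`, `rebuild` and the `str.upper` "translation" are hypothetical helpers for illustration.

```python
import re
import string

# Assumption: ASCII punctuation approximates \p{P}, which Python's re lacks.
PUNCT = re.escape(string.punctuation)

# (before-break pattern, after-break pattern), in the order given above.
RULES = [
    (r"\w[\w]+", r","),           # break before comma
    (r"[,]+", rf"[\w{PUNCT}]"),   # break after comma
]


def break_positions(text, rules):
    """Collect every index where some rule's two patterns meet."""
    positions = set()
    for before, after in rules:
        before_re = re.compile(rf"(?:{before})\Z")  # must end at the index
        after_re = re.compile(after)                # must start at the index
        for i in range(1, len(text)):
            if before_re.search(text[:i]) and after_re.match(text, i):
                positions.add(i)
    return sorted(positions)


def segment(text, rules=RULES):
    """Split text at the break positions produced by the rules."""
    cuts = [0] + break_positions(text, rules) + [len(text)]
    return [text[a:b] for a, b in zip(cuts, cuts[1:])]


def rebuild(segments, translate):
    """Copy comma-only segments unchanged; translate everything else."""
    return "".join(
        seg if set(seg) <= {","} else translate(seg) for seg in segments
    )


if __name__ == "__main__":
    parts = segment("roses,daffodils,anemones")
    print(parts)                      # words and commas as separate segments
    print(rebuild(parts, str.upper))  # str.upper is a stand-in translator
```

In this simplified model, a comma followed by a space (as in a normal sentence) still triggers the first rule but not the second, and a keyword of a single character does not match `\w[\w]+`, so it is worth checking in Studio how the rules behave on the "normal" strings in your files.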

    Paul Filkin | RWS Group

    [edited by: Trados AI at 6:23 AM (GMT 0) on 29 Feb 2024]