Changing the colon segmentation rule

Former Member

I am new to Regex and am trying to change the existing colon rule in Trados/Groupshare 2017 so that it only breaks when a colon is followed by a hyphen/dash.

Here is an example of the kind of source text (German) I mean:

Technische Daten:

-Eingangsspannung: 100-240V 50/60 Hz

-Konstanter Strom: 700mA DC

So I've tried changing the existing colon rule to the following:

Before break = .[:]+

After break = [- –  —]

But something seems to be wrong - presumably my Regex - as it hasn't done the trick.

I hope one of you will be able to point me in the right direction :-)

Thanks in advance!

Rachel

  • Hi

    Does that mean the source text is actually like this?

    Technische Daten: -Eingangsspannung: 100-240V 50/60 Hz -Konstanter Strom: 700mA DC

    Or are you trying to achieve this:

    Technische Daten:
    -
    Eingangsspannung:
    100-240V 50/60 Hz
    -
    Konstanter Strom:
    700mA DC

    Sorry for the question... I'm a bit confused about what you have now and what you are trying to achieve.

    Paul Filkin | RWS Group

    ________________________
    Design your own training!

    You've done the courses and still need to go a little further, or still not clear? 
    Tell us what you need in our Community Solutions Hub

  • My wild guess is that you actually want to REMOVE the colon rule completely...

    As I understand it, you want the strings "-Eingangsspannung: 100-240V 50/60 Hz" and "-Konstanter Strom: 700mA DC" NOT to be split into two segments at the colon... right? If so, then removing the colon segmentation rule is what you want to do.

    IMO the colon rule should have been removed from the defaults a long time ago, as it does more harm than good in most cases. It might be useful when translating books, where the characters in a story say something... but CAT tools are not normally used for such translations... and in the materials that are translated with CAT tools, breaking segments at a colon is usually UNWANTED (as in this example).

  • I couldn't agree more, Evzen. I don't know about other languages, but in Spanish it's best not to segment at colons, both for better text continuity and to avoid capitalization errors, so I tend to delete the colon segmentation rule altogether in new TMs. Of course, many projects already come segmented by the client, and all of them use the default colon segmentation.
  • Sorry, I had written another post about modifying the colon segmentation rule but in fact, removing the default colon segmentation rule is all that's needed in this case.

    The key to understanding why is that two different segmentation rules come into play when you have text like this.

    Technische Daten: (1)

    -Eingangsspannung:(2) 100-240V 50/60 Hz

    -Konstanter Strom:(2) 700mA DC

     

    Text is being segmented at (1) not because of the colon, but because of the new line (likely a hard return).

    By removing the colon segmentation rule, text won't be segmented at (2) anymore.
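The interplay of the two rules can be illustrated with a small Python sketch. It is only a simulation under stated assumptions: `COLON_RULE` is a hypothetical stand-in for the default colon rule (break after a colon that is followed by whitespace) and is not Trados' actual rule definition, and hard returns are modelled as plain newlines.

```python
import re

# Hypothetical stand-in for the default colon rule: break after a colon
# that is followed by whitespace. Not Trados' actual rule definition.
COLON_RULE = re.compile(r"(?<=:)\s+")


def segment(text, use_colon_rule):
    """Split text into segments: first at hard returns, then by the colon rule."""
    segments = []
    # Step 1: every hard return ends a paragraph, whatever the rules say.
    for paragraph in text.splitlines():
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        # Step 2: segmentation rules only apply inside a paragraph.
        if use_colon_rule:
            segments.extend(p for p in COLON_RULE.split(paragraph) if p)
        else:
            segments.append(paragraph)
    return segments


SAMPLE = (
    "Technische Daten:\n"
    "-Eingangsspannung: 100-240V 50/60 Hz\n"
    "-Konstanter Strom: 700mA DC\n"
)

with_rule = segment(SAMPLE, use_colon_rule=True)
without_rule = segment(SAMPLE, use_colon_rule=False)
```

In this sketch `with_rule` contains five segments, because each specification line is cut at position (2), while `without_rule` contains the three lines as they stand. "Technische Daten:" is its own segment in both cases, since the break at (1) comes from the hard return.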

  • All makes sense... you would just need to create a new rule to break on a colon followed by a hyphen:

    :-

    if I'm reading this right. I must admit I'm still not 100% clear on what her requirement is here.

    Paul Filkin | RWS Group


  • IMO the "colon followed by hyphen" is actually what creates the confusion here... and it is in fact incorrect.

    IMO, what Rachel was originally after was "colon followed by a line break and a hyphen at the start of the next line".
    But the point is that, if I am right about what she actually needs(!), what she is after is incorrect and she is looking in the wrong direction.

    If the original text really looks as weird as this:
    Technische Daten: -Eingangsspannung: 100-240V 50/60 Hz -Konstanter Strom: 700mA DC
    (i.e. WITHOUT A LINE BREAK after "Technische Daten:"), then I would question the correctness of the source in the first place!
    Simply because it looks like a bulleted list incorrectly converted to plain text! I've seen this many times... as a result of a lame HTML-to-text conversion... in particular an export from a CMS database where someone had the bright idea to "help" us by blindly removing all HTML tags (resulting in a total mess).

  • Former Member
    Thank you all for the many replies and apologies for not replying sooner.

    Paul/Nora - That is exactly what I need/am trying to do. In most cases I don't want Trados to break at a colon. So this sentence, for example, should stay as one segment: "-Eingangsspannung: 100-240V 50/60 Hz". The only exception is a sub-heading in the text like "Technische Daten:" which is followed by a hyphen. Thank you for your help :-)
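The requirement described here (break at a colon only when a hyphen or dash follows) can be tried out on the flattened text with a short Python sketch. The two patterns are assumptions modelled on the "Before break" / "After break" pair from the original post, not rules taken from Trados, and `split_at_breaks` is a hypothetical helper that merely imitates how such a pair marks a break position.

```python
import re

# Assumed patterns, modelled on the "Before break" / "After break" pair
# from the original post; not copied from any Trados default.
BEFORE_BREAK = r":"
AFTER_BREAK = r"\s*[-\u2013\u2014]"  # optional space, then hyphen, en dash or em dash

# A break position is any point preceded by BEFORE_BREAK and followed by AFTER_BREAK.
BREAK_AT = re.compile(f"(?<={BEFORE_BREAK})(?={AFTER_BREAK})")


def split_at_breaks(text):
    """Cut text at every break position and return the non-empty segments."""
    segments, start = [], 0
    for match in BREAK_AT.finditer(text):
        segments.append(text[start:match.start()].strip())
        start = match.start()
    segments.append(text[start:].strip())
    return [s for s in segments if s]


flattened = (
    "Technische Daten: -Eingangsspannung: 100-240V 50/60 Hz "
    "-Konstanter Strom: 700mA DC"
)
segments = split_at_breaks(flattened)
```

With these assumed patterns the sub-heading is separated, and the colons inside the specification lines are left alone. Note that the two list items still end up in one segment, because nothing in the flattened text marks the boundary between them; a rule of this kind cannot restore line breaks that were lost upstream.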
  • Hi

    Can you provide a small sample source file with just these bits of text in it, something like this:

    -Eingangsspannung: 100-240V 50/60 Hz
    Technische Daten: -some text in here presumably

    It sounds as though you must have a soft break after the colon in "Technische Daten:" or it is written as I showed above. Otherwise I don't understand why it doesn't break there already. But to create the exception it would be useful to see the file format and how it's actually written so we can test it.

    Paul Filkin | RWS Group


  • Former Member, in reply to Evzen Polenka
    Hi Evzen, I've only just seen your last reply. Yes, the text is indeed as weird as that and is stored in our ERP system in HTML.

    So with the tags it looks like this:

    <p>Technische Daten:</p><p>- Eingangsspannung: 100-240V 50/60 Hz</p><p>-Konstanter Strom: 700mA DC</p>
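For the markup above, a rough Python sketch shows what paragraph-level splitting would give when no colon rule is applied. It assumes that each `<p>` element is treated as one paragraph and therefore as at least one segment; `ParagraphCollector` is a hypothetical helper built on the standard library's `html.parser` and says nothing about how the Trados HTML file type is actually implemented.

```python
from html.parser import HTMLParser


class ParagraphCollector(HTMLParser):
    """Collects the text of each <p> element as one segment."""

    def __init__(self):
        super().__init__()
        self.segments = []
        self._buffer = None  # None while the parser is outside a <p>

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "p" and self._buffer is not None:
            text = "".join(self._buffer).strip()
            if text:
                self.segments.append(text)
            self._buffer = None


html = (
    "<p>Technische Daten:</p>"
    "<p>- Eingangsspannung: 100-240V 50/60 Hz</p>"
    "<p>-Konstanter Strom: 700mA DC</p>"
)
collector = ParagraphCollector()
collector.feed(html)
collector.close()
segments = collector.segments
```

Under that assumption the tags alone yield three segments, with "Technische Daten:" on its own and each specification line kept whole.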
  • Hi,

    In this case you would have fewer problems if you were given the HTML, as it would be simple to achieve what you need just by removing the colon rule as previously suggested. The <p> tag would ensure that "Technische Daten:" is indeed in a segment on its own.

    Paul Filkin | RWS Group

