Changing the colon segmentation rule

Former Member

I am new to Regex and am trying to change the existing colon rule in Trados/Groupshare 2017 so that it only breaks when a colon is followed by a hyphen/dash.

Here is an example of the kind of source text (German) I mean:

Technische Daten:

-Eingangsspannung: 100-240V 50/60 Hz

-Konstanter Strom: 700mA DC

So I've tried changing the existing colon rule to the following:

Before break = .[:]+

After break = [- –  —]

But something seems to be wrong - presumably my Regex - as it hasn't done the trick.

I hope one of you will be able to point me in the right direction :-)

Thanks in advance!

Rachel

  • Sorry, I had written another post about modifying the colon segmentation rule but in fact, removing the default colon segmentation rule is all that's needed in this case.

    The key to understanding why is that two different segmentation rules come into play when you have text like this.

    Technische Daten: (1)

    -Eingangsspannung:(2) 100-240V 50/60 Hz

    -Konstanter Strom:(2) 700mA DC

     

    Text is being segmented at (1) not because of the colon, but because of the new line (likely a hard return).

    By removing the colon segmentation rule, text won't be segmented at (2) anymore.
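A rough way to see the two rules interact, sketched in Python rather than in Trados itself. The `segment` function and its `break_after_colon` flag are made up for illustration; the sketch only assumes that a hard return always ends a segment and that the colon rule breaks after a colon followed by whitespace.

```python
import re

def segment(text, break_after_colon):
    """Toy model: hard returns always break; the colon rule is optional."""
    segments = []
    for paragraph in text.split("\n"):        # break (1): hard return
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        if break_after_colon:
            # break (2): after a colon that is followed by whitespace
            parts = re.split(r"(?<=:)\s+", paragraph)
        else:
            parts = [paragraph]
        segments.extend(p for p in parts if p)
    return segments

SAMPLE = (
    "Technische Daten:\n"
    "-Eingangsspannung: 100-240V 50/60 Hz\n"
    "-Konstanter Strom: 700mA DC"
)

with_colon_rule = segment(SAMPLE, break_after_colon=True)
without_colon_rule = segment(SAMPLE, break_after_colon=False)

for s in without_colon_rule:
    print(s)
```

With the colon rule on, the toy model yields five segments (each list item is cut after its colon); with it off, it yields three, one per line.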

  • All makes sense... you would just need to create a new rule to break on a colon followed by a hyphen:

    :-

    if I'm reading this right. I must admit I'm still not 100% clear on the requirement here.
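For anyone wanting to test such a rule outside Trados first, here is a small Python sketch. It assumes a rule fires wherever the text before the position ends with the "Before break" pattern and the text after it starts with the "After break" pattern; `break_offsets` and `split_at` are hypothetical helpers, not Trados functions. It also assumes the After break class in the original post contains a plain space, which is how it appears as posted.

```python
import re

def break_offsets(text, before, after):
    """Offsets i where text[:i] ends with `before` and text[i:] starts with `after`."""
    before_re = re.compile(r"(?:" + before + r")\Z", re.DOTALL)
    after_re = re.compile(after)
    return [
        i for i in range(1, len(text))
        if before_re.search(text[:i]) and after_re.match(text[i:])
    ]

def split_at(text, offsets):
    """Cut text at the given offsets and trim each piece."""
    pieces, start = [], 0
    for off in offsets:
        pieces.append(text[start:off].strip())
        start = off
    pieces.append(text[start:].strip())
    return pieces

FLAT = "Technische Daten: -Eingangsspannung: 100-240V 50/60 Hz -Konstanter Strom: 700mA DC"

# Rule as posted (assumed to include a space in the class): fires at every "colon + space".
loose = break_offsets(FLAT, r".[:]+", r"[- –—]")

# Tighter variant: colon, optional whitespace, then a hyphen or dash.
tight = break_offsets(FLAT, r":", r"\s*[-–—]")
segments = split_at(FLAT, tight)

print(len(loose), tight)
print(segments)
```

If the space really is part of the character class, the posted rule behaves much like the default colon rule, which may be why it "hasn't done the trick". Note too that a literal `:-` would not match text where a space sits between the colon and the hyphen, hence the optional `\s*` in the tighter variant.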

    Paul Filkin | RWS Group

    ________________________
    Design your own training!

    You've done the courses and still need to go a little further, or still not clear? 
    Tell us what you need in our Community Solutions Hub

  • IMO the "colon followed by hyphen" is actually what creates the confusion here... and it's in fact incorrect.

    IMO, what Rachel was originally after was "colon followed by a line break and a hyphen at the start of the next line".
    But the point is that, if I'm right about what she actually needs(!), that approach is looking in the wrong direction.

    If the original text really looks as weird as this:
    Technische Daten: -Eingangsspannung: 100-240V 50/60 Hz -Konstanter Strom: 700mA DC
    (i.e. WITHOUT A LINE BREAK after "Technische Daten:"), then I would question the correctness of the source in the first place!
    Simply because it looks like a bulleted list incorrectly converted to plain text! I've seen this many times as a result of sloppy HTML-to-text conversion, in particular an export from a CMS database where someone had the bright idea to "help" us by blindly removing all HTML tags (resulting in a total mess).

  • Former Member in reply to Evzen Polenka
    Hi Evzen, I've only just seen your last reply. Yes, the text is indeed as weird as that and is stored in our ERP system in HTML.

    So with the tags it looks like this:

    <p>Technische Daten:</p><p>- Eingangsspannung: 100-240V 50/60 Hz</p><p>-Konstanter Strom: 700mA DC</p>
  • Hi,

    In this case you would have fewer problems if you were given the HTML, as it would be simple to achieve what you need just by removing the colon rule as previously suggested. The <p> tag would ensure that "Technische Daten:" is indeed in a segment on its own.
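To illustrate the point about the <p> tags, here is a minimal Python sketch (not how Trados parses HTML internally) that treats each paragraph element as its own unit. `ParagraphCollector` is a hypothetical helper built on the standard-library `html.parser`.

```python
from html.parser import HTMLParser

class ParagraphCollector(HTMLParser):
    """Collect the text of each <p> element as one unit."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._buffer = None   # None means "not inside a <p>"

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "p" and self._buffer is not None:
            self.paragraphs.append("".join(self._buffer).strip())
            self._buffer = None

HTML = (
    "<p>Technische Daten:</p>"
    "<p>- Eingangsspannung: 100-240V 50/60 Hz</p>"
    "<p>-Konstanter Strom: 700mA DC</p>"
)

collector = ParagraphCollector()
collector.feed(HTML)
collector.close()
paragraphs = collector.paragraphs

for p in paragraphs:
    print(p)
```

Each paragraph comes out as one unit, so with no colon rule in play "Technische Daten:" stands alone and each list item stays whole.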

    Paul Filkin | RWS Group


  • Former Member in reply to Paul
    Hi Paul,

    Here is another example with the complete XML/HTML:

    <HtmlText><![CDATA[<p><strong>Mit Schutzart IP54 ausgestattet: Außenwandleuchte Ohio mit Bewegungssensor</strong></p><p>Aus robustem Aluminiumdruckguss gearbeitet und dank IP54 auch vor stärkeren Witterungseinflüssen geschützt, sorgt die Außenwandleuchte Ohio an jeder modern gestalteten Fassade für optimale Umgebungsausleuchtung. </p><p>Technische Daten des Bewegungsmelders:</p><p>- Erfassungwinkel: 120°</p><p>- Erfassungsbereich: 9 m</p>]]></HtmlText>

    These are product descriptions, which are published in our web shops. The file generated by our system is a .XLIFF and also contains other texts in addition to the HTML one.
  • Hi,

    In a simple form an XLIFF looks like this:

    <trans-unit id="1">
    <source>First sentence</source>
    <target>First sentence</target>
    </trans-unit>

    Can you show us how this looks with a complete trans-unit?
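For reference, a trans-unit on its own is not a complete XLIFF 1.2 document; it needs at least the `xliff`, `file` and `body` wrappers. The Python sketch below wraps the simple example and reads it back with the standard library. The `original` value `example.html` is made up for illustration, and the language codes are just placeholders.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}

# Smallest wrapper around the trans-unit; attribute values are illustrative.
MINIMAL_XLIFF = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="example.html" source-language="de-DE" target-language="en-GB" datatype="html">
    <body>
      <trans-unit id="1">
        <source>First sentence</source>
        <target>First sentence</target>
      </trans-unit>
    </body>
  </file>
</xliff>"""

root = ET.fromstring(MINIMAL_XLIFF)

# One (id, source, target) tuple per trans-unit.
pairs = [
    (
        unit.get("id"),
        unit.find("x:source", XLIFF_NS).text,
        unit.find("x:target", XLIFF_NS).text,
    )
    for unit in root.findall(".//x:trans-unit", XLIFF_NS)
]

print(pairs)
```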

    Paul Filkin | RWS Group


  • Former Member in reply to Paul
    Hi Paul,

    Here, as an example, are all the translation units for our product #9004658. The HTML text is in the last translation unit:

    <trans-unit id="1" lw:itemnumber="9004658" resname="Kurzbezeichnung">
    <source>LED- Außenwandleuchte Ohio m. Sensor anthrazit</source>
    <target state="needs-translation" maxlength="50">LED- Außenwandleuchte Ohio m. Sensor anthrazit</target>
    </trans-unit>
    <trans-unit id="2" lw:itemnumber="9004658" lw:content_ref="SPECIFICATION" lw:spec_group="LEUCHTEN" lw:spec_title="FARBE" resname="FARBE">
    <source>anthrazit, weiß</source>
    <target state="needs-translation">anthrazit, weiß</target>
    </trans-unit>
    <trans-unit id="3" lw:itemnumber="9004658" lw:content_ref="SPECIFICATION" lw:spec_group="LEUCHTEN" lw:spec_title="LEUCHTMITTEL" resname="LEUCHTMITTEL">
    <source>1 x 6 W LED</source>
    <target state="needs-translation">1 x 6 W LED</target>
    </trans-unit>
    <trans-unit id="4" lw:itemnumber="9004658" lw:content_ref="SPECIFICATION" lw:spec_group="LEUCHTEN" lw:spec_title="LICHTFARBE" resname="LICHTFARBE">
    <source>warmweiß (3.000 K)</source>
    <target state="needs-translation">warmweiß (3.000 K)</target>
    </trans-unit>
    <trans-unit id="5" lw:itemnumber="9004658" lw:content_ref="SPECIFICATION" lw:spec_group="LEUCHTEN" lw:spec_title="MATERIAL" resname="MATERIAL">
    <source>Aluminiumdruckguss</source>
    <target state="needs-translation">Aluminiumdruckguss</target>
    </trans-unit>
    <trans-unit id="6" lw:itemnumber="9004658" lw:content_ref="EXTENDED DESCRIPTION" lw:spec_group="" lw:spec_title="" resname="Beschreibung">
    <source>
    <HtmlText><![CDATA[<p><strong>Mit Schutzart IP54 ausgestattet: Außenwandleuchte Ohio mit Bewegungssensor</strong></p><p>Aus robustem Aluminiumdruckguss gearbeitet und dank IP54 auch vor stärkeren Witterungseinflüssen geschützt, sorgt die Außenwandleuchte Ohio an jeder modern gestalteten Fassade für optimale Umgebungsausleuchtung. </p><p>Technische Daten des Bewegungsmelders:</p><p>- Erfassungwinkel: 120°</p><p>- Erfassungsbereich: 9 m</p>]]></HtmlText>
    </source>
    <target state="needs-translation">
    <HtmlText><![CDATA[<p><strong>Mit Schutzart IP54 ausgestattet: Außenwandleuchte Ohio mit Bewegungssensor</strong></p><p>Aus robustem Aluminiumdruckguss gearbeitet und dank IP54 auch vor stärkeren Witterungseinflüssen geschützt, sorgt die Außenwandleuchte Ohio an jeder modern gestalteten Fassade für optimale Umgebungsausleuchtung. </p><p>Technische Daten des Bewegungsmelders:</p><p>- Erfassungwinkel: 120°</p><p>- Erfassungsbereich: 9 m</p>]]></HtmlText>
    </target>
    </trans-unit>
  • OK, can you just give me a complete file containing only a few translation units like this? It'll be faster for you to give me the header info etc. than for me to try and figure out how to make the XLIFF valid!

    Thank you

    Paul Filkin | RWS Group


  • Former Member in reply to Paul
    Here is the complete file:

    <xliff xmlns:xsi="www.w3.org/.../XMLSchema-instance" xmlns:lw="www.lampenwelt.de/xliff" version="1.2" xsi:schemaLocation="urn:oasis:names:tc:xliff:document:1.2 xliff-core-1.2-transitional.xsd" xmlns="urn:oasis:names:tc:xliff:document:1.2">
    <file date="2018-06-20T08:27:11" source-language="DE-DE" target-language="EN-GB" original="9004658.xliff" datatype="html">
    <header>
    <tool tool-id="GroupshareConnector" tool-company="Lampenwelt GmbH &amp; Co. KG" tool-name="Dynamics NAV Groupshare Connector" tool-version="1.02" />
    <note>Diese Datei wurde automatisiert erstellt.</note>
    </header>
    <body>
    <group id="9004658" resname="9004658">
    <trans-unit id="1" lw:itemnumber="9004658" resname="Kurzbezeichnung">
    <source>LED- Außenwandleuchte Ohio m. Sensor anthrazit</source>
    <target state="needs-translation" maxlength="50">LED- Außenwandleuchte Ohio m. Sensor anthrazit</target>
    </trans-unit>
    <trans-unit id="2" lw:itemnumber="9004658" lw:content_ref="SPECIFICATION" lw:spec_group="LEUCHTEN" lw:spec_title="FARBE" resname="FARBE">
    <source>anthrazit, weiß</source>
    <target state="needs-translation">anthrazit, weiß</target>
    </trans-unit>
    <trans-unit id="3" lw:itemnumber="9004658" lw:content_ref="SPECIFICATION" lw:spec_group="LEUCHTEN" lw:spec_title="LEUCHTMITTEL" resname="LEUCHTMITTEL">
    <source>1 x 6 W LED</source>
    <target state="needs-translation">1 x 6 W LED</target>
    </trans-unit>
    <trans-unit id="4" lw:itemnumber="9004658" lw:content_ref="SPECIFICATION" lw:spec_group="LEUCHTEN" lw:spec_title="LICHTFARBE" resname="LICHTFARBE">
    <source>warmweiß (3.000 K)</source>
    <target state="needs-translation">warmweiß (3.000 K)</target>
    </trans-unit>
    <trans-unit id="5" lw:itemnumber="9004658" lw:content_ref="SPECIFICATION" lw:spec_group="LEUCHTEN" lw:spec_title="MATERIAL" resname="MATERIAL">
    <source>Aluminiumdruckguss</source>
    <target state="needs-translation">Aluminiumdruckguss</target>
    </trans-unit>
    <trans-unit id="6" lw:itemnumber="9004658" lw:content_ref="EXTENDED DESCRIPTION" lw:spec_group="" lw:spec_title="" resname="Beschreibung">
    <source>
    <HtmlText><![CDATA[<p><strong>Mit Schutzart IP54 ausgestattet: Außenwandleuchte Ohio mit Bewegungssensor</strong></p><p>Aus robustem Aluminiumdruckguss gearbeitet und dank IP54 auch vor stärkeren Witterungseinflüssen geschützt, sorgt die Außenwandleuchte Ohio an jeder modern gestalteten Fassade für optimale Umgebungsausleuchtung. </p><p>Technische Daten des Bewegungsmelders:</p><p>- Erfassungwinkel: 120°</p><p>- Erfassungsbereich: 9 m</p>]]></HtmlText>
    </source>
    <target state="needs-translation">
    <HtmlText><![CDATA[<p><strong>Mit Schutzart IP54 ausgestattet: Außenwandleuchte Ohio mit Bewegungssensor</strong></p><p>Aus robustem Aluminiumdruckguss gearbeitet und dank IP54 auch vor stärkeren Witterungseinflüssen geschützt, sorgt die Außenwandleuchte Ohio an jeder modern gestalteten Fassade für optimale Umgebungsausleuchtung. </p><p>Technische Daten des Bewegungsmelders:</p><p>- Erfassungwinkel: 120°</p><p>- Erfassungsbereich: 9 m</p>]]></HtmlText>
    </target>
    </trans-unit>
    </group>
    </body>
    </file>
    </xliff>
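As a quick check of what is inside the `HtmlText` element, the sketch below reads a trimmed copy of trans-unit 6 with Python's standard library and lists the paragraphs held in the CDATA block. The sample is shortened to the last three paragraphs and a few attributes, and the regular expression is only a shortcut that works because the markup is flat `<p>` elements. How Trados itself treats this embedded HTML depends on how the file type is set up, which the sketch does not model.

```python
import re
import xml.etree.ElementTree as ET

NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}

# Trimmed copy of the posted file: one trans-unit, three paragraphs.
SAMPLE = """<xliff xmlns:lw="www.lampenwelt.de/xliff" version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
<file source-language="DE-DE" target-language="EN-GB" original="9004658.xliff" datatype="html">
<body>
<trans-unit id="6" lw:itemnumber="9004658" resname="Beschreibung">
<source>
<HtmlText><![CDATA[<p>Technische Daten des Bewegungsmelders:</p><p>- Erfassungwinkel: 120°</p><p>- Erfassungsbereich: 9 m</p>]]></HtmlText>
</source>
</trans-unit>
</body>
</file>
</xliff>"""

root = ET.fromstring(SAMPLE)

# The parser hands back the CDATA content as ordinary element text.
html_text = root.find(".//x:trans-unit[@id='6']/x:source/x:HtmlText", NS).text

# Each <p>...</p> is a natural segment boundary once the colon rule is gone.
paragraphs = re.findall(r"<p>(.*?)</p>", html_text, flags=re.DOTALL)

for p in paragraphs:
    print(p)
```

If the paragraphs are exposed to the segmentation engine like this, the heading ends up in its own segment and each "- ..." item stays in one piece as long as the colon rule is removed.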