Changing the colon segmentation rule

Former Member

I am new to regex and am trying to change the existing colon segmentation rule in Trados/GroupShare 2017 so that it only breaks when a colon is followed by a hyphen or dash.

Here is an example of the kind of source text (German) I mean:

Technische Daten:

-Eingangsspannung: 100-240V 50/60 Hz

-Konstanter Strom: 700mA DC

So I've tried changing the existing colon rule to the following:

Before break = .[:]+

After break = [- –  —]

But something seems to be wrong - presumably my Regex - as it hasn't done the trick.

I hope one of you will be able to point me in the right direction :-)

Thanks in advance!

Rachel
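
One way to see what the two expressions above do is to emulate a before/after break rule outside Trados. The sketch below is only an approximation in Python, not how Trados or GroupShare evaluates segmentation rules: the helper `break_positions`, the flattened sample string, and the dash-only alternative are assumptions made for illustration, and the alternative is not necessarily the expression suggested elsewhere in this thread. It assumes the posted "After break" class contains literal spaces, as it appears above, so a space after the colon also counts as a match.

```python
import re


def break_positions(text, before, after):
    """Return offsets where `before` matches text ending at the offset
    and `after` matches text starting at the offset."""
    before_re = re.compile("(?:" + before + r")\Z")
    after_re = re.compile(after)
    return [
        i
        for i in range(len(text) + 1)
        if before_re.search(text[:i]) and after_re.match(text[i:])
    ]


# Flattened version of the sample text (assumption: paragraphs joined by a space).
SAMPLE = "Technische Daten: -Eingangsspannung: 100-240V 50/60 Hz"

BEFORE = ".[:]+"

# The posted class, written with escapes: hyphen, SPACE, en dash, em dash.
POSTED_AFTER = "[- \u2013\u2014]"

# Hypothetical alternative: optional whitespace, then a hyphen or dash only.
DASH_ONLY_AFTER = "\\s*[-\u2013\u2014]"

posted = break_positions(SAMPLE, BEFORE, POSTED_AFTER)
dash_only = break_positions(SAMPLE, BEFORE, DASH_ONLY_AFTER)

print("posted class:", [SAMPLE[:i] for i in posted])
print("dash only   :", [SAMPLE[:i] for i in dash_only])
```

With the posted class, the emulation reports a break after both colons, because the space inside the brackets matches the space that follows each colon. With the dash-only variant it reports a break only after "Technische Daten:".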

  • Former Member in reply to Paul
    Hi Paul,

    Here is another example with the complete XML/HTML:

    <HtmlText><![CDATA[<p><strong>Mit Schutzart IP54 ausgestattet: Außenwandleuchte Ohio mit Bewegungssensor</strong></p><p>Aus robustem Aluminiumdruckguss gearbeitet und dank IP54 auch vor stärkeren Witterungseinflüssen geschützt, sorgt die Außenwandleuchte Ohio an jeder modern gestalteten Fassade für optimale Umgebungsausleuchtung. </p><p>Technische Daten des Bewegungsmelders:</p><p>- Erfassungwinkel: 120°</p><p>- Erfassungsbereich: 9 m</p>]]></HtmlText>

    These are product descriptions, which are published in our web shops. The file generated by our system is an .xliff file and contains other texts in addition to the HTML one.
  • Hi,

    In a simple form an XLIFF looks like this:

    <trans-unit id="1">
    <source>First sentence</source>
    <target>First sentence</target>
    </trans-unit>

    Can you show us how this looks with a complete trans-unit?

    Paul Filkin | RWS Group

    ________________________
    Design your own training!

    You've done the courses and still need to go a little further, or still not clear? 
    Tell us what you need in our Community Solutions Hub

  • Former Member in reply to Paul
    Hi Paul,

    Here, as an example, are all the translation units for our product #9004658. The HTML text is the last translation unit:

    <trans-unit id="1" lw:itemnumber="9004658" resname="Kurzbezeichnung">
    <source>LED- Außenwandleuchte Ohio m. Sensor anthrazit</source>
    <target state="needs-translation" maxlength="50">LED- Außenwandleuchte Ohio m. Sensor anthrazit</target>
    </trans-unit>
    <trans-unit id="2" lw:itemnumber="9004658" lw:content_ref="SPECIFICATION" lw:spec_group="LEUCHTEN" lw:spec_title="FARBE" resname="FARBE">
    <source>anthrazit, weiß</source>
    <target state="needs-translation">anthrazit, weiß</target>
    </trans-unit>
    <trans-unit id="3" lw:itemnumber="9004658" lw:content_ref="SPECIFICATION" lw:spec_group="LEUCHTEN" lw:spec_title="LEUCHTMITTEL" resname="LEUCHTMITTEL">
    <source>1 x 6 W LED</source>
    <target state="needs-translation">1 x 6 W LED</target>
    </trans-unit>
    <trans-unit id="4" lw:itemnumber="9004658" lw:content_ref="SPECIFICATION" lw:spec_group="LEUCHTEN" lw:spec_title="LICHTFARBE" resname="LICHTFARBE">
    <source>warmweiß (3.000 K)</source>
    <target state="needs-translation">warmweiß (3.000 K)</target>
    </trans-unit>
    <trans-unit id="5" lw:itemnumber="9004658" lw:content_ref="SPECIFICATION" lw:spec_group="LEUCHTEN" lw:spec_title="MATERIAL" resname="MATERIAL">
    <source>Aluminiumdruckguss</source>
    <target state="needs-translation">Aluminiumdruckguss</target>
    </trans-unit>
    <trans-unit id="6" lw:itemnumber="9004658" lw:content_ref="EXTENDED DESCRIPTION" lw:spec_group="" lw:spec_title="" resname="Beschreibung">
    <source>
    <HtmlText><![CDATA[<p><strong>Mit Schutzart IP54 ausgestattet: Außenwandleuchte Ohio mit Bewegungssensor</strong></p><p>Aus robustem Aluminiumdruckguss gearbeitet und dank IP54 auch vor stärkeren Witterungseinflüssen geschützt, sorgt die Außenwandleuchte Ohio an jeder modern gestalteten Fassade für optimale Umgebungsausleuchtung. </p><p>Technische Daten des Bewegungsmelders:</p><p>- Erfassungwinkel: 120°</p><p>- Erfassungsbereich: 9 m</p>]]></HtmlText>
    </source>
    <target state="needs-translation">
    <HtmlText><![CDATA[<p><strong>Mit Schutzart IP54 ausgestattet: Außenwandleuchte Ohio mit Bewegungssensor</strong></p><p>Aus robustem Aluminiumdruckguss gearbeitet und dank IP54 auch vor stärkeren Witterungseinflüssen geschützt, sorgt die Außenwandleuchte Ohio an jeder modern gestalteten Fassade für optimale Umgebungsausleuchtung. </p><p>Technische Daten des Bewegungsmelders:</p><p>- Erfassungwinkel: 120°</p><p>- Erfassungsbereich: 9 m</p>]]></HtmlText>
    </target>
    </trans-unit>
  • OK, can you just give me a complete file containing only a few translation units like this? It'll be faster for you to give me the header info etc. than for me to try and figure out how to make the XLIFF valid!

    Thank you

    Paul Filkin | RWS Group


  • Former Member in reply to Paul
    Here is the complete file:

    <xliff xmlns:xsi="www.w3.org/.../XMLSchema-instance" xmlns:lw="www.lampenwelt.de/xliff" version="1.2" xsi:schemaLocation="urn:oasis:names:tc:xliff:document:1.2 xliff-core-1.2-transitional.xsd" xmlns="urn:oasis:names:tc:xliff:document:1.2">
    <file date="2018-06-20T08:27:11" source-language="DE-DE" target-language="EN-GB" original="9004658.xliff" datatype="html">
    <header>
    <tool tool-id="GroupshareConnector" tool-company="Lampenwelt GmbH &amp; Co. KG" tool-name="Dynamics NAV Groupshare Connector" tool-version="1.02" />
    <note>Diese Datei wurde automatisiert erstellt.</note>
    </header>
    <body>
    <group id="9004658" resname="9004658">
    <trans-unit id="1" lw:itemnumber="9004658" resname="Kurzbezeichnung">
    <source>LED- Außenwandleuchte Ohio m. Sensor anthrazit</source>
    <target state="needs-translation" maxlength="50">LED- Außenwandleuchte Ohio m. Sensor anthrazit</target>
    </trans-unit>
    <trans-unit id="2" lw:itemnumber="9004658" lw:content_ref="SPECIFICATION" lw:spec_group="LEUCHTEN" lw:spec_title="FARBE" resname="FARBE">
    <source>anthrazit, weiß</source>
    <target state="needs-translation">anthrazit, weiß</target>
    </trans-unit>
    <trans-unit id="3" lw:itemnumber="9004658" lw:content_ref="SPECIFICATION" lw:spec_group="LEUCHTEN" lw:spec_title="LEUCHTMITTEL" resname="LEUCHTMITTEL">
    <source>1 x 6 W LED</source>
    <target state="needs-translation">1 x 6 W LED</target>
    </trans-unit>
    <trans-unit id="4" lw:itemnumber="9004658" lw:content_ref="SPECIFICATION" lw:spec_group="LEUCHTEN" lw:spec_title="LICHTFARBE" resname="LICHTFARBE">
    <source>warmweiß (3.000 K)</source>
    <target state="needs-translation">warmweiß (3.000 K)</target>
    </trans-unit>
    <trans-unit id="5" lw:itemnumber="9004658" lw:content_ref="SPECIFICATION" lw:spec_group="LEUCHTEN" lw:spec_title="MATERIAL" resname="MATERIAL">
    <source>Aluminiumdruckguss</source>
    <target state="needs-translation">Aluminiumdruckguss</target>
    </trans-unit>
    <trans-unit id="6" lw:itemnumber="9004658" lw:content_ref="EXTENDED DESCRIPTION" lw:spec_group="" lw:spec_title="" resname="Beschreibung">
    <source>
    <HtmlText><![CDATA[<p><strong>Mit Schutzart IP54 ausgestattet: Außenwandleuchte Ohio mit Bewegungssensor</strong></p><p>Aus robustem Aluminiumdruckguss gearbeitet und dank IP54 auch vor stärkeren Witterungseinflüssen geschützt, sorgt die Außenwandleuchte Ohio an jeder modern gestalteten Fassade für optimale Umgebungsausleuchtung. </p><p>Technische Daten des Bewegungsmelders:</p><p>- Erfassungwinkel: 120°</p><p>- Erfassungsbereich: 9 m</p>]]></HtmlText>
    </source>
    <target state="needs-translation">
    <HtmlText><![CDATA[<p><strong>Mit Schutzart IP54 ausgestattet: Außenwandleuchte Ohio mit Bewegungssensor</strong></p><p>Aus robustem Aluminiumdruckguss gearbeitet und dank IP54 auch vor stärkeren Witterungseinflüssen geschützt, sorgt die Außenwandleuchte Ohio an jeder modern gestalteten Fassade für optimale Umgebungsausleuchtung. </p><p>Technische Daten des Bewegungsmelders:</p><p>- Erfassungwinkel: 120°</p><p>- Erfassungsbereich: 9 m</p>]]></HtmlText>
    </target>
    </trans-unit>
    </group>
    </body>
    </file>
    </xliff>
  • It looks to me like the GroupShare Connector (which created the XLIFF file) is incorrectly configured: it keeps the HTML intact as CDATA instead of actually parsing the HTML, extracting only the translatable parts and segmenting them properly.
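
To illustrate the difference between keeping the HTML as one CDATA blob and extracting its translatable parts, here is a rough Python sketch using only the standard library. It is not what the GroupShare Connector or Trados does internally; the names `ParagraphCollector` and `paragraphs_from_source` are hypothetical, and the sample is a cut-down version of trans-unit 6 from the file above with the `lw:` attributes and most of the header omitted.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"

# Cut-down sample based on trans-unit 6 above (assumption: reduced for brevity).
SAMPLE = """<xliff xmlns="urn:oasis:names:tc:xliff:document:1.2" version="1.2">
<file source-language="DE-DE" target-language="EN-GB" original="9004658.xliff" datatype="html">
<body>
<trans-unit id="6" resname="Beschreibung">
<source>
<HtmlText><![CDATA[<p>Technische Daten des Bewegungsmelders:</p><p>- Erfassungwinkel: 120°</p><p>- Erfassungsbereich: 9 m</p>]]></HtmlText>
</source>
</trans-unit>
</body>
</file>
</xliff>"""


class ParagraphCollector(HTMLParser):
    """Collect the text of each <p> element as a separate string."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._buffer = None  # None while outside a <p> element

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "p" and self._buffer is not None:
            self.paragraphs.append("".join(self._buffer).strip())
            self._buffer = None

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)


def paragraphs_from_source(xliff_text):
    """Map each trans-unit id to the paragraph texts found in its <source>."""
    root = ET.fromstring(xliff_text)
    result = {}
    for unit in root.iter("{%s}trans-unit" % XLIFF_NS):
        source = unit.find("{%s}source" % XLIFF_NS)
        if source is None:
            continue
        # itertext() also returns the CDATA payload of the nested <HtmlText>.
        raw = "".join(source.itertext())
        collector = ParagraphCollector()
        collector.feed(raw)
        collector.close()
        # Fall back to the raw text for units that contain no <p> markup.
        result[unit.get("id")] = collector.paragraphs or [raw.strip()]
    return result


units = paragraphs_from_source(SAMPLE)
for unit_id, paragraphs in units.items():
    for paragraph in paragraphs:
        print(unit_id, repr(paragraph))
```

Each `<p>` element comes out as its own string, so the heading ending in a colon and the two list items that start with a hyphen are already separate units before any colon rule is applied.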
  • Former Member in reply to Evzen Polenka
    Hi Evzen,

    Thanks for the tip. I will pass that on to our IT department, and until that is changed I will use the regex suggested by Paul for the colon segmentation rule.

    Best

    Rachel
  • Thanks

    I think Evzen is probably right. I can only parse this XLIFF by removing the <HtmlText> element in front of the CDATA section, as this element is incorrect. I also don't think the segmentation rule will have an effect if you are handling the file as XLIFF. You might have to handle it as XML instead and deal with the target element only; then I think you'll have more success.

    If you're dealing with the text extraction already discussed then the segmentation rule changes should work fine.

    Paul Filkin | RWS Group


  • Former Member in reply to Paul
    Hi Paul,

    Thank you for all your help with this. I will pass your feedback on to our IT so that they can update/correct the GroupShare Connector and I will see if I can get the files to segment the way I need in the meantime.

    Best

    Rachel