Changing the colon segmentation rule

Former Member

I am new to Regex and am trying to change the existing colon rule in Trados/Groupshare 2017 so that it only breaks when a colon is followed by a hyphen/dash.

Here is an example of the kind of source text (German) I mean:

Technische Daten:

-Eingangsspannung: 100-240V 50/60 Hz

-Konstanter Strom: 700mA DC

So I've tried changing the existing colon rule to the following:

Before break = .[:]+

After break = [- –  —]

But something seems to be wrong - presumably my Regex - as it hasn't done the trick.

I hope one of you will be able to point me in the right direction :-)

Thanks in advance!

Rachel

  • My wild guess is that you would rather want to REMOVE the Colon rule completely...

    My understanding is that you actually want the "-Eingangsspannung: 100-240V 50/60 Hz" and "-Konstanter Strom: 700mA DC" strings NOT to be split at the colon into two segments... right? If so, then removing the Colon segmentation rule is what you want to do.

    IMO the Colon rule should have been removed from the defaults long ago, as it does more harm than good in most cases. It could be worthwhile for translating books, where the characters in a story say something... but CAT tools are not normally used for such translations... and in materials translated using CAT tools it's usually UNWANTED to break segments at a colon (as in this example).

  • I couldn't agree more, Evzen. I don't know about other languages, but in Spanish it's best not to segment at colons for better text continuity and to avoid capitalization errors, so I tend to delete the colon segmentation rule altogether in new TMs, but of course many projects already come segmented by the client, and all of them use the default colon segmentation.
  • IMO the "colon followed by hyphen" is actually what creates the confusion here... and what's in fact incorrect.

    IMO, what Rachel was originally after was "colon followed by linebreak and hyphen at the start of the next line".
    But the point is that, if I am right about what she actually needs(!), what she is after is incorrect and she is looking in the wrong direction.

    If the original text really looks as weird as this:
    Technische Daten: -Eingangsspannung: 100-240V 50/60 Hz -Konstanter Strom: 700mA DC
    (i.e. WITHOUT LINEBREAK after "Technische Daten:"), then I would question the correctness of the source in the first place!
    Simply because it looks like a bulleted list incorrectly converted to plain text! I've seen this many times... as a result of lame HTML-to-text conversion... in particular an export from a CMS database where someone had the bright idea to "help" us by blindly removing all HTML tags (resulting in a total mess).
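
    To illustrate on the flattened line quoted above, here is a minimal Python sketch of a "colon followed by hyphen/dash" break. It is only an approximation: Python's re module is not the engine Trados uses, and the pattern name and the split_segments helper are made up for this example.

```python
import re

# Hypothetical stand-in for a "before break" / "after break" rule pair:
# break after a colon only when the next non-space character is a
# hyphen, en dash (\u2013) or em dash (\u2014).
BREAK_AFTER_COLON_BEFORE_DASH = re.compile(r"(?<=:)\s*(?=[-\u2013\u2014])")


def split_segments(text: str) -> list[str]:
    """Split text at every position matched by the rule, dropping empty pieces."""
    return [part for part in BREAK_AFTER_COLON_BEFORE_DASH.split(text) if part]


flattened = (
    "Technische Daten: -Eingangsspannung: 100-240V 50/60 Hz "
    "-Konstanter Strom: 700mA DC"
)
segments = split_segments(flattened)
for segment in segments:
    print(segment)
```

    In this sketch the heading is separated from the rest, but the two list items stay glued together, because no colon precedes the second hyphen. A colon-based rule alone therefore cannot restore a list that has lost its paragraph breaks.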

  • Former Member
    In reply to Evzen Polenka
    Hi Evzen, I've only just seen your last reply. Yes, the text is indeed as weird as that and is stored in our ERP system in HTML.

    So with the tags it looks like this:

    <p>Technische Daten:</p><p>- Eingangsspannung: 100-240V 50/60 Hz</p><p>-Konstanter Strom: 700mA DC</p>
  • Hi,

    In this case you would have fewer problems if you were given the HTML, as it would be simple to achieve what you need just by removing the colon rule as previously suggested. The <p> tag would ensure that "Technische Daten:" is indeed in a segment of its own.
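
    As a rough illustration of why the <p> tags help, the following Python sketch collects the text of each <p> element from the HTML quoted earlier in the thread. It uses the standard library's html.parser and is not how the Trados HTML file type works internally; the ParagraphCollector class is a name invented for this example.

```python
from html.parser import HTMLParser


class ParagraphCollector(HTMLParser):
    """Collect the text of each <p> element as a separate segment."""

    def __init__(self) -> None:
        super().__init__()
        self.paragraphs: list[str] = []
        self._inside_p = False
        self._buffer: list[str] = []

    def handle_starttag(self, tag, attrs):
        # A new paragraph starts: reset the text buffer.
        if tag == "p":
            self._inside_p = True
            self._buffer = []

    def handle_endtag(self, tag):
        # The paragraph ends: store its accumulated text.
        if tag == "p" and self._inside_p:
            self.paragraphs.append("".join(self._buffer).strip())
            self._inside_p = False

    def handle_data(self, data):
        if self._inside_p:
            self._buffer.append(data)


html = (
    "<p>Technische Daten:</p>"
    "<p>- Eingangsspannung: 100-240V 50/60 Hz</p>"
    "<p>-Konstanter Strom: 700mA DC</p>"
)
collector = ParagraphCollector()
collector.feed(html)
collector.close()
print(collector.paragraphs)
```

    Each paragraph comes out as its own item without any colon rule being involved, so the colons inside the list items never trigger a break.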

    Paul Filkin | RWS Group

    ________________________
    Design your own training!

    You've done the courses and still need to go a little further, or still not clear? 
    Tell us what you need in our Community Solutions Hub

  • Former Member
    In reply to Paul
    Hi Paul,

    Here is another example with the complete XML/HTML:

    <HtmlText><![CDATA[<p><strong>Mit Schutzart IP54 ausgestattet: Außenwandleuchte Ohio mit Bewegungssensor</strong></p><p>Aus robustem Aluminiumdruckguss gearbeitet und dank IP54 auch vor stärkeren Witterungseinflüssen geschützt, sorgt die Außenwandleuchte Ohio an jeder modern gestalteten Fassade für optimale Umgebungsausleuchtung. </p><p>Technische Daten des Bewegungsmelders:</p><p>- Erfassungwinkel: 120°</p><p>- Erfassungsbereich: 9 m</p>]]></HtmlText>

    These are product descriptions, which are published in our web shops. The file generated by our system is a .XLIFF and also contains other texts in addition to the HTML one.
  • Hi,

    In a simple form an XLIFF looks like this:

    <trans-unit id="1">
    <source>First sentence</source>
    <target>First sentence</target>
    </trans-unit>

    Can you show us how this looks with a complete trans-unit?

    Paul Filkin | RWS Group


  • Former Member
    In reply to Paul
    Hi Paul,

    Here, as an example, are all the translation units for our product #9004658. The HTML text is the last translation unit:

    <trans-unit id="1" lw:itemnumber="9004658" resname="Kurzbezeichnung">
    <source>LED- Außenwandleuchte Ohio m. Sensor anthrazit</source>
    <target state="needs-translation" maxlength="50">LED- Außenwandleuchte Ohio m. Sensor anthrazit</target>
    </trans-unit>
    <trans-unit id="2" lw:itemnumber="9004658" lw:content_ref="SPECIFICATION" lw:spec_group="LEUCHTEN" lw:spec_title="FARBE" resname="FARBE">
    <source>anthrazit, weiß</source>
    <target state="needs-translation">anthrazit, weiß</target>
    </trans-unit>
    <trans-unit id="3" lw:itemnumber="9004658" lw:content_ref="SPECIFICATION" lw:spec_group="LEUCHTEN" lw:spec_title="LEUCHTMITTEL" resname="LEUCHTMITTEL">
    <source>1 x 6 W LED</source>
    <target state="needs-translation">1 x 6 W LED</target>
    </trans-unit>
    <trans-unit id="4" lw:itemnumber="9004658" lw:content_ref="SPECIFICATION" lw:spec_group="LEUCHTEN" lw:spec_title="LICHTFARBE" resname="LICHTFARBE">
    <source>warmweiß (3.000 K)</source>
    <target state="needs-translation">warmweiß (3.000 K)</target>
    </trans-unit>
    <trans-unit id="5" lw:itemnumber="9004658" lw:content_ref="SPECIFICATION" lw:spec_group="LEUCHTEN" lw:spec_title="MATERIAL" resname="MATERIAL">
    <source>Aluminiumdruckguss</source>
    <target state="needs-translation">Aluminiumdruckguss</target>
    </trans-unit>
    <trans-unit id="6" lw:itemnumber="9004658" lw:content_ref="EXTENDED DESCRIPTION" lw:spec_group="" lw:spec_title="" resname="Beschreibung">
    <source>
    <HtmlText><![CDATA[<p><strong>Mit Schutzart IP54 ausgestattet: Außenwandleuchte Ohio mit Bewegungssensor</strong></p><p>Aus robustem Aluminiumdruckguss gearbeitet und dank IP54 auch vor stärkeren Witterungseinflüssen geschützt, sorgt die Außenwandleuchte Ohio an jeder modern gestalteten Fassade für optimale Umgebungsausleuchtung. </p><p>Technische Daten des Bewegungsmelders:</p><p>- Erfassungwinkel: 120°</p><p>- Erfassungsbereich: 9 m</p>]]></HtmlText>
    </source>
    <target state="needs-translation">
    <HtmlText><![CDATA[<p><strong>Mit Schutzart IP54 ausgestattet: Außenwandleuchte Ohio mit Bewegungssensor</strong></p><p>Aus robustem Aluminiumdruckguss gearbeitet und dank IP54 auch vor stärkeren Witterungseinflüssen geschützt, sorgt die Außenwandleuchte Ohio an jeder modern gestalteten Fassade für optimale Umgebungsausleuchtung. </p><p>Technische Daten des Bewegungsmelders:</p><p>- Erfassungwinkel: 120°</p><p>- Erfassungsbereich: 9 m</p>]]></HtmlText>
    </target>
    </trans-unit>
  • OK, can you just give me a complete file containing only a few translation units like this? It'll be faster for you to give me the header info etc. than for me to try and figure out how to make the XLIFF valid!

    Thank you

    Paul Filkin | RWS Group


  • Former Member
    In reply to Paul
    Here is the complete file:

    <xliff xmlns:xsi="www.w3.org/.../XMLSchema-instance" xmlns:lw="www.lampenwelt.de/xliff" version="1.2" xsi:schemaLocation="urn:oasis:names:tc:xliff:document:1.2 xliff-core-1.2-transitional.xsd" xmlns="urn:oasis:names:tc:xliff:document:1.2">
    <file date="2018-06-20T08:27:11" source-language="DE-DE" target-language="EN-GB" original="9004658.xliff" datatype="html">
    <header>
    <tool tool-id="GroupshareConnector" tool-company="Lampenwelt GmbH &amp; Co. KG" tool-name="Dynamics NAV Groupshare Connector" tool-version="1.02" />
    <note>Diese Datei wurde automatisiert erstellt.</note>
    </header>
    <body>
    <group id="9004658" resname="9004658">
    <trans-unit id="1" lw:itemnumber="9004658" resname="Kurzbezeichnung">
    <source>LED- Außenwandleuchte Ohio m. Sensor anthrazit</source>
    <target state="needs-translation" maxlength="50">LED- Außenwandleuchte Ohio m. Sensor anthrazit</target>
    </trans-unit>
    <trans-unit id="2" lw:itemnumber="9004658" lw:content_ref="SPECIFICATION" lw:spec_group="LEUCHTEN" lw:spec_title="FARBE" resname="FARBE">
    <source>anthrazit, weiß</source>
    <target state="needs-translation">anthrazit, weiß</target>
    </trans-unit>
    <trans-unit id="3" lw:itemnumber="9004658" lw:content_ref="SPECIFICATION" lw:spec_group="LEUCHTEN" lw:spec_title="LEUCHTMITTEL" resname="LEUCHTMITTEL">
    <source>1 x 6 W LED</source>
    <target state="needs-translation">1 x 6 W LED</target>
    </trans-unit>
    <trans-unit id="4" lw:itemnumber="9004658" lw:content_ref="SPECIFICATION" lw:spec_group="LEUCHTEN" lw:spec_title="LICHTFARBE" resname="LICHTFARBE">
    <source>warmweiß (3.000 K)</source>
    <target state="needs-translation">warmweiß (3.000 K)</target>
    </trans-unit>
    <trans-unit id="5" lw:itemnumber="9004658" lw:content_ref="SPECIFICATION" lw:spec_group="LEUCHTEN" lw:spec_title="MATERIAL" resname="MATERIAL">
    <source>Aluminiumdruckguss</source>
    <target state="needs-translation">Aluminiumdruckguss</target>
    </trans-unit>
    <trans-unit id="6" lw:itemnumber="9004658" lw:content_ref="EXTENDED DESCRIPTION" lw:spec_group="" lw:spec_title="" resname="Beschreibung">
    <source>
    <HtmlText><![CDATA[<p><strong>Mit Schutzart IP54 ausgestattet: Außenwandleuchte Ohio mit Bewegungssensor</strong></p><p>Aus robustem Aluminiumdruckguss gearbeitet und dank IP54 auch vor stärkeren Witterungseinflüssen geschützt, sorgt die Außenwandleuchte Ohio an jeder modern gestalteten Fassade für optimale Umgebungsausleuchtung. </p><p>Technische Daten des Bewegungsmelders:</p><p>- Erfassungwinkel: 120°</p><p>- Erfassungsbereich: 9 m</p>]]></HtmlText>
    </source>
    <target state="needs-translation">
    <HtmlText><![CDATA[<p><strong>Mit Schutzart IP54 ausgestattet: Außenwandleuchte Ohio mit Bewegungssensor</strong></p><p>Aus robustem Aluminiumdruckguss gearbeitet und dank IP54 auch vor stärkeren Witterungseinflüssen geschützt, sorgt die Außenwandleuchte Ohio an jeder modern gestalteten Fassade für optimale Umgebungsausleuchtung. </p><p>Technische Daten des Bewegungsmelders:</p><p>- Erfassungwinkel: 120°</p><p>- Erfassungsbereich: 9 m</p>]]></HtmlText>
    </target>
    </trans-unit>
    </group>
    </body>
    </file>
    </xliff>
  • Looks to me like the GroupShare Connector (which created the XLIFF file) is somehow incorrectly configured: it keeps the HTML intact as CDATA instead of actually parsing the HTML, extracting only the translatable parts and segmenting them properly.
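
    A small Python sketch of what "kept as CDATA" means for a generic XML parser. The sample is a shortened, hypothetical file modelled on the trans-unit posted above, and xml.etree.ElementTree stands in for any XML-aware tool; it says nothing about how Trados or the connector process the file.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"

# Shortened, hypothetical sample modelled on trans-unit 6 in this thread.
sample = """<xliff xmlns="urn:oasis:names:tc:xliff:document:1.2" version="1.2">
<file source-language="DE-DE" target-language="EN-GB" original="sample.xliff" datatype="html">
<body>
<trans-unit id="6" resname="Beschreibung">
<source>
<HtmlText><![CDATA[<p>Technische Daten des Bewegungsmelders:</p><p>- Erfassungsbereich: 9 m</p>]]></HtmlText>
</source>
</trans-unit>
</body>
</file>
</xliff>"""

root = ET.fromstring(sample)
html_text = root.find(f".//{{{XLIFF_NS}}}HtmlText")

# The CDATA section is exposed as plain character data, not as child elements,
# so the <p> tags are just text as far as the XML layer is concerned.
child_count = len(list(html_text))
raw_markup = html_text.text
print(child_count)
print(raw_markup)
```

    Because the paragraphs arrive as one opaque string, an XLIFF-level tool has no paragraph boundaries to segment on unless the embedded content is parsed as HTML in a second step.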
  • Former Member
    In reply to Evzen Polenka
    Hi Evzen,

    Thanks for the tip. I will pass that on to our IT department, and until that is changed I will use the Regex suggested by Paul for the colon segmentation rule.

    Best

    Rachel