XLF file - <g> tags to be excluded from segments

Hi Community,

I would like to clean up an XLF file and help the segmentation a bit, as there are multiple sentences within the translation units, all added to the same segment in Studio.

I would like to amend the parser settings so the <g> tags are excluded from the segments.

Here is a sample:

<g id="rG3HWrhBpwgG-Jg0" ctype="x-html-P">
<g id="PfWMnh-oqz10lr2y" ctype="x-html-SPAN" xhtml:style="color: rgb(0, 0, 0);">Welcome to our Cyber Security Week training course! This is your opportunity to learn how you can protect yourself wherever you are, and to refresh your understanding of cyber security topics such as 
<g id="soLH-J4KMcUGcmx9" ctype="x-html-STRONG">Phishing</g>, 
<g id="-4yEvEpngeUlciMV" ctype="x-html-STRONG">Social Engineering</g> and 
<g id="vVmwpPgsr-qHfHcC" ctype="x-html-STRONG">Information Classification</g>.</g></g>
</source>
</trans-unit>
</body>
</file>
<file original="l9vjMq2X-3LkpMPjDt8vFqEMhDeBLuGA" datatype="plaintext" source-language="en-GB">
<body>
<trans-unit id="title">
<source>Secure Working</source>
</trans-unit><trans-unit id="items|1|items|0|paragraph">
<source>
<g id="qv21OEzOZzzf7Wm1" ctype="x-html-P">
<g id="n3JOa20jwbUvaWml" ctype="x-html-SPAN" xhtml:style="font-size: 18px; color: rgb(255, 255, 255);">Cyber criminals are relentless in their attacks and use various tactics to try and steal our company and personal data, whether we’re working from home, in the office, in a factory or travelling. Below are some key guidelines that will help to keep you safe against cyber-crime. </g>
</g>
</source>
</trans-unit>

I added these:

<[^/]\w*[^<>]*>
<[/]\w*[^<>]*>
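
A quick way to sanity-check the two expressions outside Studio is to run them against one of the sample lines. This is only a sketch using Python's `re` module, not Studio's own regex engine, and the names `START_TAG`, `END_TAG` and `sample` are made up for illustration. If both patterns match here, the missing split in Studio is unlikely to be caused by the expressions themselves.

```python
import re

# The two patterns exactly as entered in the tag definition table.
START_TAG = re.compile(r"<[^/]\w*[^<>]*>")
END_TAG = re.compile(r"<[/]\w*[^<>]*>")

# One line taken from the sample above.
sample = '<g id="soLH-J4KMcUGcmx9" ctype="x-html-STRONG">Phishing</g>, '

# Collect every opening and closing tag each pattern finds.
start_matches = START_TAG.findall(sample)
end_matches = END_TAG.findall(sample)

print(start_matches)
print(end_matches)
```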

These are my settings:

[Screenshot: Trados Studio settings window showing the 'Embedded content' section, with options for processing content embedded in a document. The 'Tag definition' table contains the 'Start tag' and 'End tag' expressions above, with 'Type' set to 'Translatable'.]

(Segmentation hint: exclude)

But it's not working: the text is not split after the tag pair.


Can you please help me work out what I'm doing wrong?

I also tried creating a custom file type (based on XML v 1.3.0.0) where the segmentation is good but the source gets overwritten with the target in the end, and I'm not sure what to do.


Thank you!
Greta



  • Hi 

    Thank you. I see I forgot to include the beginning of the segment, sorry.

    I hope the below is better.

    <xliff xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="urn:oasis:names:tc:xliff:document:1.2 http://docs.oasis-open.org/xliff/v1.2/os/xliff-core-1.2-strict.xsd"
    xmlns="urn:oasis:names:tc:xliff:document:1.2"
    xmlns:xhtml="http://www.w3.org/1999/xhtml" version="1.2">
      <file original="course" datatype="plaintext" source-language="en-GB">
        <body>
          <trans-unit id="title">
            <source>Cyber Security Week ES</source>
          </trans-unit>
          <trans-unit id="description">
            <source>
              <g id="rG3HWrhBpwgG-Jg0" ctype="x-html-P">
                <g id="PfWMnh-oqz10lr2y" ctype="x-html-SPAN"
                xhtml:style="color: rgb(0, 0, 0);">Welcome to our Cyber Security
                Week training course! This is your opportunity to learn how you can
                protect yourself wherever you are, and to refresh your
                understanding of cyber security topics such as 
                <g id="soLH-J4KMcUGcmx9" ctype="x-html-STRONG">Phishing</g>, 
                <g id="-4yEvEpngeUlciMV" ctype="x-html-STRONG">Social
                Engineering</g> and 
                <g id="vVmwpPgsr-qHfHcC" ctype="x-html-STRONG">Information
                Classification</g>.</g>
              </g>
            </source>
          </trans-unit>
        </body>
      </file>
      <file original="l9vjMq2X-3LkpMPjDt8vFqEMhDeBLuGA" datatype="plaintext"
      source-language="en-GB">
        <body>
          <trans-unit id="title">
            <source>Secure Working</source>
          </trans-unit>
          <trans-unit id="items|1|items|0|paragraph">
            <source>
              <g id="qv21OEzOZzzf7Wm1" ctype="x-html-P">
                <g id="n3JOa20jwbUvaWml" ctype="x-html-SPAN"
                xhtml:style="font-size: 18px; color: rgb(255, 255, 255);">Cyber
                criminals are relentless in their attacks and use various tactics
                to try and steal our company and personal data, whether
                we’re working from home, in the office, in a
                factory or travelling. Below are some key guidelines that will help
                to keep you safe against cyber-crime.</g>
              </g>
            </source>
          </trans-unit>
        </body>
      </file>
    </xliff>

  • Hi 

    Apart from the fact that I can't even open this file as an XLIFF, and that I believe it's probably invalid since the xhtml:style attribute is not allowed in the <g> element in XLIFF, you have a few problems related to what you are trying to achieve:

    1. As it's XLIFF, the source and target elements are already segmented by trans-unit.
    2. The <g> element is handled as part of the XLIFF parser and is not something you can handle separately using an embedded content processor. It's also supposed to be inline according to the XLIFF specification.
    3. If you use a custom XML filetype, then you need to populate the target element in the source file, because you are essentially going to be translating the target and not the source. So you need to add the target with a copy of the source first, and then process the file with a parser rule to extract the target.
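
The "copy the source into the target first" step in point 3 could be scripted before the file goes into Studio. Below is a minimal sketch, assuming the file is well-formed XML and uses the XLIFF 1.2 namespace shown in the sample. The function name `add_targets` is hypothetical, and the parser rule that extracts the target in the custom XML filetype still has to be set up separately in Studio.

```python
import copy
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"
XHTML_NS = "http://www.w3.org/1999/xhtml"

# Keep the prefixes used in the sample file when serialising.
ET.register_namespace("", XLIFF_NS)
ET.register_namespace("xhtml", XHTML_NS)


def add_targets(xliff_text: str) -> str:
    """Return the XLIFF text with a <target> copy added after each <source>."""
    root = ET.fromstring(xliff_text)
    for unit in root.iter(f"{{{XLIFF_NS}}}trans-unit"):
        source = unit.find(f"{{{XLIFF_NS}}}source")
        has_target = unit.find(f"{{{XLIFF_NS}}}target") is not None
        if source is None or has_target:
            continue  # nothing to copy, or a target already exists
        # Deep copy so the nested <g> elements are duplicated as well.
        target = copy.deepcopy(source)
        target.tag = f"{{{XLIFF_NS}}}target"
        # Place the copy directly after <source>.
        unit.insert(list(unit).index(source) + 1, target)
    return ET.tostring(root, encoding="unicode")


# Usage with a cut-down version of the sample file.
sample = (
    f'<xliff xmlns="{XLIFF_NS}" version="1.2">'
    '<file original="course" datatype="plaintext" source-language="en-GB"><body>'
    '<trans-unit id="title"><source>Secure Working</source></trans-unit>'
    "</body></file></xliff>"
)
result = add_targets(sample)
print(result)
```

Units that already contain a target are skipped, so running the script twice does not duplicate anything. Note that ElementTree may normalise details such as attribute quoting on output, so it is worth comparing the result with the original before importing it.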