XML file with embedded content - how to exclude text from translation

I've got an XML structure with nodes like this one:

<p id="260">This document covers the &lt;Model&gt;.</p>

I want the content of the <p> nodes to be translated, except for &lt;Model&gt;.

I've set up an XML (Legacy Embedded Content) file type. In the Parser rules, <p> is set as translatable.

In Embedded Content (Legacy), I've set:

- checked Enable embedded content processing

- Document structure information: added a custom type named "Variable"

- Tag definition rules: start tag &gt;, end tag &lt;, Tag pair, Not translatable.

Now in my preview, I expect a segment "This document covers the "

Instead, I get "This document covers the <Model>."

So my embedded content isn't being filtered out. I'm doing something wrong, but what?

<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="AuthorIT.xslt"?>
<AuthorIT version="20.3.1.40442" xmlns="http://www.authorit.com/xml/authorit" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.authorit.com/xml/authorit AuthorIT.xsd">
<Objects>
<Book wordcount="25">
<Object>
<Description>Headers and Footers</Description>
<GUID>9e6905caaefa4b019204e85a0bd8789e</GUID>
<ID>6116</ID>
<VariantParentID>4091</VariantParentID>
</Object>
<ContentsNodes>
<Node id="4092"></Node>
<Node id="4093"></Node>
<Node id="4094"></Node>
<Node id="11821"></Node>
</ContentsNodes>
<VariableAssignments>
<VariableAssignment>
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
AIT variables test filter.zip

  • Try this:

    &lt; and &gt; are < and >, but inside an XML element they are escaped. When Studio extracts the content, it will have: "This document covers the <Model>."

    Daniel
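
The point about escaped characters can be checked outside Studio. The following is a minimal sketch using Python's standard XML parser, which is only an assumption standing in for Studio's own extraction; it shows that the text read from the sample <p> node already contains literal angle brackets.

```python
import xml.etree.ElementTree as ET

# Sample node from the question; &lt; and &gt; are escaped in the file.
source = '<p id="260">This document covers the &lt;Model&gt;.</p>'

# Any conforming XML parser unescapes entities when it reads text content.
p = ET.fromstring(source)
extracted = p.text

print(extracted)  # This document covers the <Model>.
```

So an embedded content rule has to match the unescaped form (< and >), not the entity form (&lt; and &gt;).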

    [edited by: Trados AI at 2:57 PM (GMT 0) on 1 Mar 2024]
  • In general, it's nice to provide a sample file; it helps those who are willing to help.

    I made one up from the information you disclosed:

    Then I created a custom XML file type based on the XML 2 file type. These file types are just easier to set up than the legacy file type, with only some drawbacks.

    All customization is in the parser and embedded content sections of that file type, so it's really a 3 minute job:

    This is how Studio displays the content before I set up the embedded content processing:

    Embedded content screen:

    Rule:

    Result:

    Unless there are more requirements this is a really simple task in Studio and you can find many different ways of tackling it.

    Daniel

  • I've attached a test file to my original post now, sorry about that.
    - I have a number of existing XML filters (based on the XML (Legacy Embedded Content) file type) that I'd like to add this to. These have ~50 parser rules each, so it'd take a while to replace them with new filters.

    - I made a new XML 2 filter to try your suggestion. I've set it up the way you indicate, but the Preview still suggests the variables are translatable.

    - It seems to me that the conversion of &lt; and &gt; is optional: this is a setting in Entities->HTML Special. I tried both with and without entity conversion; with either option the Preview still suggests the variables are translatable.

    I suspect part of the trouble is that <Model> isn't an XML element. It's half of a tag pair.

  • Entity conversion applies when Trados Studio writes content; it does not affect how content is read.

    &lt;Model&gt; is indeed not an XML element, it's embedded content. AFAIK you can't work with parser rules here, you must work with an embedded content processor of some kind.

    Daniel
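
To illustrate why parser rules cannot reach &lt;Model&gt;, here is a small sketch (again using Python's standard parser as an assumed stand-in, not Studio itself) that compares the escaped form with a hypothetical variant in which Model is a real element.

```python
import xml.etree.ElementTree as ET

# Escaped form, as in the sample file: <Model> is plain text inside <p>.
escaped = ET.fromstring('<p id="260">This document covers the &lt;Model&gt;.</p>')

# Hypothetical variant for comparison only: Model as a real child element.
real = ET.fromstring('<p id="260">This document covers the <Model/>.</p>')

escaped_children = [child.tag for child in escaped]
real_children = [child.tag for child in real]

print(escaped_children)  # []
print(real_children)     # ['Model']
```

With the escaped form there is no child node for an element-based rule to select, so the variable can only be caught by pattern matching on the text of <p>.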

  • Thanks to Daniel's info I was able to modify the filter.

    This is what I ended up doing (this is for an XML Legacy filter, for the new XML filter format it's slightly different):

    1. In the parser rules, every tag that can contain a variable must have a 'Context' entry.
    2. To add a context entry:
      double-click the tag
      in the 'edit rule' dialog, click 'Edit'.
      In the Structure Information Properties dialog, click 'Add'.
      In the Add Structure Information dialog, click in the Name field.
      Enter a context name, for example 'text'.
      Make a note of the Identifier field.
      If the Name field contains upper case letters, these will be converted to lowercase in the Identifier field.
    3. On the Embedded Content (Legacy) page, check the 'Enable embedded content processing' check box.
    4. In the 'Document structure information' field, add the Identifier for each tag you want to filter the variable names in. If you add the Name field, it won't work. So 'text' and not 'Text'.
    5. In the Tag Definition Rules area, add the rule, in my case the start tag is <.*?>
    6. Click Advanced to set the segmentation rule. I chose Include.
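
The start tag pattern from step 5 can be tried on the extracted text before it goes into the filter. This is a rough sketch in Python; Studio uses its own regex engine, so treat it as an approximation, and note that split_embedded is a hypothetical helper name used only for this illustration.

```python
import re

# The pattern from step 5. The "?" makes the match non-greedy, so each
# <...> variable is matched on its own.
TAG_RULE = re.compile(r"<.*?>")


def split_embedded(text):
    """Return (placeholders, translatable_parts) for one extracted string."""
    placeholders = TAG_RULE.findall(text)
    translatable = TAG_RULE.split(text)
    return placeholders, translatable


placeholders, translatable = split_embedded("This document covers the <Model>.")
print(placeholders)   # ['<Model>']
print(translatable)   # ['This document covers the ', '.']

# With two variables in one string, a greedy pattern would swallow the
# text between them; the non-greedy one keeps them separate.
two_vars = "Use <Model> with <Version>."
non_greedy = re.findall(r"<.*?>", two_vars)
greedy = re.findall(r"<.*>", two_vars)
```

Here non_greedy holds the two variables separately, while greedy holds a single match that spans from the first < to the last >, which is why <.*?> is the safer choice for the rule.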