Confused by embedded content processing in Trados Studio 2014

I am stuck on a problem with embedded content. I have tested two cases, but both fail.

Sample 1.

<?xml version='1.0' encoding='utf-8' standalone='yes'?>
<StringMap>

    <Name Key='HANSEL_CODE_PK' Value='&lt;font size=\"22\" color=\"14a101\"&gt;%s&lt;/font&gt;,I am at&lt;font size=\"22\" color=\"c54a00\"&gt;%s&lt;/font&gt;with Jim.' MinLev='0'/>

</StringMap>

Sample 2.

<?xml version='1.0' encoding='utf-8' standalone='yes'?>
<StringMap>

    <Name Key='HANSEL_CODE_PK' MinLev='0'>&lt;font size=\"22\" color=\"14a101\"&gt;%s&lt;/font&gt;,I am at&lt;font size=\"22\" color=\"c54a00\"&gt;%s&lt;/font&gt;with Jim.</Name>

</StringMap>

In Sample 1, I need to translate the Value attribute, and in Sample 2, I need to translate the Name element.

I can confirm that everything is set up correctly in the Parser and Embedded content sections of the newly created file type, but both samples lead to the same result:

<font size=\"22\" color=\"14a101\">%s</font>,I am at<font size=\"22\" color=\"c54a00\">%s</font>with Jim.

However, the correct extraction should be: %s,I am at %s with Jim.
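
To make the expected behaviour concrete, here is a minimal Python sketch of the split I have in mind. It is only an illustration, not how Studio works internally: the pattern FONT_TAG and the helper split_tags are hypothetical names, and the pattern deliberately tolerates the \" around the attribute values.

```python
import html
import re

# The translatable string exactly as it appears in both samples
# (entities and backslashes included).
RAW = (r'&lt;font size=\"22\" color=\"14a101\"&gt;%s&lt;/font&gt;,I am at'
       r'&lt;font size=\"22\" color=\"c54a00\"&gt;%s&lt;/font&gt;with Jim.')

# Hypothetical pattern: an opening or closing <font> tag. [^>]* accepts
# anything up to the closing bracket, including the \" sequences.
FONT_TAG = re.compile(r'</?font\b[^>]*>')


def split_tags(raw: str):
    """Decode the entities, then separate the font tags from the text."""
    decoded = html.unescape(raw)       # &lt; and &gt; become < and >
    tags = FONT_TAG.findall(decoded)   # what should be treated as inline tags
    text = FONT_TAG.sub('', decoded)   # what should be left for translation
    return text, tags


text, tags = split_tags(RAW)
print(text)
for tag in tags:
    print(tag)
```

With the four font tags set aside, the remaining text is `%s,I am at%swith Jim.` (the source has no spaces around the tags).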

Any advice? Thank you.

  • Hi WK,

    I had a quick play and would note the following:

    Sample 1: not possible because Studio will not allow the use of embedded content processing in an attribute

    Sample 2: only possible if you convert the mixture of entities and characters in the file for consistency and correct the xml

    These for example are invalid:

    size=\"22\"

    color=\"14a101\"

    size=\"22\"

    color=\"c54a00\"

    The backslashes in front of the quotation marks invalidate the xml for Studio... it's very picky with XML!  So if I convert it and remove the backslashes I get this:

    memoQ has a very neat feature in the regex tagger, and this would allow you to deal with the attributes, and would be required to handle the element as they don't support regex rules in the xml filter itself.

    So if you use Studio you need to do the corrections upfront, and with memoQ you need to do the corrections afterwards with the regex tagger.  I can't make up my mind if this is a Studio bug or not so I'll report it and see what the experts say!

    Regards

    Paul

    Paul Filkin | RWS Group

    ________________________
    Design your own training!

    You've done the courses and still need to go a little further, or still not clear? 
    Tell us what you need in our Community Solutions Hub

  • Thank you for your quick answer, Paul!
    I take this escaped-character problem to be something Trados Studio does not support yet, and I look forward to a cumulative update that fixes it.
  • Hi WK,

    OK, I discussed this at length with the development team this morning and we don't think this is a bug.  At the moment you use the regex tagger in memoQ to work around the problem, which is caused by the invalid html inside the file.  Using Studio you work around it by correcting the source file as I already explained.

    The file would normally be handled by converting the entities, but in order for <> to be recognized as part of the tag they CANNOT be escaped in the first place.  So they are not caught by the XML filter.  The XML filter would normally convert the entities so the content can be matched up properly by the embedded content processors, but as there are \" characters around the attribute values, it's not considered valid HTML tag content and the conversion is not applied.

    So you have two ways to fix them, one for use with the legacy xml filetype and one for the new XML filetype.  I put both here so you can see the difference.

    wk_legacy.ZIP

    wk_new.ZIP

    So you won't see a "fix" for this because we are very unlikely to fix the software to handle invalid source files.

    Regards

    Paul

    Paul Filkin | RWS Group

  • Thank you, Paul. I think I've got your point: I just need to remove the backslashes next to the equals signs, and only then can Studio recognize the embedded content in the element.
    The sample I pasted here is just a snippet of a big xml file that comes from my client, and making the fix back and forth is clearly a big risk. By the way, the file itself is well-formed (any clue as to what makes it invalid? Admittedly, it is a type of file I have never seen before). So I wonder whether some flexibility like memoQ's regex tagger could also be introduced in a future update.
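
The back-and-forth fix discussed in this thread could be scripted to reduce the risk. The following is a minimal Python sketch, not a tested workflow: strip_backslashes and restore_backslashes are hypothetical helper names, and it assumes that the translated file still contains the font tags in their escaped &lt;font ...&gt; form. It only covers the backslashes, not the entity conversion mentioned for Sample 2.

```python
import re

# An escaped opening font tag as it appears in the source file, e.g.
# &lt;font size=\"22\" color=\"14a101\"&gt;
ESCAPED_FONT_OPEN = re.compile(r'&lt;font\b.*?&gt;')


def strip_backslashes(xml_text: str) -> str:
    """Before translation: drop the backslash in front of each quote
    inside the escaped font tags. Nothing else in the file is touched."""
    return ESCAPED_FONT_OPEN.sub(
        lambda m: m.group(0).replace('\\"', '"'), xml_text)


def restore_backslashes(xml_text: str) -> str:
    """After translation: put a backslash in front of every quote inside
    the escaped font tags that does not already have one."""
    return ESCAPED_FONT_OPEN.sub(
        lambda m: re.sub(r'(?<!\\)"', r'\\"', m.group(0)), xml_text)


# The Name element from Sample 2, used as a round-trip check.
SAMPLE = (r"<Name Key='HANSEL_CODE_PK' MinLev='0'>"
          r'&lt;font size=\"22\" color=\"14a101\"&gt;%s&lt;/font&gt;,I am at'
          r'&lt;font size=\"22\" color=\"c54a00\"&gt;%s&lt;/font&gt;with Jim.'
          r"</Name>")

cleaned = strip_backslashes(SAMPLE)
restored = restore_backslashes(cleaned)
print(cleaned)
print(restored == SAMPLE)
```

Restricting both substitutions to the escaped font tags keeps the XML attribute quotes (Key='...', MinLev='...') and any quotation marks in the translatable text out of reach, which is what makes the round trip reversible on this sample.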