Confusion about embedded content processing in Trados Studio 2014

I am stuck on a problem with embedded content. I have tested two scenarios, and both fail.

Sample 1.

<?xml version='1.0' encoding='utf-8' standalone='yes'?>
<StringMap>

    <Name Key='HANSEL_CODE_PK' Value='&lt;font size=\"22\" color=\"14a101\"&gt;%s&lt;/font&gt;,I am at&lt;font size=\"22\" color=\"c54a00\"&gt;%s&lt;/font&gt;with Jim.' MinLev='0'/>

</StringMap>

Sample 2.

<?xml version='1.0' encoding='utf-8' standalone='yes'?>
<StringMap>

    <Name Key='HANSEL_CODE_PK' MinLev='0'>&lt;font size=\"22\" color=\"14a101\"&gt;%s&lt;/font&gt;,I am at&lt;font size=\"22\" color=\"c54a00\"&gt;%s&lt;/font&gt;with Jim.</Name>

</StringMap>

In Sample 1, I need to translate the Value attribute; in Sample 2, I need to translate the Name element.

I have confirmed that the Parser and Embedded content settings in the newly created file type look correct, but both samples lead to the same result, shown below:

<font size=\"22\" color=\"14a101\">%s</font>,I am at<font size=\"22\" color=\"c54a00\">%s</font>with Jim.

However, the correct extraction should be: %s,I am at %s with Jim.
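
To make the expected behaviour concrete, here is a minimal Python sketch of what I mean by extraction. It is only an illustration and not how Studio works internally: the `TAG` pattern is a hypothetical, deliberately tolerant regex that accepts the backslash-escaped quotes, and `html.unescape` stands in for the entity conversion done by the XML parser.

```python
import html
import re

# Attribute value from Sample 1, exactly as it appears in the raw XML file.
raw_value = (
    r'&lt;font size=\"22\" color=\"14a101\"&gt;%s&lt;/font&gt;,I am at'
    r'&lt;font size=\"22\" color=\"c54a00\"&gt;%s&lt;/font&gt;with Jim.'
)

# Step 1: resolve the entities, as an XML parser would do.
decoded = html.unescape(raw_value)

# Step 2: hypothetical tolerant tag pattern (an assumption, not a Studio
# setting). [^>]* accepts anything inside the tag, including \" sequences.
TAG = re.compile(r'</?font\b[^>]*>')

# Removing the matches leaves only the translatable text.
translatable = TAG.sub('', decoded)
print(translatable)
```

In a CAT tool the matched tags would be shown as inline tag markers rather than deleted, so the translatable text and the placeholders stay separate.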

Any advice? Thank you.

  • Hi WK,

    OK, I've discussed this at length with the development team this morning and we don't think this is a bug.  At the moment you use the regex tagger in memoQ to work around the problem, which is caused by the HTML inside the file being invalid.  In Studio you work around it by correcting the source file, as I already explained.

    The file is handled by converting the entities, but for < and > to be recognized as part of a tag they CANNOT be escaped in the first place, so they are not caught by the XML filter.  The XML filter would normally convert the entities so that the content can be matched up properly by the embedded content processors, but because there are backslash characters (\) before the quotes around the attribute values, the content is not considered valid HTML tag content and the conversion is not applied.

    So you have two ways to fix this, one for use with the legacy XML filetype and one for the new XML filetype.  I have attached both here so you can see the difference.

    wk_legacy.ZIP

    wk_new.ZIP

    So you won't see a "fix" for this because we are very unlikely to fix the software to handle invalid source files.

    Regards

    Paul

    Paul Filkin | RWS Group


  • Thank you Paul. I think I've got your point: just remove the backslash next to the equals sign, and only then can Studio recognize the embedded content in the element.
    The sample I pasted here is just a snippet of a big XML file that comes from my client, and applying the fix back and forth is clearly a big risk. By the way, the file itself is well-formed (any clue about the invalidity? Admittedly, it is a type of file I have never seen before). So I wonder whether some flexibility, like memoQ's regex tagger filter, could also be introduced in a future update.
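
The back-and-forth fix discussed in this thread could be scripted to reduce the risk of manual edits. The following is a minimal Python sketch under stated assumptions, not an official Studio workflow: the function names `strip_backslashes` and `restore_backslashes` are hypothetical, the attribute values are assumed to be delimited by single quotes as in the samples (so a bare `"` inside them keeps the XML well-formed), and the exported target file is assumed to keep the `&lt;`/`&gt;` entities and the plain `"` characters unchanged.

```python
import re

# Matches one entity-escaped tag, e.g. &lt;font size=\"22\"&gt;
ESCAPED_TAG = re.compile(r'&lt;.*?&gt;')

def strip_backslashes(xml_text: str) -> str:
    """Before import: turn \\" into " inside escaped tags only."""
    return ESCAPED_TAG.sub(
        lambda m: m.group(0).replace('\\"', '"'), xml_text
    )

def restore_backslashes(xml_text: str) -> str:
    """After export: put a backslash before each bare " inside escaped tags."""
    return ESCAPED_TAG.sub(
        # (?<!\\) skips quotes that already have a backslash, so running
        # the function twice does not double the backslashes.
        lambda m: re.sub(r'(?<!\\)"', lambda _: '\\"', m.group(0)),
        xml_text,
    )

# Shortened line based on Sample 1, for demonstration only.
sample = r"<Name Key='K' Value='&lt;font size=\"22\"&gt;%s&lt;/font&gt;' MinLev='0'/>"
cleaned = strip_backslashes(sample)
print(cleaned)
print(restore_backslashes(cleaned) == sample)
```

Only text between `&lt;` and `&gt;` is touched, so quotes elsewhere in the file are left alone. Comparing the restored file against the client's original (for untranslated content) is a cheap way to check that the round trip did not change anything else.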