Confused by embedded content processing in Trados Studio 2014

Question

I am stuck on a problem with embedded content. I have tested two cases, and both fail.

Sample 1. Sample 2. <font size=\"22\" color=\"14a101\">%s</font>,I am at<font size=\"22\" color=\"c54a00\">%s</font>with Jim.

In Sample 1 I need to translate the value attribute, and in Sample 2 I need to translate the name element. As far as I can tell, everything is set up correctly in the Parser and Embedded content categories of the newly created file type, but both cases lead to the same result:

%s,I am at%swith Jim.

The correct extraction should be:

%s,I am at %s with Jim.

Any advice? Thank you.

Paul · Answer

Hi WK, 
 I had a quick play and would note the following: 
 Sample 1: not possible because Studio will not allow the use of embedded content processing in an attribute 
 Sample 2: only possible if you convert the mixture of entities and characters in the file for consistency and correct the xml 
 These, for example, are invalid: 
 size=\"22\" 
 color=\"14a101\" 
 size=\"22\" 
 color=\"c54a00\" 
 The backslash in front of each quotation mark invalidates the XML for Studio... it's very picky with XML! So if I convert it and remove the backslashes I get this: 
 
 memoQ has a very neat feature in its regex tagger that would allow you to deal with the attributes. The regex tagger would also be required to handle the element, as memoQ doesn't support regex rules in the XML filter itself. 
 So if you use Studio you need to do the corrections upfront, and with memoQ you need to do the corrections afterwards with the regex tagger. I can't make up my mind if this is a Studio bug or not so I'll report it and see what the experts say! 
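
 As a rough illustration of the "corrections upfront" step, here is a minimal Python sketch that removes the backslash in front of each double quote before the file goes into Studio. The function name `strip_quote_escapes` is made up for this example, and the sketch only covers the backslashes; it does not attempt the entity/character consistency fix mentioned for Sample 2.

```python
def strip_quote_escapes(text: str) -> str:
    """Remove a backslash only where it directly precedes a double quote."""
    # Hypothetical helper for illustration; not part of Studio or memoQ.
    return text.replace('\\"', '"')


# The embedded content from the question, with backslash-escaped quotes.
sample = (
    '<font size=\\"22\\" color=\\"14a101\\">%s</font>,I am at'
    '<font size=\\"22\\" color=\\"c54a00\\">%s</font>with Jim.'
)

cleaned = strip_quote_escapes(sample)
print(cleaned)
```

 In practice you would read the source file, apply the function to its contents and write the result to a copy, keeping the original untouched.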
 Regards 
 Paul

Paul · Answer

Hi WK, 
 OK, I've discussed this at length with the development team this morning and we don't think this is a bug. At the moment you use the regex tagger in memoQ to work around the problem, which is caused by the HTML inside the syntax being invalid. Using Studio you work around it by correcting the source file, as I already explained. 
 The file is handled by converting the entities, but in order for <> to be recognized as part of the tag they CANNOT be escaped in the first place, so they are not caught by the XML filter. The XML filter would normally convert the entities so the content can be matched up properly by the embedded content processors, but as there are \" characters around the attribute values, it's not considered valid HTML tag content and the conversion is not applied. 
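
 To see why the escaped quotes matter, the following Python sketch checks a tag against a generic XML parser. This is only an analogy: it uses Python's standard `xml.etree.ElementTree`, not Studio's own filter, and the helper name `is_well_formed` is invented for the example. It is still a quick way to check a source file before importing it.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment: str) -> bool:
    """Return True if the fragment parses as XML inside a dummy root element."""
    # Hypothetical helper for illustration; Studio's filter may behave differently.
    try:
        ET.fromstring(f"<root>{fragment}</root>")
        return True
    except ET.ParseError:
        return False


# One tag from the question, with backslashes in front of the quotes.
escaped = '<font size=\\"22\\" color=\\"14a101\\">%s</font>'
# The same tag after removing those backslashes.
fixed = escaped.replace('\\"', '"')

print(is_well_formed(escaped))
print(is_well_formed(fixed))
```

 The escaped version is rejected because the parser expects a quotation mark straight after the equals sign and finds a backslash instead; the corrected version parses.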
 So you have two ways to fix them, one for use with the legacy xml filetype and one for the new XML filetype. I put both here so you can see the difference. 
 wk_legacy.ZIP 
 wk_new.ZIP 
 So you won't see a "fix" for this because we are very unlikely to fix the software to handle invalid source files. 
 Regards 
 Paul
