Entity conversion in embedded content

Hi

I have some problems getting entity conversion in embedded content processing in SDL Trados Studio 2015 to work the way I'd like it to work...

I'd like Studio to interpret the character entities it finds (reader settings) and show that in the editor, but I'd like Studio to output the actual characters in the output (writer settings).

In some cases, Studio even double escapes things.

 


  • Hello Torben,

    Can you knock up a small test file please and attach to the post? Just a few lines to make this simple will do nicely and will make it easy to anonymise.

    Thanks

    Paul

    Paul Filkin | RWS Group

    ________________________
    Design your own training!

    You've done the courses and still need to go a little further, or still not clear? 
    Tell us what you need in our Community Solutions Hub

  • Hehe, we discussed this extensively with Patrik the other day ;-) There are two parts to the problem...
    First, you need to think carefully about what you are actually (un)escaping/converting... e.g. if you have HTML embedded inside XML and you already do the entity conversion at the XML level during load, the content reaches the embedded HTML parser already unescaped... and vice versa during save. So this is where your double-escaping may be coming from...
    And second, the conversion itself... If the settings worked as actually intended, you would never be able to achieve what you want, because SDL's generic rule is "whatever format comes in must also go out"... i.e. if you have source files with entities, it is assumed that they are there for a reason (heh, this USED to be true a long time ago, back when clients actually KNEW what they were doing...) and therefore it is expected that they MUST be in the targets as well. It was never intended that you could select different options for "load" and "save" separately.
    BUT! ;-) There is a loooooong-standing "bug" in Studio across all versions - you can enable the conversion as such (enable the listbox with individual checkboxes) BUT UNCHECK ALL CHECKBOXES... ;-) And that combination actually does what you want - it converts the entities to characters during LOAD, but preserves the actual characters during save! ;-)

    Patrik and I discussed how to get the best of both worlds - fixing this "bug", but leaving the option to select different behavior for load and save... we didn't come to a reasonable solution, since it would complicate the settings in the first place, and there are different ways to organize the more complex settings page... which would be better looked at by a proper UX designer.

    The point of all this is that these days we basically always get files CRIPPLED in a million different ways, including these entities all over the place NOT for a good reason, but just because the software producing the files was written by some LAME developer who knows sh*t about computers :-\... and the 'manager' sending the files over knows about computers just "oh, it's that Facebook, Instagram and stuff, isn't it?", so asking for a fix is totally pointless...
    So we engineers need the flexibility.
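
A minimal sketch of the two escaping layers described above, using only the Python standard library. It is not how Studio works internally; the one-line `raw` sample and all variable names are made up for illustration. It shows that the XML layer and the embedded HTML layer each have their own entities, and that escaping the same content twice on save produces the double-escaped output.

```python
import html
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical one-line sample: HTML embedded in an XML element.
raw = "<text>&lt;p&gt;we&amp;rsquo;d&lt;/p&gt;</text>"

# Layer 1: the XML parser resolves the XML-level escapes (&lt; &gt; &amp;).
embedded_html = ET.fromstring(raw).text

# Layer 2: the embedded HTML still carries its own entity (&rsquo;).
visible = html.unescape(embedded_html)

# On save, escaping once at the XML level restores the original text...
once = escape(embedded_html)

# ...but escaping content that is already escaped double-escapes it.
twice = escape(once)

print(embedded_html)  # <p>we&rsquo;d</p>
print(visible)        # <p>we’d</p>
print(once)           # &lt;p&gt;we&amp;rsquo;d&lt;/p&gt;
print(twice)          # &amp;lt;p&amp;gt;we&amp;amp;rsquo;d&amp;lt;/p&amp;gt;
```

The takeaway is that each layer should be unescaped exactly once on load and escaped exactly once on save.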
  • Hi Paul

    Thank you for taking a look!

    Here is a small test file. As you can see, the client is inconsistent as well, sometimes writing the actual characters and sometimes escaping them.

    The client is asking us to deliver the file back with actual characters.

    <?xml version="1.0"?>
    <root>
      <content>
        <text>&lt;p&gt;&lt;strong&gt;If the sun is shining&lt;/strong&gt; we&amp;rsquo;d love to use R&amp;Aring;SE as &amp;nbsp;&lt;span style="line-height: 1.42857;"&gt;protection.&lt;/span&gt;&lt;/p&gt;</text>
        <text>&lt;p&gt;Tap RÅSE into a search engine and it’s quick to see that happily you love the product as much as we do. Here’s a couple of ways our product manager would love to try it out.&lt;/p&gt;</text>
      </content>
    </root>

  • Hi Torben,

    I spent a while messing with this and trying every configuration I think is possible (including the suggestion from Evzen, which didn't work) but cannot achieve what you are after. I also discussed it with Patrik and he confirmed this is not going to be possible, because the file is so badly authored that two different problems have to be solved at the same time. The only option would be for us to introduce a setting for each entity and then support a different conversion behaviour for parsing and writing. I think this is starting to get too complicated when the solution is probably far easier than this.

    Just ask your client to do one thing or the other! Well... I say it's easy but it may not be!!

    Regards

    Paul

    Paul Filkin | RWS Group


  • As Paul says, the file is simply completely wrong!
    You would first have to pre-process the double-escaped crap like "&amp;rsquo;" or "&amp;Aring;" using a separate routine, and only then could it work in Studio, I think.
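
One way such a separate pre-processing routine might look is sketched below in Python. This is an assumption about what the routine could do, not an SDL tool: the function name `unescape_double_escaped` and the `KEEP_ESCAPED` set are hypothetical. It works on the raw file text, replaces double-escaped named entities such as `&amp;rsquo;` with the actual character, and leaves the markup escapes (`&lt;`, `&gt;`, `&amp;`) untouched so the embedded HTML stays intact.

```python
import re
from html.entities import name2codepoint

# Entities that carry markup meaning; these stay escaped (assumption).
KEEP_ESCAPED = {"lt", "gt", "amp", "quot", "apos"}

# Matches a double-escaped named entity such as "&amp;rsquo;".
DOUBLE_ESCAPED = re.compile(r"&amp;([A-Za-z][A-Za-z0-9]*);")


def unescape_double_escaped(xml_text: str) -> str:
    """Replace double-escaped named entities with the actual characters."""

    def replace(match: re.Match) -> str:
        name = match.group(1)
        # Leave markup-relevant and unknown names exactly as they were.
        if name in KEEP_ESCAPED or name not in name2codepoint:
            return match.group(0)
        return chr(name2codepoint[name])

    return DOUBLE_ESCAPED.sub(replace, xml_text)


sample = (
    "&lt;p&gt;we&amp;rsquo;d love to use R&amp;Aring;SE as "
    "&amp;nbsp;protection.&lt;/p&gt;"
)
cleaned = unescape_double_escaped(sample)
print(cleaned)
```

The sketch only handles named entities (not numeric ones like `&amp;#197;`), and `&amp;nbsp;` becomes a real non-breaking space (U+00A0). The cleaned file would need to be saved as UTF-8 so the literal characters survive.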