Entity handling in SDL Trados Studio: you cannot read in all entities as actual characters in the translation interface and also write them out as actual characters.

I have tried to get guidance from SDL Enterprise support, but so far I have mostly been given workarounds and nothing that really makes sense.


1. The default HTML filter reads in HTML entities such as &reg; as tags in the Editor (translation interface). I do not understand why this is the default behavior: ® is just another normal character, most TMs will contain the actual ® character, and so the tag causes mismatches against the TM.

2. Regardless of whether I turn off "Enable entity conversion" in the HTML 5 filter or uncheck apos under Numeric and Special Graphic, Studio always writes out ' as &apos; in the translated target file. &apos; is not supported in older browsers or in some email clients and thus may not display correctly, yet I cannot get Studio to just write it out as '.

3. Considering most HTML is now Unicode (UTF-8), I don't understand the need for entities except for characters that are specifically not allowed. Yet according to the Studio help (http://producthelp.sdl.com/SDL_Trados_Studio_2015/client_en/HTML_Entities.htm):

If entity conversion is enabled, Studio converts all the character entity references (or numeric entity reference) listed under Entity Mappings that it finds in HTML documents to their character representations. Before writing the target file, Studio converts these characters back to their character entity form.
For example, if entity conversion is enabled, the character entity reference &amp; in the source file will be displayed as the character & in the Editor. Any occurrence of & will be written as &amp;.
If a character entity is not selected for conversion, the character entity, rather than the character, is used in the Editor.
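To make the round trip in that help text concrete, here is a minimal Python sketch of the behavior as I understand it. This is only a model, not Studio's actual implementation: `ENTITY_MAPPINGS`, `read_into_editor` and `write_target` are hypothetical names I made up, and the table is a tiny stand-in for the real Entity Mappings list.

```python
import re

# Hypothetical stand-in for a few rows of the "Entity Mappings" table.
ENTITY_MAPPINGS = {"reg": "\u00ae", "copy": "\u00a9", "amp": "&"}


def read_into_editor(source, enabled):
    """Show enabled entities as characters; leave the rest as references."""
    def repl(match):
        name = match.group(1)
        if name in enabled and name in ENTITY_MAPPINGS:
            return ENTITY_MAPPINGS[name]
        return match.group(0)  # not selected: stays an entity (a tag in the Editor)
    return re.sub(r"&([A-Za-z]+);", repl, source)


def write_target(text, enabled):
    """Convert every enabled character back to its entity form."""
    reverse = {ENTITY_MAPPINGS[n]: "&" + n + ";" for n in enabled if n in ENTITY_MAPPINGS}

    def repl(match):
        return reverse.get(match.group(0), match.group(0))
    # Visit each character, but skip an "&" that already starts an entity
    # reference so that untouched references are not escaped a second time.
    return re.sub(r"&(?![A-Za-z]+;)|[^&]", repl, text)


enabled = {"reg", "amp"}
editor_text = read_into_editor("Acme&reg; &amp; Co &copy;", enabled)
target_text = write_target(editor_text, enabled)
```

In this model a single switch (`enabled`) drives both directions, which is exactly the coupling I am complaining about: once ® is shown as a character on the way in, it is forced back to &reg; on the way out.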

Ideally, we would ALWAYS want to see the actual characters in the Editor (translation interface) and store them that way in the TM, so I am tempted to turn on entity conversion for ALL supported entities, because I never know which entities the source files will contain. However, I would also like to write them out as the actual characters in all cases, except where a character is genuinely not allowed. That does not seem to be possible. I am stuck with a dilemma: either have tags representing normal characters like ® in the Editor, or have them written out as &reg; in the target file. These should be two separate settings: one for how entities are read in, and one for how they are written out in the translated file.
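For what it is worth, the kind of workaround this pushes you toward is post-processing the target files outside Studio. Below is a rough sketch of that idea in Python using only the standard library. It is not an SDL feature, `literalize_entities` is a name I made up, and keeping the non-breaking space escaped is my own choice so that it stays visible in the markup.

```python
import html
import re

# Characters that should stay in entity form: the HTML-reserved ones, plus the
# non-breaking space (kept escaped by choice so it remains visible).
KEEP_ESCAPED = {"<", ">", "&", '"', "\u00a0"}

# Named, decimal and hexadecimal character references.
ENTITY_RE = re.compile(r"&(?:#[0-9]+|#[xX][0-9A-Fa-f]+|[A-Za-z][A-Za-z0-9]*);")


def literalize_entities(markup):
    """Replace entity references with literal characters where that is safe."""
    def repl(match):
        decoded = html.unescape(match.group(0))
        if decoded in KEEP_ESCAPED:
            return match.group(0)  # keep &lt;, &amp;, &#38;, &nbsp; and so on
        return decoded  # unknown references come back unchanged from unescape
    return ENTITY_RE.sub(repl, markup)


sample = "It&apos;s Acme&reg; &amp; Co &lt;b&gt; &#169; &#38;"
cleaned = literalize_entities(sample)
```

This treats the file as plain text, so it would also rewrite &apos; inside a single-quoted attribute value and break it, and it does not skip script or style blocks. I would only run it on files saved as UTF-8, and only after checking the output on a sample.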

  • After a conversation with support, it seems you can choose how entities are read and written individually via Options > File Types > HTML 5 > Entities > Advanced..., but only for lt, gt, quot, apos and amp. That also explains why I was having trouble turning off entity conversion for apos in the normal table under the Numeric and Special Graphic section: although apos is listed there, it is actually controlled from this Advanced... setting. Why include it in both places if only one setting is actually read?

    The option to read an entity into the Editor (translation interface) as the actual character and write it out however I choose should be available for all entities, don't you think?
