Under Community Review

HTML Entity Conversion similar to XML2; allowing Read as character and write as character for any/all characters

Currently, for the HTML and Embedded HTML parsers to normalize entities to the character itself (e.g. &reg; converted to ® instead of an inline tag), you must enable entity conversion. However, this means that when the file is written out, every ® is written as &reg; regardless of what the source contained. With the recent XML 2, it is possible to define Read as character and Write as character for each value; for HTML, this is currently only possible for <, >, ", ', and & via the Advanced HTML Entity Settings. In most cases I want to write out the actual character rather than the entity, since we are using Unicode. But because we have a lot of legacy content that uses HTML entities heavily, I need to normalize on read for TM reuse, ease of translation, and consistency across the TMs.
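To make the requested behavior concrete, here is a minimal Python sketch of "read as character, write as character". It is not Trados Studio code and says nothing about how the file types are implemented. The function names, the WRITE_AS_ENTITY table, and the sample strings are all made up for illustration. It assumes every entity is decoded on read and only the markup-significant characters are re-escaped on write.

```python
import html

# Hypothetical write policy: only these characters are written back as entities.
WRITE_AS_ENTITY = {"&": "&amp;", "<": "&lt;", ">": "&gt;"}


def read_as_characters(text: str) -> str:
    """Decode every named or numeric HTML entity to its Unicode character."""
    return html.unescape(text)


def write_as_characters(text: str) -> str:
    """Write characters literally, escaping only those listed in WRITE_AS_ENTITY."""
    return "".join(WRITE_AS_ENTITY.get(ch, ch) for ch in text)


# Illustrative segments: legacy content uses entities, newer content uses Unicode.
legacy = "Acme&reg; Widgets &amp; Co. &copy; 2020"
modern = "Acme® Widgets &amp; Co. © 2020"

# After reading, both segments are identical, so they would match in a TM.
normalized_legacy = read_as_characters(legacy)
normalized_modern = read_as_characters(modern)

# On write, ® and © stay as characters; only & is written as an entity.
written = write_as_characters(normalized_legacy)
```

With this policy, the legacy and Unicode versions of a segment normalize to the same text, and the output keeps ® and © as literal characters instead of forcing them back to &reg; and &copy;.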

Additionally, we use the Embedded HTML parser for many files in which most fields can render entities, but others (SEO fields, metadata, etc.) cannot. This lack of proper HTML entity handling is causing us numerous headaches.

[Screenshot: settings window showing Advanced HTML Entity Settings, with a list of entity mappings for characters such as less than, greater than, and ampersand.]
[Screenshot: options window with the Entity conversion checkbox selected and a table highlighting the Read as character and Write as entity settings for various HTML entities.]

Here is a related thread that I started four years ago hoping to get something like this, but so far there has been no progress:

https://community.sdl.com/product-groups/translationproductivity/f/studio/8210/entity-handling-in-sdl-trados-studio-you-cannot-read-in-all-entities-as-actual-characters-into-the-translation-interface-and-write-them-out-as-the-actual-characters-as-well
