&#39; being automatically converted to an apostrophe when exporting XMLs

Hi,

We recently encountered an issue with a source file that contained numerous HTML codes for apostrophes (&#39;). When the target XML was exported, these had all been changed to the literal apostrophe character, which then caused problems on the client's side.

I have tested this by exporting as soon as the project is created, and the problem is the same.

I managed to get these converted to "&apos;" by using advanced entity conversion, but this code is not universally supported, so we'd really need the numeric one. It seems to be something in the way Studio processes XML files, and it doesn't seem to be solvable in the File Type settings.

  • Please post some examples...

    I have encountered numerous cases where the actual problem was on the client's side, NOT in Studio. The point is that not all developers properly understand the fine print of the XML specification, and they tend to implement incorrect, overly strict (AKA more Catholic than the Pope) requirements for XML syntax...

    In particular, I've seen nonsensical rules requiring (or forbidding) apostrophes/quotation marks to be encoded as entities, even in places where it's totally irrelevant :-\

    For example, the content of elements DOES NOT need any entity encoding for these characters: it can contain both apostrophes and quotation marks.
    The content of attributes depends on the character used to enclose the attribute value. If quotation marks enclose the value, then apostrophes can be used without encoding, but quotation marks must be encoded as entities. And vice versa: if the value is enclosed in apostrophes, then quotation marks can be used without encoding, but apostrophes must be encoded.
    Moreover, both of these can be freely mixed within the same XML file! I.e. the characters enclosing attribute values can differ from one attribute to the next.

    From the XML syntax perspective, entities are, AFAIK, TOTALLY EQUAL to the corresponding characters, no matter whether it's one of the few predefined named entities or a numeric one. In other words, a correctly written XML parser must NOT differentiate between an actual character and its entity (where the character is allowed, see the rules above).

    Unfortunately, many developers (including those creating various XML libraries used by other products) are making their lives (way too much) easier by implementing simple (but nonsensical) "one size fits all" rules like "apostrophes must always be encoded, no matter where they are"... :-(

    And then subsequently clients without enough knowledge moan that we (localizers) did something wrong... :-(

    This is a perfectly fine XML snippet:

    <dialog Adam="Where're we goin'?" Eve='To "The infinity"'>Adam asks "Where're we goin'" and Eve responds "To The infinity"</dialog>
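The points above can be checked with any conforming parser. Here is a minimal sketch using Python's standard library as a stand-in; it is not Studio's parser or the client's. The names `SNIPPET`, `ENCODED` and `parsed` are made up for this illustration. It parses the snippet as written, parses a second copy where every apostrophe and quotation mark is written as an entity, and compares the results. It also uses `quoteattr` to show how a serializer can pick the enclosing character from the attribute value.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

# The snippet from the post, with literal apostrophes and quotation marks.
SNIPPET = (
    "<dialog Adam=\"Where're we goin'?\" Eve='To \"The infinity\"'>"
    "Adam asks \"Where're we goin'\" and Eve responds \"To The infinity\""
    "</dialog>"
)

# The same data, but with every apostrophe and quotation mark written as
# an entity (a mix of named, decimal and hexadecimal forms).
ENCODED = (
    '<dialog Adam="Where&#39;re we goin&apos;?" Eve="To &quot;The infinity&quot;">'
    'Adam asks &#34;Where&#x27;re we goin&apos;&#34; and Eve responds '
    '&quot;To The infinity&quot;'
    '</dialog>'
)


def parsed(xml_text):
    """Return what an application sees after parsing: tag, attributes, text."""
    root = ET.fromstring(xml_text)
    return root.tag, dict(root.attrib), root.text


plain = parsed(SNIPPET)
encoded = parsed(ENCODED)

# Both documents carry identical data once parsed.
same = plain == encoded

# quoteattr chooses the enclosing character based on the value and only
# escapes a quote character when the value contains both kinds.
quoted_adam = quoteattr("Where're we goin'?")   # enclosed in quotation marks
quoted_eve = quoteattr('To "The infinity"')     # enclosed in apostrophes
quoted_both = quoteattr('Eve says "goin\'"')    # quotation marks get escaped

if __name__ == "__main__":
    print(same)
    print(quoted_adam, quoted_eve, quoted_both)
```

If the client's system treats the two documents differently, that suggests the restriction comes from their processing rather than from XML itself.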