Convert entity characters to unicode in xml mode

When I import Unicode text into a division:
if XML mode is off, the result is ⏫4E2D
if XML mode is on, the result is 中

Are there any options or commands that give the first result in XML mode?

Chung
Toppan Vite

  • A few questions:
    What version of XPP are you running, Kai?
    In your XML file, what encoding are you using for U+4E2D (and other similar characters)?

    In theory, when you import a UTF-8 encoded file, you should be able to just run toxsf on it, and in XPP you will see the corresponding Unicode characters.

    The fact that you are seeing a character reference entity inside XPP makes me think you are running the import with the -jpcin switch. But maybe you need that because your XML is not UTF-8 encoded?
  • Especially in XML mode, one of XPP's goals is that a round trip of the data (i.e., an import followed by an export) preserves the "format" of the original input data as much as possible.

    So if the imported data is in numeric character reference format, for example &#x4E2D;, XPP "remembers" that and tracks the character in numeric character reference format so that on export it is output the same way.

    There is no option (to toxsf) to prevent that behavior.

    But if XPP happens to display a different character from the one with the actual Unicode value, you might need to add the -ncrnomap option to toxsf on import. That option prevents toxsf from mapping numeric character references via the XCS DEFAULT spec, which can sometimes produce the wrong character in XPP when multiple Unicode values are mapped to the same entry in the XCS spec.

    Jonathan Dagresta
    RWS Group/
    XPP Development

  • I am running XPP 8.4, and the XML file is UTF-8 encoded.
    I can see the character after the import.
    With XML mode on, I get the same result with and without the -jpcin switch.

    I am converting from XML code to classic code and then comparing the two.
    Because the import results differ between the two modes, the Unicode characters are not exported the same way.

    Chung
    Toppan Vite
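To make the distinction in this thread concrete: in XML, the character U+4E2D can be written as a literal UTF-8 character or as a hexadecimal or decimal numeric character reference, and an XML parser resolves all three to the same code point. The difference that XPP tracks in XML mode is therefore only in how the source spells the character. A small illustration using Python's standard-library ElementTree (nothing here is XPP-specific):

```python
import xml.etree.ElementTree as ET

# Three spellings of the same character, U+4E2D, inside an XML element.
SPELLINGS = {
    "literal": "<t>\u4e2d</t>",    # the character itself (UTF-8 bytes on disk)
    "hex NCR": "<t>&#x4E2D;</t>",  # hexadecimal numeric character reference
    "dec NCR": "<t>&#20013;</t>",  # decimal numeric character reference
}

# Parse each document and keep the resolved text content.
resolved = {name: ET.fromstring(doc).text for name, doc in SPELLINGS.items()}

if __name__ == "__main__":
    for name, text in resolved.items():
        print(f"{name:8} -> U+{ord(text):04X}")
    # On disk, the literal form is stored as three UTF-8 bytes.
    print("UTF-8 bytes:", "\u4e2d".encode("utf-8").hex(" ").upper())
```

All three lines report U+4E2D, and the UTF-8 encoding is E4 B8 AD; once parsed, a generic XML tool cannot tell which spelling the file used.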
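If the goal is for an XML-mode import to receive literal characters instead of references, one generic workaround is to rewrite numeric character references as literal characters in the UTF-8 source before running toxsf. This is an assumption, not an XPP or toxsf feature, and it has not been tested against XPP. The function name `ncr_to_literal` is hypothetical. A minimal Python sketch, assuming UTF-8 input; it leaves references to markup-significant characters escaped so the XML stays well-formed:

```python
import re

# Hypothetical pre-import clean-up step (not part of XPP or toxsf).
# These characters must stay escaped so the XML remains well-formed.
_KEEP_ESCAPED = {ord("<"), ord("&"), ord(">"), ord('"'), ord("'")}

# Matches hex (&#x4E2D;) and decimal (&#20013;) numeric character references.
_NCR = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")


def ncr_to_literal(text: str) -> str:
    """Replace numeric character references with the literal characters.

    This is a plain text substitution: it does not recognize CDATA
    sections or comments, where '&#...;' is not actually a reference.
    """
    def replace(match: re.Match) -> str:
        hex_digits, dec_digits = match.groups()
        code_point = int(hex_digits, 16) if hex_digits else int(dec_digits)
        # Leave markup characters, surrogates and out-of-range values alone.
        if (code_point in _KEEP_ESCAPED
                or 0xD800 <= code_point <= 0xDFFF
                or code_point > 0x10FFFF):
            return match.group(0)
        return chr(code_point)

    return _NCR.sub(replace, text)


if __name__ == "__main__":
    sample = '<p lang="zh">&#x4E2D;&#20013; &amp; &#x3C;tag&#x3E;</p>'
    print(ncr_to_literal(sample))
    # <p lang="zh">中中 &amp; &#x3C;tag&#x3E;</p>
```

To apply it to a file, read it with `encoding="utf-8"`, pass the text through `ncr_to_literal`, and write the result back as UTF-8. Note that this deliberately discards the original notation, so an XPP round trip would then export literal characters rather than references.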