
Convert entity characters to unicode in xml mode

Kai Chung Lew over 8 years ago

When I import Unicode text into a division:
if XML mode is off, the result is ⏫4E2D
if XML mode is on, the result is 中

Is there an option or command that gives the first result in XML mode?

Chung
Toppan Vite

Bart Terryn over 8 years ago

A few questions:
What version of XPP are you running, Kai?
In your XML file, what encoding are you using for U+4E2D (and other similar) characters?

In theory, when you import a UTF-8 encoded file, you should be able to just run toxsf on it, and in XPP you will see the corresponding Unicode characters.

The fact that you are seeing a character reference entity inside XPP makes me think that you are running the import with the -jpcin switch. But maybe you need that because you are not using UTF-8 encoded XML?

Jonathan Dagresta over 8 years ago

When in XML mode (especially), one goal that's part of XPP is that a round-trip of the data (i.e. an import followed by an export) will preserve the "format" of the original input data as much as possible.

So if the imported data is in numeric character reference format, for example &#x4E2D; (the reference for 中), then XPP "remembers" that and tracks that character in numeric character reference format, so that on export it is output the same way.

There is no option (to toxsf) to prevent that behavior.

But if by chance you are getting a different character displayed by XPP than the character with the actual Unicode value, then you might need to add the -ncrnomap option on import (for toxsf). That option prevents toxsf from mapping numeric character references via the XCS DEFAULT spec (which can sometimes result in the wrong character in XPP when there are multiple Unicode values mapped to the same entry in the XCS spec).

Jonathan Dagresta
RWS Group/XPP Development
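
To illustrate the point about "format" outside XPP: a numeric character reference and the literal character are the same data to an XML parser, and differ only in how they are written in the file. The snippet below is a minimal sketch using Python's standard library, not an XPP tool; the sample documents and variable names are made up for the illustration.

```python
import xml.etree.ElementTree as ET

# Three spellings of the same content: literal character, hex NCR, decimal NCR.
literal = "<p>\u4e2d</p>"
hex_ncr = "<p>&#x4E2D;</p>"
dec_ncr = "<p>&#20013;</p>"

# After parsing, each document holds the single character U+4E2D.
texts = [ET.fromstring(doc).text for doc in (literal, hex_ncr, dec_ncr)]
same_character = all(t == "\u4e2d" for t in texts)
```

So a tool that wants to write the reference back out unchanged has to record separately that the character arrived as a reference, which is the tracking described above.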

Kai Chung Lew over 8 years ago in reply to Bart Terryn

I am running XPP 8.4, and the XML file is UTF-8 encoded.
I can see the character after the import.
I get the same result with XML mode on, with and without the -jpcin switch.

I am converting a job from XML coding to classic coding and then comparing the two.
Because the import results differ between the two modes, the Unicode characters are not exported in the same way, so the comparison flags all of them.

Chung
Toppan Vite
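
One possible workaround for the comparison step, sketched here as an assumption rather than an XPP feature: normalize both exports outside XPP before diffing, expanding numeric character references into literal characters. The function name `expand_ncrs` is hypothetical, and the sketch only handles the `&#NNNN;` and `&#xHHHH;` forms; it does not know about XPP's own notation for characters in classic mode.

```python
import re

# Matches decimal (&#20013;) and hexadecimal (&#x4E2D;) character references.
_NCR = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")

# References to markup characters are left alone so the XML stays well-formed.
_MARKUP = {"<", ">", "&", '"', "'"}

def expand_ncrs(text):
    """Replace numeric character references with the characters they name."""
    def _replace(match):
        hex_digits, dec_digits = match.groups()
        code_point = int(hex_digits, 16) if hex_digits else int(dec_digits)
        try:
            char = chr(code_point)
        except (ValueError, OverflowError):
            # Not a valid Unicode code point: keep the original text.
            return match.group(0)
        if char in _MARKUP:
            return match.group(0)
        return char
    return _NCR.sub(_replace, text)

normalized = expand_ncrs("<p>&#x4E2D;&#20013; &#x26; done</p>")
```

Running both exported files through a step like this before the diff would make references and literal characters compare equal, while named entities such as `&amp;` are left untouched.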