UTF-8 vs UTF-16 encoding in output from "DITA XML" output format publish

We discovered that somewhere between LiveContent Architect (v11) and Tridion Docs 14, the output from the supplied "Publish DITA XML" output format changed from UTF-8 to UTF-16.

This broke a program that processes the DITA XML for a customer website. The recriminations are flying!

Unfortunately, this change was not communicated. If it is in the release notes or documentation, I missed it. The program was written in such a way that it requires UTF-8 and does not tolerate UTF-16 (primarily around whitespace and line separators). I don't know the details, or whether that was the only issue observed, but apparently the program and the website are not fully platform independent.

The encoding change also broke another post-process that my team owns, so we quickly wrote a workaround. It was easy to implement because that process runs on Linux, where it is much easier for us to develop and run Perl programs: we just added a few encode statements to flip the content to UTF-8 right in the program doing the post-processing.
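
For anyone who wants to try the same kind of re-encoding, here is a minimal sketch of the idea in Python rather than Perl. It is not our actual workaround code. It assumes the published file starts with a UTF-16 byte order mark and may carry an XML declaration that names UTF-16; the function name `utf16_xml_to_utf8` is purely illustrative.

```python
import re


def utf16_xml_to_utf8(data: bytes) -> bytes:
    """Re-encode a UTF-16 XML document as UTF-8 (illustrative helper)."""
    # The "utf-16" codec reads the leading BOM to pick the byte order and
    # strips it. Assumption: the input actually begins with a BOM.
    text = data.decode("utf-16")

    # If there is an XML declaration with an encoding pseudo-attribute,
    # point it at UTF-8 so parsers do not see a mismatch with the bytes.
    text = re.sub(
        r'^(<\?xml[^>]*?encoding=)(["\'])[^"\']*\2',
        r'\g<1>\g<2>UTF-8\g<2>',
        text,
        count=1,
    )

    # Encode without a BOM; UTF-8 does not need one.
    return text.encode("utf-8")


if __name__ == "__main__":
    sample = (
        '<?xml version="1.0" encoding="UTF-16"?>\n'
        '<topic id="t1"><title>Caf\u00e9</title></topic>\n'
    ).encode("utf-16")
    print(utf16_xml_to_utf8(sample).decode("utf-8"))
```

The declaration rewrite matters as much as the byte conversion: a file whose bytes are UTF-8 but whose declaration still says UTF-16 is likely to be rejected by a conforming XML parser.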

I opened a support ticket asking how to reconfigure the output to give us UTF-8. I was told that this is not configurable and that it would have to be a PS engagement.

I will at least pursue that as far as getting a quote. After that, I don't know.

But here is my question! Does anyone here know of a relatively easy way to accomplish this? I don't want to have to introduce a separate post-process into this deliverable's workflow. It needs to be integrated and automatic. I know that with any DITA-OT plug-in I can just throw a small XSLT transform into the mix, but I'm not sure that's feasible with the "DITA XML" output format provided with TD14. (I don't think that's in the OT, is it? Isn't it upstream and integrated into core TD14?)

Thanks,

Jay B.
