XML Stylesheet

I have a custom XML file type and have been asked to create a stylesheet to go with it so that the translators can preview as they work. My colleague and I read over Paul Filkin's blog about stylesheets and followed his advice to learn how to create one from the w3schools tutorials. It seems that, in order for Trados to read the stylesheet, it needs to contain the proper header indicating that it's an XML file. Here's where we have run into a problem: the XML files we are working with do not seem to be "true" XML. They start out as HTML files, which then get converted to XML. As a result, the file header is actually as follows:
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">

The custom file type I created knows that the root element is html and has no trouble reading the files as desired; translators are able to work on them without issues. But when we try to use a stylesheet with files that have this header, we get an error. When we removed that "html" root from our test file, we were able to create a working test stylesheet on the w3schools website, which leads us to believe that the header is the culprit. Is there a way to create a stylesheet to preview these custom XML files?
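A minimal Python sketch (standard library only) of how an XML parser sees a header like the one above. The lines come from the header quoted in this thread; the empty `<body/>` and the closing `</html>` are hypothetical additions so that the fragment is well-formed. It shows that the file does parse as XML, but every element sits in the XHTML namespace declared on the root, so a lookup for plain `head` or `title` finds nothing. Whether this is what caused the stylesheet error is an assumption; it is not confirmed here. The same rule applies to an XSLT 1.0 stylesheet: a pattern such as `match="html"` only matches elements in no namespace, so the stylesheet would have to declare the XHTML namespace with a prefix and match on the prefixed names.

```python
import xml.etree.ElementTree as ET

# Header lines as quoted in the thread. The empty <body/> and the closing
# </html> are hypothetical additions so that the fragment is well-formed XML.
HEADER_SAMPLE = """<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta charset="UTF-8"/>
<meta property="Content-Type" content="application/xhtml+xml"/>
<title>News Story</title>
</head>
<body/>
</html>"""

XHTML_NS = "http://www.w3.org/1999/xhtml"


def inspect_header(xml_text: str) -> dict:
    """Parse the text as XML and show how the namespaced root affects lookups."""
    root = ET.fromstring(xml_text)
    return {
        # ElementTree reports namespaced names as "{namespace}localname".
        "root_tag": root.tag,
        # A path without the namespace matches nothing...
        "plain_title": root.find("head/title"),
        # ...while a namespace-qualified path finds the <title> element.
        "ns_title": root.find("x:head/x:title", {"x": XHTML_NS}),
    }


if __name__ == "__main__":
    info = inspect_header(HEADER_SAMPLE)
    print(info["root_tag"])        # {http://www.w3.org/1999/xhtml}html
    print(info["plain_title"])     # None
    print(info["ns_title"].text)   # News Story
```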

  • The point is that the DOCTYPE (and the root element) clearly state that the content is HTML.
    So the biggest question here is whether the file is actually created correctly... i.e. whether the engineers actually know what they are doing (and are doing it correctly).
    Can you share an (anonymized, if needed) sample of the file, so that we can see the entire structure?

    I don't want to go too much into technical details, but technically it is of course possible to process an HTML file as XML, since HTML and XML share the same markup principles (both descend from SGML) and HTML can also be written in XML syntax, i.e. as XHTML...
    But the point is that one really has to PROPERLY UNDERSTAND the implications of doing so... otherwise it's just asking for trouble... or, and that's a much bigger issue, causing trouble for others :(
  • I'm not sure if I'm allowed to post the entire file structure, even anonymized, but basically what follows is <head>, <title> and <body> tags, among others. It's a news story, so there's a headline tag, a byline, the story itself and contact information. My colleague informed me that her understanding (from speaking with the engineers who created the files) is that the file was originally an rsf and goes through a series of conversions. There is a reason behind this, which has to do with establishing compatibility with our internal system, but I don't know the full details. By the time it gets to us as an .xml file, the first few lines look like this:

    <!DOCTYPE html>
    <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
    <head>
    <meta charset="UTF-8"/>
    <meta property="Content-Type" content="application/xhtml+xml"/>
    <title>News Story</title>
    </head>

    After that, the rest of the structure follows. From what you and Daniel are explaining, though, it sounds to me like the XML stylesheet is just not designed for this. I did a quick test, and it seems that Trados' HTML5 file type can also read this type of file. I would just have to add the parsing information. That file type also seems to have its own preview features, which I could test.

  • Sorry, this description is useless... I was asking you to attach the actual file.
    What you describe is simple, pure HTML, and there is absolutely no reason to process it with the XML file type.
    In other words, so far I'm pretty sure I was right that this is a completely wrong approach. The root cause is probably incorrect (or at least very misleading) information you got from your engineers... combined with you trusting that information without checking the actual content/format of the file yourself.

    BTW, if the actual file also contains tags other than those defined in the HTML5 specification (which is what your "I would just have to add the parsing information" statement suggests), then the file is clearly incorrect... since its DOCTYPE declaration states that it's an HTML5 file.
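To check that point against a real file, a rough Python sketch (standard library `html.parser`) that lists element names not found in a set of HTML5 elements. The set below is deliberately partial and hand-picked for illustration, not the full list from the specification, so anything it reports is either a gap in that list or a genuinely non-HTML5 tag. The `headline` and `byline` tags in the sample are hypothetical stand-ins for the headline and byline mentioned earlier in the thread; the real file's tag names are not known.

```python
from html.parser import HTMLParser

# Partial, hand-picked set of HTML5 element names (illustration only); a real
# check should use the full element list from the HTML specification.
HTML5_ELEMENTS = {
    "html", "head", "title", "meta", "link", "style", "script", "body",
    "header", "footer", "main", "article", "section", "nav", "aside",
    "h1", "h2", "h3", "h4", "h5", "h6", "p", "div", "span", "a", "br",
    "ul", "ol", "li", "strong", "em", "b", "i", "img", "address", "time",
    "table", "thead", "tbody", "tr", "th", "td", "figure", "figcaption",
}


class TagCollector(HTMLParser):
    """Collects every element name that appears in a start tag."""

    def __init__(self) -> None:
        super().__init__()
        self.tags = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag names; self-closing tags such as
        # <meta .../> also reach this handler via handle_startendtag.
        self.tags.add(tag)


def non_html5_tags(markup: str) -> set:
    """Return element names in the markup that are not in HTML5_ELEMENTS."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags - HTML5_ELEMENTS


# Hypothetical sample: <headline> and <byline> are invented tag names.
STORY_SAMPLE = """<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head><meta charset="UTF-8"/><title>News Story</title></head>
<body>
<headline>Example headline</headline>
<byline>Staff writer</byline>
<p>Story text.</p>
</body>
</html>"""

if __name__ == "__main__":
    print(sorted(non_html5_tags(STORY_SAMPLE)))  # ['byline', 'headline']
```

An empty result would be consistent with the file being plain HTML5; a non-empty one points to exactly the kind of custom tags that contradict the `<!DOCTYPE html>` declaration.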