XML Stylesheet

I have a custom xml file type and have been asked to create a stylesheet to go with it so that the translators can preview as they work. My colleague and I read over Paul Filkin's blog about stylesheets and followed his advice to learn how to create one from the w3schools tutorials. It seems that, in order for Trados to read the stylesheet, it needs to contain the proper header indicating that it's an xml file. Here's where we have run into a problem: it seems that the xml files we are working with are not "true" xml. They start out as html files, which get converted to xml. As such, the xml file header is actually as follows:
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">

The custom file type I created is configured with html as the root element and reads the files as desired. Translators are able to work on them without issues. As far as the stylesheet goes, however, we get an error whenever we try to use one with a file that has this header. We were able to create a working test stylesheet on the w3schools website when we removed that "html" root from our test file, which leads us to believe that the header is the culprit. Is there a way to create a stylesheet to preview these custom xml files?

  • Hi Beatriz,

    I think the problem is quite obvious:

    XSL (eXtensible Stylesheet Language) is a styling language for XML.
    XSLT stands for XSL Transformations.
    This tutorial will teach you how to use XSLT to transform XML documents into other formats (like transforming XML into HTML).

    (quoted from https://www.w3schools.com/xml/xsl_intro.asp, emphasis is mine)

    In the prolog, you declare that your file is HTML, not XML. I should think it's quite reasonable for an XSL parser to quit at that point.

    Did you really remove the root? Or did you remove the prolog?

    <!DOCTYPE html> This is the prolog
    <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"> <html> is the root

    You could try to replace the <!DOCTYPE html> with <?xml version="1.0"?>, or delete it altogether, and see whether the files can still be imported by whatever system they were exported from. If that does cause problems when re-importing, you might have to add the original prolog back to the translated files.

    Daniel

     

  • Thanks, Daniel! I'm rather new to this, so although that was what I suspected, it wasn't quite as obvious to me. :) Those are the files exactly as I receive them from the team in my company that generates them. I don't manipulate them in any way; I simply copied and pasted the header into this post. After the prolog and root (thanks for those terms!), the file continues with the actual content. Changing the prolog to an XML declaration does seem to work, but because the files are created automatically and there are many of them every day, that cannot be a long-term solution. From your answer, it sounds like I would have to go back to the team that creates the files and ask them whether that can be changed, since there is no other workaround. Thanks again.
  • Your file is apparently HTML, not XML.
    Therefore, unless you have a specific reason beyond the knowledge of the members of this forum, you should process the file using the HTML file type, not an XML file type.
    And therefore the request to create an XSL stylesheet for preview purposes is totally off...

    So from my perspective you seem to be going in a completely wrong direction, and there is nothing to suggest here...
  • Thanks for the tip, Evzen. The file comes with an .xml extension and is being processed correctly in Trados with a custom XML file type. My understanding, from the engineers who created the files, is that they are converted from HTML to XML.
  • The point is that the DOCTYPE (and the root element) clearly state that the content is HTML.
    So the biggest question here is whether the file is actually created correctly... i.e. whether the engineers actually know what they are doing (and are doing it correctly).
    Can you share a sample of the file (anonymized, if needed), so that we can see the entire structure?

    I don't want to go too much into technical details, but technically it is of course possible to process an HTML file as XML, since HTML follows the same tag-based markup principles as XML, so to speak (and XHTML is simply HTML written as well-formed XML)...
    But the point is that one really has to PROPERLY UNDERSTAND the implications of doing so... otherwise it's just asking for trouble... or (and that's a far bigger issue) causing trouble for others :(
  • I'm not sure if I'm allowed to post the entire file structure, even anonymized, but basically what follows are <head>, <title>, and <body> tags, among others. It's a news story, so there's a headline tag, a byline, the story itself, and contact information. My colleague informed me that her understanding (from speaking with the engineers who created the files) is that the file was originally an rsf file and goes through a series of conversions. There is a reason for this, and it has to do with establishing compatibility with our internal system, but I don't know the full details. By the time it reaches us as an .xml file, the first few lines look like this:

    <!DOCTYPE html>
    <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
    <head>
    <meta charset="UTF-8"/>
    <meta property="Content-Type" content="application/xhtml+xml"/>
    <title>News Story</title>
    </head>

    After that, it continues with the rest of the structure. From what you and Daniel are explaining, though, it sounds to me like an XML stylesheet is just not designed for this. I did a quick test, and it seems that Trados' HTML5 file type can also read these files. I would just have to add the parsing information. That file type also appears to have its own preview features that I could test.
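Since the files arrive automatically and in volume, Daniel's suggestion of replacing the <!DOCTYPE html> prolog with an XML declaration could in principle be scripted instead of done by hand, with a reverse step that restores the original prolog in the translated files if the source system needs it. The Python sketch below is only an illustration under stated assumptions: the files are UTF-8, the doctype is the first thing in each file, and the function names are hypothetical. It has not been tested against Trados or the real files, and whether a changed prolog is acceptable downstream still has to be confirmed with the team that produces them.

```python
import re
from pathlib import Path

# Hypothetical helper names; a sketch, not a tested production tool.
# A leading HTML5 doctype, optionally after a UTF-8 byte order mark (kept)
# and whitespace (dropped, since only a BOM may precede an XML declaration).
HTML5_DOCTYPE_RE = re.compile(r"\A(\ufeff?)\s*<!DOCTYPE\s+html\s*>", re.IGNORECASE)
# A leading XML declaration such as <?xml version="1.0" encoding="UTF-8"?>
XML_DECLARATION_RE = re.compile(r"\A(\ufeff?)<\?xml\s[^?]*\?>")

XML_DECLARATION = '<?xml version="1.0" encoding="UTF-8"?>'
HTML5_DOCTYPE = "<!DOCTYPE html>"


def doctype_to_xml_declaration(text: str) -> str:
    """Replace a leading <!DOCTYPE html> with an XML declaration."""
    return HTML5_DOCTYPE_RE.sub(lambda m: m.group(1) + XML_DECLARATION, text, count=1)


def xml_declaration_to_doctype(text: str) -> str:
    """Reverse step: put <!DOCTYPE html> back in place of the XML declaration."""
    return XML_DECLARATION_RE.sub(lambda m: m.group(1) + HTML5_DOCTYPE, text, count=1)


def convert_folder(folder: Path) -> int:
    """Rewrite every *.xml file in `folder` in place; return how many changed."""
    changed = 0
    for path in sorted(folder.glob("*.xml")):
        # newline="" keeps the original line endings untouched on read and write
        with open(path, encoding="utf-8", newline="") as f:
            original = f.read()
        converted = doctype_to_xml_declaration(original)
        if converted != original:
            with open(path, "w", encoding="utf-8", newline="") as f:
                f.write(converted)
            changed += 1
    return changed
```

For example, convert_folder(Path("incoming")) would convert a hypothetical incoming folder before translation; running xml_declaration_to_doctype over the translated files would undo the change afterwards.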

  • Sorry, but this description is useless... I meant the real, actual file.
    What you describe is simple, pure HTML, and there is absolutely no reason to process it using an XML file type.
    In other words, so far I'm pretty positive that I was right about the completely wrong direction. The root cause is probably incorrect (or at least totally misleading) information you got from your engineers... and your believing that information without checking the actual content/format of the file yourself.

    BTW, if the actual file also contains tags other than the ones defined in the HTML5 specification (which is what your "I would just have to add the parsing information" statement suggests), then the file is clearly incorrect... since its DOCTYPE declaration states that it's an HTML5 file.
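One way to check the actual content of a file yourself, as suggested here, is a short script that lists every element name the file uses that is not an HTML5 element; any names it reports would be the tags that need the extra parsing information. This Python sketch assumes the file is well-formed XML (HTML named entities such as &nbsp; would make the standard-library parser fail), uses a deliberately partial, hand-picked list of HTML5 element names rather than the full specification, and the headline tag in the test is a hypothetical example based on the description of the news story.

```python
import xml.etree.ElementTree as ET

# The xmlns="http://www.w3.org/1999/xhtml" attribute on <html> puts every
# element in this namespace, so ElementTree reports tags as "{ns}name".
XHTML_NS = "{http://www.w3.org/1999/xhtml}"

# Deliberately partial list of HTML5 element names, for illustration only;
# extend it from the HTML specification before trusting the result.
KNOWN_HTML5_ELEMENTS = frozenset({
    "html", "head", "title", "meta", "link", "style", "script", "body",
    "header", "footer", "main", "article", "section", "nav", "aside",
    "h1", "h2", "h3", "h4", "h5", "h6", "p", "div", "span", "a", "br", "hr",
    "em", "strong", "b", "i", "u", "small", "sub", "sup", "time", "address",
    "img", "figure", "figcaption", "blockquote", "pre", "code",
    "ul", "ol", "li", "dl", "dt", "dd",
    "table", "caption", "thead", "tbody", "tfoot", "tr", "th", "td",
})


def element_names(xml_text: str) -> set[str]:
    """Return the local name of every element in the document."""
    root = ET.fromstring(xml_text)  # raises ET.ParseError if not well-formed
    names = set()
    for element in root.iter():
        tag = element.tag
        # Strip the XHTML namespace; tags in other namespaces keep theirs
        if tag.startswith(XHTML_NS):
            tag = tag[len(XHTML_NS):]
        names.add(tag)
    return names


def non_html5_elements(xml_text: str) -> list[str]:
    """Element names used in the file that are not in KNOWN_HTML5_ELEMENTS."""
    return sorted(element_names(xml_text) - KNOWN_HTML5_ELEMENTS)
```

An empty list would mean every element in the file appears in the (partial) HTML5 list; any other result names the tags that go beyond it.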