Output file Error (XML parsing error)

Hi,

This is the second time I have faced this error. I sent a package to a translator (there was nothing wrong with the project up to that point). The translator did his job and sent me back the return package. The problem is that the output file couldn't be opened in Microsoft Word (XML parsing error). After searching the net I couldn't find anything, so I created the project again and pretranslated it. The project had lots of segments that needed merging, and the translator merged all of those segments (I think the problem is with merging segments that contain tags). As a result, a lot of segments had no translation, and I had to put them into the Word document manually. It's so frustrating, so please debug it or show me a workaround.

 

Thank you.

  • Hi,

    Whilst we do say we support PDF files, these are a pretty tricky format, and in my opinion a little bit of work should be carried out before preparing the files to make them more likely to survive a translation round trip. In this case there has been no attempt to clean up a fairly large PDF for translation at all, so the translator has left out tags all over the place, got some tags the wrong way around and had to merge to deal with poor segmentation issues. Leaving tags out, at least formatting tags, probably won't cause the problem you have, but having tags the wrong way around could, and so could all the merging.

    I was going to have a go at fixing the Word file, but the underlying XML is so large I can't even open it! So I think your best bet with files of this nature is to convert them to Word first, and then clean them up:

    - remove unnecessary formatting tags
    - remove incorrect hard breaks

    Then prepare your project with the Word filetype instead. This is probably going to be a lot easier for the translator as this file was a little bit like a tag soup, so they'll thank you for that, and it's more likely to survive the sort of file manipulation this one underwent, although if you did this clean up it probably would not need it.

    In the meantime I don't have a better solution than the one you have applied. Just additional work to re-translate using Perfect Match and some TM matching.
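
As a rough aid for locating this kind of damage: a .docx file is a ZIP archive whose main body lives in word/document.xml, so a short script can report where the XML stops being well-formed without having to open the huge file in an editor. The following is only a minimal sketch in Python. It is not part of Trados Studio, the function name `find_xml_errors` is made up for illustration, and it only tells you where parsing fails, not how to repair it.

```python
import sys
import zipfile
import xml.etree.ElementTree as ET


def find_xml_errors(docx_source):
    """Return (part_name, message) for every XML part that fails to parse.

    docx_source may be a path or a file-like object holding a .docx file.
    find_xml_errors is a hypothetical helper name, not a Trados or Word API.
    """
    problems = []
    with zipfile.ZipFile(docx_source) as archive:
        for name in archive.namelist():
            # Only the XML parts (and relationship files) are checked.
            if not name.endswith((".xml", ".rels")):
                continue
            try:
                ET.fromstring(archive.read(name))
            except ET.ParseError as exc:
                # The message includes the line and column of the failure.
                problems.append((name, str(exc)))
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    for part, message in find_xml_errors(sys.argv[1]):
        print(f"{part}: {message}")
```

Running it against the broken output file (for example `python check_docx.py output.docx`, where both names are placeholders) prints each part that fails together with the parser's message. A "mismatched tag" message would be consistent with tags having ended up the wrong way around.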

    Regards

    Paul

    Paul Filkin | RWS Group

    ________________________
    Design your own training!

    You've done the courses and still need to go a little further, or still not clear? 
    Tell us what you need in our Community Solutions Hub

  • Hi Paul,
    Thank you so much. It was one of my first packages, so I didn't work on the source file. Lately I have been editing the PDF file first and then importing it into Trados Studio. I will try your method to see which one is faster and more efficient.

    Kind regards,
    Alireza
  • Unknown said:
    Whilst we do say we support PDF files

    I think that from a certain perspective this was not a good move at all... since it just makes the lay majority of users - and what's even worse, clients - believe that PDF is just a 'normal' format like Word or so... and makes it more and more difficult for us, the knowledgeable minority, to explain to them that the primary way is to translate the SOURCE format from which the PDF was created, not the PDF itself.

    I suppose the main reason for implementing this "internal conversion workaround" (internal, because the details of the internal process are rather hidden from the user) was something like "I need the CONTENT translated and do not care much - or at all - about the LAYOUT"...
    Unfortunately, most of the clients I deal with actually DO care about the LAYOUT, while they do not understand the problem of PDF being originally designed as a read-only format and thus being totally unsuitable for a standard localization process.

    So, I live in a hope that the issues with PDF support will outweigh the benefits one day and that the support will eventually be removed... in favor of clear and strong "translate the SOURCE, NOT the PDF!" message to all people in the world ;-).

  • Unknown said:
    In this case there has been no attempt to clean up a fairly large PDF for translation at all, so the translator has left out tags all over the place, got some tags the wrong way around and had to merge to deal with poor segmentation issues.

    Haha, welcome to my daily reality... This is apparently EXACTLY what I get basically every day...
    Sorry for off-topic, but this is precisely the case why I say "never trust the translator" in the other thread.

  • Unknown said:
    Sorry for off-topic, but this is precisely the case why I say "never trust the translator" in the other thread.

    I wouldn't blame the translator for this at all.  He/she received the package with an unprepared file which was very difficult to work with.  I think this is just a format that needs to be handled a different way in preparing the file for translation.  If you don't do that then every stage in the translation process becomes problematic and each person has to deal with problems as a result.

    Unknown said:
    So, I live in a hope that the issues with PDF support will outweigh the benefits one day and that the support will eventually be removed... in favor of clear and strong "translate the SOURCE, NOT the PDF!" message to all people in the world ;-)

    I think you just live in the world of volume translations where this sort of process is completely unacceptable.  Support for PDFs has always been one of the most called for requirements in many CAT tools, hence we see it in quite a few.  Often the source is not available at all and people want a way to handle these files without having to go through the process of recreating the source due to the additional time/costs involved in doing this.  So I live in hope that PDF support will get better and become capable of removing many of the issues we see today completely.

    I think education is key, and certainly PDF handling is a common theme at translation conferences these days so people are more aware today than they have been.  It can only get better.

    Regards

    Paul


  • Sorry for a bit of OT, but...

    I think education is key, and certainly PDF handling is a common theme at translation conferences these days so people are more aware today than they have been.  It can only get better.

    I am a translator myself and I would always prefer the editable source over the PDF. But when the editable source is, for example, LaTeX or QuarkXPress or Quicksilver, it won't really be helpful if I do not have the necessary source tool. Of course I can ask the customer for an export, but with LaTeX, for example, they say they can export only PDF. We have such a customer, who is willing to give us LaTeX, but the file structure is a nightmare. Something maybe I can deal with, but 99% of the translators will not be able to. So this is one part of the story.

    Other part is, that one of our customers does produce machines. He uses many parts from other vendors, like a Siemens engine or Flender gearbox. For these he has manuals, but those are available only in PDF. And I cannot request these manuals from that customer in any other format - he will not be able to provide such, as Siemens/Flender & co. will not give such materials.

    So dealing with PDF is a crucial issue for any translator. For the time being, however, I never rely on automated conversion, as the results are poor. If the PDF is being converted for my own purposes, and thus just for one language, I do not do much preparation work, as I can do the same on the resulting file and save a lot of time, since this means editing only on one end. If the file is being prepared for translation into several languages, preparation is absolutely necessary, as this is done once before the process starts and hopefully - if done well - really only once. A test before sending packages out is also not a bad idea.

    _________________________________________________________

    When asking for help here, please be as accurate as possible. Please always remember to give the exact version of product used and all possible error messages received. The better you describe your problem, the better help you will get.

    Want to learn more about Trados Studio? Visit the Community Hub. Have a good idea to make Trados Studio better? Publish it here.

  • Unknown said:
    Other part is, that one of our customers does produce machines. He uses many parts from other vendors, like a Siemens engine or Flender gearbox. For these he has manuals, but those are available only in PDF. And I cannot request these manuals from that customer in any other format - he will not be able to provide such, as Siemens/Flender & co. will not give such materials.

    Well, this is where the education should step in... but on a completely different level than some translation conferences mentioned by Paul. Translation conferences are attended by people either coming from the translation industry, or at least interested in it... so they are at least partially aware of the problem.... But these Siemens-/Flender- and other people are totally clueless :( and that's where the education MUST come!

    How come that they "will not give such materials"?! If they will not, then others using their parts (i.e. PAYING THEM MONEY!) won't be able to do their job properly (i.e. including translated manuals, etc.), i.e. in effect WON'T PAY THEM THE MONEY... as simple as that. So it's IN THEIR BEST INTEREST to provide appropriate materials.
    And if the translators won't be CONSTANTLY asking and PUSHING for source formats and constantly reminding that PDF is evil, it will never get better... because everybody will believe that it's okay as-is :(

  • Unknown said:
    I wouldn't blame the translator for this at all.  He/she received the package with an unprepared file which was very difficult to work with.

    Sure. But that doesn't give the translator ANY right to cripple the delivery by removing and/or screwing up the tags. And that was my point.

    The file could have been unprepared for various reasons, from a totally incapable engineer (remember, if you pay peanuts...) to an incapable client providing an already-prepared file/package (not uncommon in today's world of outsourcing and sub-contracting on multiple levels).
    No matter why it happened, the translator MUST deliver 'the same format which he received' (unless arranged otherwise).

  • In fact yes, translators are requested to deliver what they've got, unless something else has been agreed.

    But if the PM sends file as the one in picture, I would say send the PM to Mars... And it cannot be requested from the translator to work with such file. The only solution I see here is the translator refusing to take such job.


  • Unknown said:
    But if the PM sends file as the one in picture, I would say send the PM to Mars... And it cannot be requested from the translator to work with such file. The only solution I see here is the translator refusing to take such job.

    Yup, fully agree... that's part of the "unless arranged otherwise" ;-)