CONVERTER from PDF to HTML

Could you give me the name of the best converter from PDF to HTML that Trados can read? PLEEEEASEE. Thanks

  •  

    If you want to translate the content of the PDF in Studio, you should convert it to Word and not to HTML. Studio will do this out of the box: simply open the PDF and it will be converted to Word/SDLXLIFF, which you can translate in Studio. The target file will be in Word format (DOCX).

    Alternatively, you may use a PDF converter such as ABBYY FineReader or Nitro Pro to convert your PDF to Word and then open the converted file in Studio.


  • Hi Walter, 


    I have used the PDF Assistant for Trados (plugin 197) to convert a PDF into Word to avoid having so many tags. Now, I see in the preview that the text in boxes around images hasn't been translated. 
    Can you tell me how I could have this text recognised by Studio and displayed in the editor, please?

    Many thanks for your help

    Marta 

  •  

    Consider yourself lucky if the converted text matches exactly what you see in the PDF.

    Here are a few ideas:

    - There are many PDF converters, and the paid ones (some mentioned by Walter) usually perform better than the free ones. They also have more features, such as OCR.

    - Translating PDFs is not the best option. PDFs are very good for sharing but not for editing, as some information (formatting, resolution…) is always lost in the conversion. For this reason, I always ask the client for the source document and translate that instead.

    - If I need to work with a PDF, I never work directly with the out-of-the-box converter output (such as the one you mentioned from Trados). Instead, I clean the text in Word: add missing text, delete duplicated/irrelevant text, delete returns, remove funny characters (such as □), remove ligatures (such as in the word fi nger), and remove the tag soup (Ctrl+Space). Then, and only then, I open this Word document in Trados Studio. I almost forgot to add that text from non-editable images is manually extracted and saved to a separate document, so it can be inserted at a later stage. As you can see, there is a lot to be done with the free route (I don’t know how much effort you can save with a paid converter). And of course, I let the client know that the formatting won’t be identical to the PDF and that I will happily charge for this extra work. Sometimes, magic works, and the client finds the source document and the non-editable images!
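
    Some of the cleaning steps above (deleting returns, removing funny characters, replacing ligatures) could also be scripted once the text is available as plain text. What follows is only a rough sketch under that assumption; the function name `clean_extracted_text` and the character lists are made up for illustration and would need adjusting to each document. Words that the converter has already split, like "fi nger", still need a manual check.

```python
import re

# Typographic ligature glyphs that PDF extraction often leaves in the text.
LIGATURES = {
    "\ufb00": "ff",
    "\ufb01": "fi",
    "\ufb02": "fl",
    "\ufb03": "ffi",
    "\ufb04": "ffl",
}

# Placeholder glyphs that usually mean a character could not be mapped:
# the empty box (□) and the Unicode replacement character.
JUNK_CHARS = "\u25a1\ufffd"


def clean_extracted_text(text: str) -> str:
    """Hypothetical helper: tidy plain text extracted from a PDF."""
    # Replace ligature glyphs with their plain letters.
    for glyph, letters in LIGATURES.items():
        text = text.replace(glyph, letters)
    # Drop placeholder glyphs.
    text = text.translate({ord(ch): None for ch in JUNK_CHARS})
    # Keep blank lines as paragraph breaks; inside a paragraph, turn
    # hard returns and repeated spaces into a single space.
    paragraphs = re.split(r"\n\s*\n", text)
    cleaned = [re.sub(r"\s+", " ", p).strip() for p in paragraphs]
    return "\n\n".join(p for p in cleaned if p)
```

    Calling `clean_extracted_text(raw)` on the extracted text returns one line per paragraph, which can then be pasted back into Word before opening the file in Studio.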

  • Jesús, 
    Thanks for your answer. 

    I spoke with a Trados representative recently about an issue I had with tags, and he suggested using the PDF Assistant for Trados. Of course, my question was about the best way to deal with many tags, and only now do I see that the text around images is not recognised.

    I work for a company and have access to the source text of the documents I translate. The PDFs are generated by software called Contentful. We know about the Contentful integration with Trados, have requested a demo, and are considering working with it. I guess that once we have it we won't have formatting problems, so in the meantime it looks like my best option is to follow your 3rd suggestion.

    Thanks for all the details you have given on how to do that. I don't have much experience yet preparing documents for translation, and it is very useful. 
    Marta

  •  

    I guess you have already weighed the obvious way: translate the TXT files you have (you may need to invest in a one-off parsing step in order to extract only the content to be translated), and then create the PDF from the translated TXT as usual with Contentful.
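
    To give an idea of what such a one-off parsing step might look like, here is a minimal sketch. It assumes, purely for illustration, that each TXT file holds one `field: value` pair per line and that only some fields need translating; the real layout of your files may be quite different, and the field names and the function `extract_translatable` are hypothetical.

```python
# Hypothetical field names whose values should go to translation.
TRANSLATABLE_FIELDS = {"title", "body"}


def extract_translatable(lines):
    """Return (field, text) pairs for the translatable fields only."""
    segments = []
    for line in lines:
        # Split on the first colon only, so colons inside the text survive.
        field, sep, value = line.partition(":")
        if not sep:
            continue  # not a "field: value" line, skip it
        field = field.strip().lower()
        value = value.strip()
        if field in TRANSLATABLE_FIELDS and value:
            segments.append((field, value))
    return segments
```

    The extracted pairs could then be written to a file that Studio can open, and the translations put back into the TXT by field name before generating the PDF.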

    Another way is Infix (https://www.iceni.com/infix.htm). It exports an XML file, which is translated in Trados, and then imports the translated XML back into the PDF. You may need some pre-processing for each PDF as well, but I’d bet that it’d be lighter than the Word way. There is a trial option.
