OCR not reading text in PDF

Hello. I am translating a PDF document in English and the OCR is not capturing some of the text. How do I include the missing English text in the Editor so it can be translated? Or should I add it in the target document?

  • Hi Talia,

    How did you OCR the file? If you're relying on Studio 2015 to do this, then the best approach is:

    1. Open the PDF in Studio
    2. Save the target file
    3. You now have a DOCX that you should open in Word and tidy up. So remove excess tagging, add in any missing text that wasn't picked up in the OCR, etc.
    4. Translate the DOCX

    If you used another tool for the OCR, then do the same thing, but tidy up the document produced by that tool. You can find a useful article here from  on working with poorly formatted PDFs:

    Paul Filkin | RWS Group

    ________________________
    Design your own training!

    You've done the courses and still need to go a little further, or still not clear? 
    Tell us what you need in our Community Solutions Hub

  • Hi Paul. Yes, I am relying on Studio 2015. Please clarify the steps:
    1. Open the PDF in Studio - OK
    2. Save the target file - Do I use "Save As"? Does it matter where? Can I save it on my desktop? And it will be saved as a DOCX, right?
    3. You now have a DOCX that you should open in Word and tidy up. So remove excess tagging, add in any missing text that wasn't picked up in the OCR, etc. - OK
    4. Translate the DOCX - Does that mean I will start my translation all over again as a new project?

    Thanks for clarifying. It is my first time using SDL.
  • You've got it. If the PDF is well formatted, for example created directly from an electronic file, then Studio can often handle it directly. But some PDF files are of poor quality, and in those cases it's often easier to take the converted DOCX as early as possible and fix it up for translation. It will save you a lot of messing around in the long run.

    Yes, you can save the DOCX anywhere you like, though it probably makes sense to put it in the same folder as your PDF. You might even find one already there, as Studio converts the PDF to DOCX in order to work on it.

    And yes, you start your translation again as a new project. If you've already done some work on the PDF, you should be able to pretranslate the DOCX from your TM (translation memory). It might not be perfect, but it will get you pretty close.

    Regards

    Paul

    Paul Filkin | RWS Group

