Studio 2015 + OCR functionality

"Easily and quickly translate PDFs

It can often be frustrating receiving an image or PDF to translate but the new built in OCR reader in Studio 2015 will make translating PDFs and images much easier, even if they have been created from a scanned document."

"Save a little extra time: Translate scanned PDFs
Working on uneditable PDF documents can be frustrating and time-consuming. Studio 2015 lets you translate any PDF file in Studio even if it’s created from a scanned document. The new built-in OCR functionality extracts the text and converts into a translatable file."

...


This is what the SDL advertisement says. But the reality is really sad and poor. I tried to convert many German, Czech and Slovak PDF files, and the OCR results were absolutely unusable! Instead of accented characters you get nonsensical characters, there are stray spaces in the middle of words, etc.

Will SDL solve this problem? Did SDL even test this functionality with languages other than English? My colleagues say the only language the OCR works with is English.

I really feel cheated, because the OCR functionality was one of the TOP features in SDL advertisements. By the way, my 15-year-old version of FineReader gives me better recognition results than Studio 2015. So it is still easier to go the old, complicated way, i.e. convert the PDF into TIFF, run OCR on the TIFF with FineReader and save the recognized text as a Word file.

The reason I purchased the Studio 2015 upgrade was this functionality - which does not work at all yet.

I hope I get an answer from SDL people here. I posted a similar text one month ago - with no reaction from SDL.

  • Hi Adrian,

    The Studio OCR functionality is based on the Solid Documents engine, which covers 14 languages - including German, but not Czech or Slovak. That explains your frustration with the last two languages.

    The settings at the bottom of the PDF file type are very important when you switch between editable and non-editable PDFs. Please check out a blog post I wrote, which explains these settings.

    In any case, if you work with a lot of scanned PDFs, an up-to-date version of FineReader would be the way to go. With FineReader, you can select the recognition language, include/exclude images, draw/edit tables, add words/symbols to custom dictionaries, etc.

    HTH,

    Emma

  • Hi Emma,
    thanks for your answer and the link - your blog article is very helpful.
    I tried all the different settings before, but nothing works properly; the best way is still my ancient version of FineReader.
    By the way, your answer explains something I should have known before I purchased the Studio upgrade. The SDL marketing promises something the program cannot deliver. Where could I have found the information about the 14 supported languages? Should I, as an end user, know or care which engine a single feature of a complex software product is based on? If the advertisement says "you can work with uneditable pdf files", you expect that you really can work with those files. By the way, the recognition results for German PDF files are poor too, although this language is supposed to be supported. And I do not mean the resulting layout, graphics etc. - I mean just the result of simple OCR text recognition.
    For this reason I think SDL should say loudly and clearly before you purchase the advertised product: We support only 14 languages - some of them in debatable quality. Do not expect miracles; the best way for you is to purchase a product that really supports PDF.
    As a paying customer I cannot agree with Paul, who says "there really isn't anything SDL need to resolve here"...
    In my opinion the SDL marketing promises something the SDL developers can not / do not want to realize.
    Best regards,
    Adrian
  • Unknown said:

    In my opinion the SDL marketing promises something the SDL developers can not / do not want to realize.

    Adrian

    Let me just clarify that the SDL developers are not part of the game here because the PDF module is a third-party product (Solid PDF Converter), for which SDL bought a license to be able to incorporate it into Studio. It has not been developed by SDL.

    Walter

  • Hi Adrian,

    The extreme OCR functionality that you feel SDL Trados Studio 2015 should be capable of, in any language and even for scanned documents, was never promised to those of us who beta tested the product; we would have challenged it had we found the functionality to be less than expected.

    However, I think most of us understand that accessible text PDF is a hugely complex format coming from multiple sources containing front-end and background content such as images, user and web interactivity, fonts, etc. Portable Document Format was devised as a means of taking information across the gaps between incompatible operating systems or software formats but was not created specifically to be word processed, let alone translated. Then scanned PDF format, often basically an image, is completely different again.

    I would be totally amazed if any software as complex as Studio, designed for such a wide range of functionality, could do what you're hoping, if it wasn't specific OCR software. I've been using the software for many years and it has become something quite amazing over the years. It has so much functionality that makes our lives as translators easier and our work so much more efficient and competitive.

    All the best,
    Ali
  • Hi Walter,
    OK, I can understand that SDL did not develop the PDF module. But in fact the PDF converter is just one part of a very complex piece of software sold by SDL. And it was not the developer of Solid PDF Converter who sent me dozens of emails with Studio 2015 ads... And part of these self-praising messages was: Studio 2015 can work with uneditable PDF documents. Period. That is the message to me - I am the end user, who is not able to investigate who developed which part of Studio, and to be honest, I'm not even interested in that. I just want to get the functionality that was advertised.
    Regards,
    Adrian