Studio 2015 + OCR functionality

"Easily and quickly translate PDFs

It can often be frustrating receiving an image or PDF to translate but the new built in OCR reader in Studio 2015 will make translating PDFs and images much easier, even if they have been created from a scanned document."

"Save a little extra time: Translate scanned PDFs
Working on uneditable PDF documents can be frustrating and time-consuming. Studio 2015 lets you translate any PDF file in Studio even if it’s created from a scanned document. The new built-in OCR functionality extracts the text and converts into a translatable file."

...


This is what the SDL advertisement says. But the reality is really, really sad and poor. I tried to convert many German, Czech and Slovak PDF files and the results of the OCR recognition were absolutely unusable! Instead of accented characters you get nonsensical characters, there are stray spaces in the middle of words, and so on.

Will SDL solve this problem? Did SDL even test this functionality with languages other than English? My colleagues say the only language the OCR works with is English.

I really feel cheated, because the OCR functionality was one of the TOP features in SDL advertisements. By the way, my 15-year-old version of FineReader gives me better recognition results than Studio 2015. So it is still easier to go the old, complicated way, i.e. convert the PDF into TIFF, run OCR on the TIFF with FineReader and save the recognized text as a Word file.

The reason why I purchased the Studio 2015 upgrade was this functionality - which does not work yet at all.

I hope I get an answer from SDL people here. I posted a similar text one month ago - with no reaction from SDL.

  • Hi Adrian,

    The Studio OCR functionality is based on the Solid Documents engine, which covers 14 languages - including German, but not Czech or Slovak. That explains your frustration with the last two languages.

    The settings at the bottom of the PDF file type are very important when you switch between editable and non-editable PDFs. Please check out a blog post I wrote, which explains these settings.

    In any case, if you work with a lot of scanned PDFs, an up-to-date version of FineReader would be the way to go. With FineReader, you can select the recognition language, include/exclude images, draw/edit tables, add words/symbols to custom dictionaries, etc.

    HTH,

    Emma

  • Hi Emma,
    thanks for your answer and the link - your blog article is very helpful.
    I tried all the different settings before, but nothing works properly; the best way is still my ancient version of FineReader.
    Your answer explains, by the way, something I should have known before I purchased the Studio upgrade. The SDL marketing promises something the program cannot do. Where could I have found the information about the 14 supported languages? Should I, as an end user, know or even care which engine one particular feature of a complex software product is based on? If the advertisement says "you can work with uneditable pdf files", you usually expect that you really can work with those files. By the way, the recognition results for German PDF files are poor too, although this language should be supported. And I do not mean the resulting layout, graphics etc. - I mean just the result of simple OCR text recognition.
    From my point of view, SDL should say loudly and clearly before you purchase the advertised product: We support only 14 languages - some of them in debatable quality. Do not expect miracles; the best way for you is to purchase a product that really supports PDF.
    As a paying customer I cannot agree with Paul, who says "there really isn't anything SDL need to resolve here"...
    In my opinion the SDL marketing promises something the SDL developers cannot or do not want to deliver.
    Best regards,
    Adrian
  • Unknown said:

    In my opinion the SDL marketing promises something the SDL developers cannot or do not want to deliver.

    Adrian

    Let me just clarify that the SDL developers are not part of the game here because the PDF module is a third-party product (Solid PDF Converter), for which SDL bought a license to be able to incorporate it into Studio. It has not been developed by SDL.

    Walter

  • Hi Adrian,

    The extensive OCR functionality that you feel SDL Trados Studio 2015 should be capable of, in any language and even for scanned documents, was never promised to those of us who beta-tested the product; had it been, we would have challenged it if we had found the functionality to be less than expected.

    However, I think most of us understand that accessible text PDF is a hugely complex format coming from multiple sources containing front-end and background content such as images, user and web interactivity, fonts, etc. Portable Document Format was devised as a means of taking information across the gaps between incompatible operating systems or software formats but was not created specifically to be word processed, let alone translated. Then scanned PDF format, often basically an image, is completely different again.

    I would be totally amazed if any software as complex as Studio, designed for such a wide range of functionality, could do what you're hoping, if it wasn't specific OCR software. I've been using the software for many years and it has become something quite amazing over the years. It has so much functionality that makes our lives as translators easier and our work so much more efficient and competitive.

    All the best,
    Ali
  • Hi Walter,
    OK, I can understand that SDL did not develop the PDF module. But in fact the PDF converter is just one part of a very complex piece of software sold by SDL. And it was not the developer of Solid PDF Converter who sent me dozens of emails with Studio 2015 ads... And part of these self-praising messages was: Studio 2015 can work with uneditable PDF documents. Full stop. That is the message to me - I am the end user, who is not able to investigate who developed which part of Studio and, to be honest, I'm not even interested. I just want to get the functionality that was advertised.
    Regards,
    Adrian
  • Hi Ali,
    let me say first that I am convinced that Studio (Trados in the past) has always been and still is the best tool for translators. That is a fact.
    I can understand that the software is still being developed and improved. But a software developer should not sell a product that does not work as advertised!
    Unfortunately, not all translators are beta testers of Studio. That means that we, normal people and end users, could not know what SDL intended to sell - an OCR converter that can convert texts in just 14 languages, and not even properly.
    We normal end users just got dozens of advertising emails, and those emails (and the SDL website too) said, "Studio can convert uneditable PDF files into a translatable text". That is the information I got as a non-beta-tester, and it is the only information I got from SDL. Nowhere could I read that my expectations should not be too high because ... (some excuses). I repeat: my 15-year-old version of FineReader supports 63 (!!!) languages. And the quality of recognition is very, very high: with "normal" or poorly scanned files, approx. 90-95% of the text is usable, and when I get a file created digitally as a non-editable (picture) PDF, I hardly need to edit the source Word file at all. Even if the formatting of the source text is very complex, FineReader delivers a usable, translatable text that I can at least work with (I can choose whether to save it as a Word file or just as plain text).
    After this experience with a 15-year-old piece of software, I had every reason to believe the SDL advertisement that Studio CAN WORK WITH PDF FILES, as promised.
    Let me repeat, please: not everybody is a beta tester who is familiar with the intentions of the software developer. That means that we, simple users, depend on the honesty of the SDL marketing...
    Regards,
    Adrian
  • Unknown said:

    ... I am convinced that Studio (Trados in the past) has always been and is still the best tool for translators

    ... a software developer should not sell a product that does not work as advertised!

    ... SDL intended to sell - an OCR converter that can convert texts in just 14 languages, and not even properly.

    ... advertising emails ... said, "Studio can convert uneditable PDF files into a translatable text".

    ... we, simple users, depend on the honesty of the SDL marketing...

    Hi Adrian,

    Taking the above points in order.

    I'm glad you have been able to benefit from the excellence of Studio and its predecessors, it really is a stonking piece of kit - I love it, in case you couldn't tell ;-) 

    Re your second point, it's not the developers who do the selling (but I'm 'splitting hairs' in pointing that out...)

    We can't really surmise what 'SDL' intended to sell because SDL is not a single entity, it's a huge composite of many departments (ditto)

    Indeed the ads did claim that Studio can convert uneditable PDFs into translatable text - it can, but not as well as dedicated OCR software, or in as many languages, or as well as some unfortunate users had hoped. I would imagine this functionality will continue to improve, so that the occasions when a PDF is just too far removed from what Studio and its integrated third-party OCR software are capable of handling become less frequent.

    As for depending on the honesty of marketing, I had a friend who was a programmer for an engineering control software house. His biggest complaint was that the sales people made promises of functionality before the programmers had even started work on it. He enjoyed complaining and was ignoring the fact of life that demand dictates progression. Here, of course, we have a more complex situation than that - third-party software integration. Programming the interaction required between the two, on top of everything else the amazing programmers of Studio have achieved, is a pretty tall order.

    All this being said, I am not arguing with you, just addressing the points you raised out of politeness and I totally understand your perspective and disappointment :)

    All the best,

    Ali