From Monday 13th March 2023, with the new CU8.1 (184.108.40.20680) release, Trados GroupShare 2020 uses a new mechanism and underlying technology to convert PDF files to a translatable format in translation projects.
Our stance and recommendation is still to use the original file format wherever possible. PDF should always be seen as a workaround for situations where the original file cannot be made available. PDF does not lend itself to localization and creates a lot of extra effort during the process.
Your existing PDF file type configuration will continue to work; however, the settings will change as follows:
- Layout – remains, and your existing setting will be remembered
- Headers and footers – no longer available, will always be extracted now
- Detect tables – no longer available, will always be extracted now
- Image recovery – no longer available, images are kept, but no text is extracted
- Recognize PDF text – no longer available, images are kept, but no text is extracted
- New setting: Use alternative processing (better for non-Latin based languages)
While this new technology provides similar support to the previous one, and you should get similar results overall, we would like to point out a few limitations and potential solutions:
- If you use Asian languages or other non-Latin based languages as source languages, we recommend that you tick the new checkbox Use alternative processing (better for non-Latin based languages).
- Support for scanned PDF documents using OCR (optical character recognition) is limited out of the box. If a PDF file contains merely a scanned picture of the underlying document, then the new technology will not be able to convert the document. If, on the other hand, the document is scanned but the text in it is selectable, then the technology will attempt to convert the characters within the document. You can test this in Adobe Reader, for example. If it's possible to select any text in the document, then the technology should be able to attempt to convert it.
- If you need more advanced support for scanned PDF documents, we recommend the following options:
  - If you use Microsoft Word, you can use its built-in PDF conversion: it can open PDF files, including OCR'ed ones, and save them in Word .docx format, which you can then process as normal.
  - Adobe Reader also has a built-in function to save PDF documents in Microsoft Word format, which can be purchased as a subscription.
  - Alternatively, consider purchasing a third-party solution, such as ABBYY FineReader or Readiris, that can convert OCR'ed PDF documents to Microsoft Word format. These options are available as perpetual licenses or on subscription.
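If you have many PDF files to check, the "can I select any text?" test described above can be approximated with a script. The following is a minimal sketch only, not part of Trados GroupShare: it assumes the third-party pypdf library is installed, and the function names and the 20-character threshold are illustrative choices. A file that passes this check is not guaranteed to convert well; it only indicates that a text layer is present.

```python
from typing import Iterable, List


def extract_page_texts(pdf_path: str) -> List[str]:
    """Return the text layer of each page.

    Assumption: the third-party pypdf package is installed.
    It is imported lazily so the helpers below work without it.
    """
    from pypdf import PdfReader

    reader = PdfReader(pdf_path)
    # extract_text() yields an empty string for image-only pages.
    return [page.extract_text() or "" for page in reader.pages]


def has_selectable_text(page_texts: Iterable[str], min_chars: int = 20) -> bool:
    """True if the pages hold at least min_chars non-whitespace characters.

    The threshold is an arbitrary illustrative value that ignores
    stray characters such as a lone page number.
    """
    total = sum(len("".join(text.split())) for text in page_texts)
    return total >= min_chars


def suggest_route(page_texts: Iterable[str]) -> str:
    """Map the check onto the two cases described in the text."""
    if has_selectable_text(page_texts):
        return "convert directly"
    return "run OCR first"
```

For a real file you would call suggest_route(extract_page_texts("scan.pdf")), where scan.pdf is a placeholder name. A result of "run OCR first" suggests using one of the options listed above before adding the document to a project.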
While we are transitioning to the new technology, we are keen to get your feedback on this change, so that we can continue improving PDF support in future where possible. We are in constant contact with our new vendor about this, so feel free to get in touch via our community!
Trados Product Management