Announcing a change of PDF conversion technology in Trados Cloud Offerings (Trados Studio's Cloud Capabilities, Trados Team, Trados Accelerate/Enterprise)

From Monday 13th March 2023 onwards, the Trados cloud platform (Studio cloud capabilities, Trados Team, Trados Accelerate/Enterprise) will start using a new mechanism and underlying technology to convert PDF files into a translatable format in translation projects. Starting with Cumulative Update 6, Trados Studio also uses the new mechanism and underlying technology for processing PDF files in translation projects.

Our recommendation remains to use the original file format wherever possible. PDF should always be seen as a workaround for situations where the original file cannot be made available: PDF does not lend itself readily to localization and creates a lot of extra effort during the process.

This new technology provides PDF support similar to that of the previous vendor, and overall you should get similar results when working with PDF files in the cloud or in Trados Studio. However, the new PDF file type converts PDF project files into a translatable format slightly differently, which can lead to the following differences compared to previous Trados Studio versions:

  • Differences in analysis statistics.
  • Differences in TU lookup results due to changes in how PDF text formatting and segmentation are now handled.
  • Differences in how images are recovered when generating the translated PDF files.
  • Differences in how special characters and symbols are processed. If your source language is an Asian language or another non-Latin-based language, we recommend that you enable the new Use alternative processing (better for non-Latin based languages) option in the PDF file type settings.
  • Differences in PerfectMatch. If you ran PerfectMatch on a PDF file converted with the previous file type and now run PerfectMatch on a new bilingual file created with the new file type, some PerfectMatch segments may not be transferred to the new file. To work around this, you can enable the "Ignore formatting" setting in the PerfectMatch settings.

Conversion settings for new and existing projects

Your existing PDF file-type configuration will continue to work. However, the Conversion settings for new PDF-based projects will change as follows:

  • Layout - still available; your existing setting will be remembered.
  • Headers and footers - no longer available; headers and footers are now always extracted.
  • Detect tables - no longer available; tables are now always extracted.
  • Image recovery - no longer available; images are kept, but no text is extracted from them.
  • Recognize PDF text - no longer available; images are kept, but no text is extracted from them.
  • Use alternative processing (better for non-Latin based languages) - new setting.

Support for scanned PDF documents

Support for scanned PDF documents using OCR (optical character recognition) is limited out of the box.

If a PDF file contains merely a scanned picture of the underlying document, then the new technology will not be able to convert the document. If, on the other hand, the document is scanned but the text in it is selectable, then the technology will attempt to convert the characters within the document.

You can test this in Adobe Reader, for example. If it's possible to select any text in the document, then the technology should be able to attempt to convert it.

The IRIS app is no longer supported as a complement to the PDF file type. We recommend that you use the new PDF Assistant for Trados Studio app for optimized support of scanned PDF documents, regardless of source language (see below for details).

Alternative approaches

If you need more advanced support for scanned PDF documents, we recommend the following options:

  • Install the new PDF Assistant for Trados Studio app. This free app was developed especially for this change. It uses a new and sophisticated approach to PDF conversion and is available from within Trados Studio (Add-Ins tab > RWS AppStore) and from the RWS AppStore website.
    In this first release, PDF Assistant for Trados Studio uses Microsoft Word behind the scenes to perform PDF to DOCX conversion. It uses Word's rich capabilities to handle scanned documents and documents in a variety of languages, including bidirectional and Asian.
  • Use Microsoft Word’s built-in PDF conversion. Word can open PDF files, including OCRed ones, and save them in .DOCX format, which you can then process as usual.
  • Use the built-in function in Adobe Reader to save PDF documents in Microsoft Word format. This option is available as a paid subscription.
  • Check out third-party solutions, such as ABBYY FineReader or Readiris. These can convert OCRed PDF documents to Microsoft Word format and are available as perpetual licenses or on subscription.

What’s next?

We will keep updating and refining the new PDF Assistant for Trados Studio app to give you the best possible PDF conversion capabilities. We have developed this app with extensibility in mind, so future updates may integrate other conversion providers into the app. For more information, see the PDF Assistant for Trados wiki.

Besides updating the app, we are committed to continuing to improve PDF support in future updates and are in constant contact with our new vendor about this. While we transition to the new technology, we are keen to get your feedback on this change via the Trados Studio user community.

Trados Product Management

  • To be honest, I am quite disappointed by this change, since it has removed a feature that was very useful to me from the PDF file type Converter settings, namely the "Recognize PDF text" option:

    [Screenshots: PDF file type Converter settings, old version and new version]

    The old feature proved very useful with a document that was impossible to read with any other tool (except for ABBYY FineReader, of course), but to be honest, not all of us can afford a separate OCR tool or even an Acrobat Pro license for some "stubborn" PDF files. I am sorry, but I feel obliged to uninstall Studio 2022 CU6 and roll back to Studio 2022 CU5. Disappointed.