PDF Issue: Layout of Translated File & Recognition of Source Text

Question

Hi all, 
 I am trying to translate an original (editable) PDF in Trados from English into Greek, but two major issues arise. First, many characters of the source text are not recognized properly when the file is imported into Trados. As a result, the source text in the Editor appears as gibberish in many segments. For example, the character x is replaced with - while w is replaced with x.
 
 An example of bad text recognition 
 Second, when I generate the translation, the translated file is totally ruined in terms of layout. For example, much of the text of the 1st page has been moved to the 2nd page, the segmentation has been ruined, and the spacing is terrible (see images below). 
 
 The layout of the source text 
 
 The layout of the target file (1st page) 
 
 The layout of the target file (2nd page) 
 
 In general, I always run into layout issues when translating PDF files. What would you suggest to prevent these issues, if that is possible?
 
 Kind regards, 
 Christos

Evzen Polenka · Answer

The suggestion is simple - translate the source document from which the PDF was created, not the PDF.
 Unfortunately, clients are clueless and do not know that PDF is NOT "just another document format" which can be freely edited. And it's even more unfortunate that many (the majority?) of translators do not know this either... 
 PDF was invented as a consume-only (i.e. read-only, print-only) format and was NEVER intended to be editable or otherwise processable. 
 Therefore, if a client wants to have a PDF localized, the only correct process is to localize the original format used to create the PDF (which can be Word, InDesign, Quark, or whatever else), NOT the PDF. Period.

Jerzy Czopik · Answer

I must disappoint you - there is no way to get a better result without investing some work, and you have to invest that work upfront. Even if it is possible to translate a PDF directly with a CAT tool, this is a very bad idea. You have already learned why - the conversion is what it is, and you have no influence on what gets converted or how. 
 Either insist on translating the native format the document was in before it became a PDF, or use a decent OCR tool and convert the PDF manually. Then make sure that the fonts used cover your target language. Expecting any automated tool to provide perfect conversion quality is - forgive my French - naive at the least. 
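 The font check mentioned above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `missing_characters` and the `latin_only` set are made up for the example, and the set of code points a real font supports has to come from a font tool (for instance, the fontTools library can list a font's character map). The sketch simply reports which characters of a translated string fall outside that set.

```python
import unicodedata


def missing_characters(text, supported_codepoints):
    """Return {character: Unicode name} for characters the font cannot display.

    `supported_codepoints` is an assumption of this sketch: a set of integer
    code points taken from the font's character map by some external tool.
    """
    missing = {}
    for ch in text:
        if ch.isspace():
            continue  # whitespace needs no visible glyph for this check
        if ord(ch) not in supported_codepoints:
            missing.setdefault(ch, unicodedata.name(ch, "UNKNOWN"))
    return missing


# Hypothetical font that covers only printable ASCII (basic Latin).
latin_only = set(range(0x20, 0x7F))

for char, name in missing_characters("PDF αρχείο", latin_only).items():
    print(char, name)
```

 If the result is not empty for your Greek target text, the font in the converted file will have to be replaced before the layout can look right.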
 If you want to learn more about Studio and PDF, watch this upcoming webinar: http://seminare.bdue.de/4705 
 BTW, the term "editable" PDF is very misleading. No PDF is "editable", as the format was developed entirely for READ-ONLY applications. It is not intended to be edited in any way. So what you mean is a "clickable" PDF, where you can click and select text. If the PDF is not protected, you can also copy the text. If the customer will not deliver the original source file, the best idea would be simply to copy all the text into a plain-text editor to remove all formatting, translate it, apply basic formatting such as headings and list elements, and deliver that to the customer so that their layout person can copy & paste it. Or reformat it yourself.
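 For the copy-the-text-out approach described above, text pasted from a PDF usually arrives with hard line breaks and words hyphenated across lines. The following is a minimal Python sketch of a cleanup step, not a feature of Trados or any other tool; the function name `clean_pdf_text` and the rule that blank lines separate paragraphs are assumptions made for the example.

```python
import re


def clean_pdf_text(raw: str) -> str:
    """Turn text copied from a PDF into plain paragraphs, one per block."""
    # Normalise line endings and strip stray spaces around each line.
    text = raw.replace("\r\n", "\n").replace("\r", "\n")
    text = "\n".join(line.strip() for line in text.split("\n"))

    # Rejoin words split by a hyphen at a line break ("instal-\nlation").
    # Caveat: this also removes genuine hyphens that happen to end a line
    # (e.g. "read-\nonly"), so the result should be proofread.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)

    # Assumption: blank lines separate paragraphs; single breaks are wraps.
    paragraphs = re.split(r"\n{2,}", text)
    cleaned = []
    for paragraph in paragraphs:
        joined = re.sub(r"\s+", " ", paragraph).strip()
        if joined:
            cleaned.append(joined)
    return "\n\n".join(cleaned)


sample = "Instal-\nlation guide\nfor the   device.\n\n\nSecond para-\ngraph here.  \n"
print(clean_pdf_text(sample))
```

 The cleaned text can then be saved as a plain .txt file and translated, with headings and lists restored by hand afterwards.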

Paul Filkin · Answer

I'm afraid the advice you're getting is probably the right advice. PDF files can be notoriously difficult to manage, and even if the conversion to Word for the translatable file goes well, the layout is easily lost if it's complex, as Word isn't the best tool for things like this. 
 What software did you use for the original file... maybe there is a way to get at the content another way? 
 You could also try using IRIS. Make sure you have the IRIS plugin installed and activated before you create your project. It "might" help. 
 https://multifarious.filkin.com/2017/08/17/iris-ocr/
