PDF conversion - double spaces

Screenshot of Trados Studio showing distorted text after converting a PDF, with overlapping lines and unclear formatting.

 

After converting a PDF, the problem shown above comes up frequently. I've seen it with numerous PDF files, a majority of them in fact. It's a major disappointment and a really lame bug, considering that the Solid technology has been in Trados for I don't know how many years, and I've seen mentions of this on the web (Proz) going back to around 2010.

It wouldn't be a problem anyway if it didn't distort DeepL's pretranslation, but it does.

The software is Trados 2019, latest version obviously.

Yes, I know I can use alternative PDF conversion methods. Or actually I can't, because Trados often won't process Acrobat-converted PDFs due to some tag error when I try to pretranslate them with DeepL. So these days it's a matter of jumping between faults and errors with this 500-euro software.

Oh, and I'm talking about editable PDFs, naturally - not optical scans.



Generated Image Alt-Text
[edited by: Trados AI at 2:09 PM (GMT 0) on 28 Feb 2024]
  • Hi 

    Very annoying, I know. I don't know how PDFs are compiled, but I've found they are often 'layered' (for want of a better term), and the 'text' isn't always what it appears to be when you search or copy it. For example, this morning I copied '-15C' from the SDLXLIFF to search for its context in the source PDF that came with the job, and the search found various random words in the PDF, as here:

    Screenshot of Trados Studio's 'Find' function with the search term '12.5C' entered, showing 'Previous' and 'Next' buttons, and a snippet of the PDF with 'HEAT PUMP' and menu options visible.

    Conversely, I've copied text from a PDF to search the SDLXLIFF and found that, rather than 'HEAT PUMP', what appears in the search box when I paste what I highlighted and copied is, for example, 'HHEATT  PUMMP'.

    All I can say is that when I find a 'text-based' PDF doesn't convert tidily to SDLXLIFF, I save from the SDLXLIFF back to Word via File>Advanced Save>Save as Source, tidy up the resulting source Word file, and then create a new SDLXLIFF based on that.

    So, for example, use File>Advanced Save>Save as Source to create a source Word file, do a batch replace of 2 spaces with 1 space, repeat until there are none left, and then translate the resulting file.
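If the exported source is available as plain text, the same "replace 2 spaces with 1 until none are left" step can be scripted. This is only a minimal sketch in Python, not anything Trados provides: the function names and the sample string are made up for illustration, and it handles ordinary space characters only (not non-breaking spaces, and not doubled letters such as 'HHEATT').

```python
import re


def collapse_spaces(text: str) -> str:
    """Replace every run of two or more ordinary spaces with one space."""
    return re.sub(r" {2,}", " ", text)


def collapse_spaces_iteratively(text: str) -> str:
    """Mimic the manual find-and-replace: repeat until no double space remains."""
    while "  " in text:
        text = text.replace("  ", " ")
    return text


# Hypothetical sample line with the kind of spacing the PDF conversion produces.
sample = "HEAT  PUMP   operating    range"
print(collapse_spaces(sample))
```

Both functions give the same result; the regular expression just does it in a single pass instead of repeating the replacement.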

    Just a workaround, I know, but it may help while you're waiting for a reply from someone who understands more about this than I do.

    All the best

    Alison

  • Hi, and thanks for the suggestion.

    Sure, that is a solution, though it slows the workflow down considerably. We shouldn't have to resort to this kind of 1990s-style manipulation with such an expensive product, especially given how old this bug appears to be.
  • Hi Adrian,

    You're welcome. I would say that PDFs can still cause conversion problems because of how many different software sources a PDF can come from and how varied the settings used to create it can be: Print, eBook, Publish, Save As and all the individual options within them, plus how the fonts are added, if at all. I used to work in desktop publishing with a piece of software called Interleaf Quicksilver, and you wouldn't believe how many methods and settings there were for creating PDFs.

    I'm always amazed at how accurately Studio imports PDFs, as it isn't OCR software first and foremost.

    I remember the 1990s and the fun of the lateral thinking required to achieve workarounds when software combinations didn't just work straight out of the box, when even Word didn't have multilingual proofing tools. I remember the wonderful day when my Word 6.0 French grammar and spellchecking tool arrived by post. Glory be when Word 95 came out and the tool was already incorporated. Good old bad old days!

    Anyway, have a good day and an excellent week!

    Ali :)