PDF conversion/import renders "ti" and "fi" characters incorrectly (ligature problem)

Hi all, 

For about a year now, certain character combinations have been omitted or converted to ligatures whenever I open an editable PDF file in Word. I understand it's a font problem, so I've tried changing the "Ligatures" setting under "OpenType Features" in Word, but nothing has worked.

After reading Nichola Knusten's recent forum post on ligatures, I realise the same problem occurs when opening editable PDF files in Trados, hence this new post!

Specifically, "ti" is omitted and "fi" becomes a single character. See the words notificación and certificado in this screenshot:

  

I'm attaching a sample PDF file in the hope that someone can help me solve the problem, either in the Word conversion or in Trados.

  • Hi, if you're referring to the "Ligatures" setting under "OpenType Features" in Word, then yes, I've tried that, and no, it doesn't work.

  • Dear all, 

    We encountered that issue four years ago, when we had to convert a large number of PDF publications in order to align them and feed our TMs.

    Back then, we "simply" wrote a macro and ran it on the Word rendering of the PDF conversion in order to get a correct file before importing it into Studio.

    Here's what the macro looked like. I'm sure you can adapt it to your specific ligature issues if they are different from the ones we encountered (we only had trouble with f+, not with t+). FYI, we used the Unicode decimal codes to find the Latin small ligature characters and then replaced them with the corresponding plain letters.

    Gwen

    Macro that automatically replaces the ligature characters ﬀ, ﬁ, ﬂ, ﬃ and ﬄ with the ordinary letter sequences ff, fi, fl, ffi and ffl.

     

    Sub Caractères_accolés()
    '
    ' Caractères_accolés Macro
    ' Replaces the Latin small ligature characters with plain letters.
    '
        Selection.Find.ClearFormatting
        Selection.Find.Replacement.ClearFormatting

        ' Decimal 64256 (U+FB00): ligature ff
        With Selection.Find
            .Text = "^u64256"
            .Replacement.Text = "ff"
            .Forward = True
            .Wrap = wdFindContinue
            .Format = False
            .MatchCase = False
            .MatchWholeWord = False
            .MatchWildcards = False
            .MatchSoundsLike = False
            .MatchAllWordForms = False
        End With
        Selection.Find.Execute Replace:=wdReplaceAll

        ' Decimal 64258 (U+FB02): ligature fl
        With Selection.Find
            .Text = "^u64258"
            .Replacement.Text = "fl"
            .Forward = True
            .Wrap = wdFindContinue
            .Format = False
            .MatchCase = False
            .MatchWholeWord = False
            .MatchWildcards = False
            .MatchSoundsLike = False
            .MatchAllWordForms = False
        End With
        Selection.Find.Execute Replace:=wdReplaceAll

        ' Decimal 64257 (U+FB01): ligature fi
        With Selection.Find
            .Text = "^u64257"
            .Replacement.Text = "fi"
            .Forward = True
            .Wrap = wdFindContinue
            .Format = False
            .MatchCase = False
            .MatchWholeWord = False
            .MatchWildcards = False
            .MatchSoundsLike = False
            .MatchAllWordForms = False
        End With
        Selection.Find.Execute Replace:=wdReplaceAll

        ' Decimal 64259 (U+FB03): ligature ffi
        With Selection.Find
            .Text = "^u64259"
            .Replacement.Text = "ffi"
            .Forward = True
            .Wrap = wdFindContinue
            .Format = False
            .MatchCase = False
            .MatchWholeWord = False
            .MatchWildcards = False
            .MatchSoundsLike = False
            .MatchAllWordForms = False
        End With
        Selection.Find.Execute Replace:=wdReplaceAll

        ' Decimal 64260 (U+FB04): ligature ffl
        With Selection.Find
            .Text = "^u64260"
            .Replacement.Text = "ffl"
            .Forward = True
            .Wrap = wdFindContinue
            .Format = False
            .MatchCase = False
            .MatchWholeWord = False
            .MatchWildcards = False
            .MatchSoundsLike = False
            .MatchAllWordForms = False
        End With
        Selection.Find.Execute Replace:=wdReplaceAll
    End Sub
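
    One possible way to adapt the macro above to other characters, offered as a minimal sketch rather than the original code. ShowSelectedCharCode and ReplaceLigaturesFromList are hypothetical names. The second macro also assumes that searching ActiveDocument.Content is enough for your file; that covers the main text only, not headers, footers or text boxes. The first macro reports the decimal code of a selected character in the form the Find box expects. The second runs the same five replacements from a list, so a new pair can be added in one place.

    ```vba
    ' Hypothetical helper, not part of the original macro: shows the decimal
    ' Unicode code of the first selected character, ready to use after "^u".
    Sub ShowSelectedCharCode()
        Dim code As Long
        If Len(Selection.Text) = 0 Then Exit Sub
        code = AscW(Left$(Selection.Text, 1))
        ' AscW returns a signed 16-bit value, so codes from 32768 upwards
        ' (which includes the ligatures at 64256-64260) come back negative.
        If code < 0 Then code = code + 65536
        MsgBox "Find what: ^u" & code
    End Sub

    ' Hypothetical list-driven variant of the macro above. Assumption: only
    ' the main text story (ActiveDocument.Content) needs to be processed.
    Sub ReplaceLigaturesFromList()
        Dim codes As Variant, letters As Variant
        Dim i As Long

        ' Decimal codes and their plain-letter replacements, in matching order.
        codes = Array(64256, 64257, 64258, 64259, 64260)
        letters = Array("ff", "fi", "fl", "ffi", "ffl")

        For i = LBound(codes) To UBound(codes)
            With ActiveDocument.Content.Find
                .ClearFormatting
                .Replacement.ClearFormatting
                .Text = "^u" & codes(i)
                .Replacement.Text = letters(i)
                .Forward = True
                .Wrap = wdFindStop
                .Format = False
                .MatchCase = False
                .MatchWildcards = False
                .Execute Replace:=wdReplaceAll
            End With
        Next i
    End Sub
    ```

    To use it, select the single odd-looking character in the converted document, run ShowSelectedCharCode, then add the reported number and the letters it should become to the two arrays.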

  •  

    The best way to add code and avoid the automated spam filter is to add it as code using the Insert -> Code menu... like this:

    I edited your post when I created the video.

    Nice tip though... thanks for sharing it.

    Paul Filkin | RWS Group
