Special Characters in target text in different font than other characters using Studio 2017 and translating a pdf

Hello,

I am using Studio 2017 and am very impressed that I can now "translate" a pdf, and the results are quite good.

However, in my last project all the special characters in the target text came out in Calibri (Textkörper), whereas the regular characters were in Arial. It was no problem to adjust this in the target file, but I am wondering why it happens. Is there a setting I could use to enforce the same font in the target file in the future?

Thanks for helping me with this.

Ursula

  • Hi Ursula,

    Most likely you are hiding the control tags. Try pressing Ctrl+Shift+H in the Studio editor and the formatting tags may well appear. Then make sure you transfer them over to the target segments.

    However, you might find that a better approach altogether is to open the PDF as a single document (Ctrl+Shift+O is the quick way to do this). You'll then find a DOCX version of the source created in the same folder as the PDF. Once you have this, tidy up the DOCX file so it is cleaner and easier to handle, and then translate the DOCX instead. Emma wrote a nice article on how to clean the file here:

    signsandsymptomsoftranslation.com/.../

    Regards

    Paul

    Paul Filkin | RWS Group

    ________________________
    Design your own training!

    You've done the courses and still need to go a little further, or still not clear? 
    Tell us what you need in our Community Solutions Hub

  • Hi Paul,
    I am not hiding the control tags; I have them in plain view. But I should have explained the issue in more detail.
    I am translating from English to German. An English word, for instance, is "quality agreement" and the corresponding German translation is "Qualitätsvereinbarung". When I create the target file, the characters "Qualit" and "tsvereinbarung" are in Arial whereas the "ä" German umlaut (special character) is in Calibri (Textkörper) font.

    I don't think this issue can be solved by cleaning up the DOCX file before starting the translation. But that is a good point and sensible to do, of course.

    I already read Emma's article a while ago, it is very valuable information.

    Best
    Ursula
  • Any chance I can see the SDLXLIFF?

    Paul Filkin | RWS Group


  • Sorry, I have an NDA with the customer.
    I'll think about a way to anonymize it and will send it to you.
    Cheers,
    Ursula
  • Maybe just send a sentence or two from the source that illustrates the problem? I never like getting huge documents to investigate issues anyway! You could easily anonymise the text this way.

    Paul Filkin | RWS Group


  • Hi Paul,
    My reply just disappeared, so here is another try. I have a file for you. Could you give me an email address to send it to, or another way of getting it to you?
    I took the EN Word file created from the EN PDF, shortened and anonymized it, converted it to SDLXLIFF by adding it to my project and running the batch tasks, "translated" it with lots of umlauts, and generated the target file from it.
    The resulting file has umlauts in Arial but also in Calibri (body) (Calibri (Textkörper) in my case, as I am using the German MS Word GUI). The Calibri (body) characters appear in the heading, in the table, and in the footer, but also in some regular text (UNTER BERÜCKSICHTIGUNG...).
    Regards,
    Ursula
  • You can send it to pfilkin@sdl.com. Make sure you send both the source file and the SDLXLIFF you created that causes this problem for you.

    Thank you

    Paul

    Paul Filkin | RWS Group


  • Hi Ursula,

    I see your problem... quite an interesting behavior.  We discussed this a little with the support/dev team and reached the following conclusion:

    1. The problem is specific to this file, probably because of the conversion process used to get the DOCX from the PDF
    2. The Arial font in this particular file does not like characters such as "ä".  If you try to type one into the Word file itself, it immediately turns to Calibri.  The reason is probably that Word falls back to the default style when it comes across a character it can't handle in the font being used.
    3. The Normal style for your file is Calibri.

    I also used TransTools (http://appstore.sdl.com/app/transtools/543/ ) to clean up your source file, which certainly makes it a lot easier to handle as most of the excessive tagging is gone, but it did not remove this problem from your file.  So I also changed the font of the Normal style to Arial.  Now all is well.

    So to conclude, this is not a problem with Studio, rather it's a problem with the PDF conversion process and the controls within the original Word file.  I'll email you the corrected versions of my tests.

    Regards

    Paul

    Paul Filkin | RWS Group
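
One plausible mechanism behind point 2 above (an assumption, not something confirmed in this thread): in a DOCX, each run's `w:rFonts` element has separate font slots, with `w:ascii` covering basic Latin characters and `w:hAnsi` covering other Western characters such as "ä". If the PDF conversion set only `w:ascii` to Arial, an umlaut in that run would take its font from the style, here Calibri. The sketch below lists such runs so you can check a converted file before translating it. The function names are hypothetical, and it uses only the Python standard library.

```python
import xml.etree.ElementTree as ET
import zipfile

# WordprocessingML namespace used by word/document.xml
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def read_document_xml(docx_path):
    """Return the raw word/document.xml from a DOCX (which is a ZIP archive)."""
    with zipfile.ZipFile(docx_path) as zf:
        return zf.read("word/document.xml")


def find_mixed_font_runs(document_xml):
    """List runs whose w:rFonts sets w:ascii but has no matching w:hAnsi.

    Assumption: these are the runs where non-ASCII characters may end up
    in a different font than the surrounding text.
    Returns a list of (run_text, ascii_font, hansi_font) tuples.
    """
    root = ET.fromstring(document_xml)
    hits = []
    for run in root.iter(W + "r"):
        fonts = run.find(W + "rPr/" + W + "rFonts")
        if fonts is None:
            continue  # run inherits all fonts from its style
        ascii_font = fonts.get(W + "ascii")
        hansi_font = fonts.get(W + "hAnsi")
        if ascii_font and hansi_font != ascii_font:
            text = "".join(t.text or "" for t in run.iter(W + "t"))
            hits.append((text, ascii_font, hansi_font))
    return hits
```

Usage would be something like `find_mixed_font_runs(read_document_xml("source.docx"))`, where `source.docx` stands for the file Studio created next to the PDF. Headers and footers live in separate parts of the archive and would need the same check.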


  • Hi Paul,
    thanks for all your efforts. It is great to get answers to my questions so fast and almost any time of the day and night.

    So, to be on the safe side, here is what I'll do in the future when I get a PDF to translate (another one just arrived):
    I will take the source Word file that Studio creates, run TransTools on it to solve the excessive tagging issue, and then streamline it all into one font.

    Kind regards,
    Ursula
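
The "streamline it all into one font" step is normally done in Word itself (select all, set the font, and change the Normal style as described above). For anyone who wants to script it, here is a rough sketch under stated assumptions: the file uses the usual `w:` prefix, the XML parts are UTF-8, and every `w:rFonts` element is self-closing. The function names are hypothetical. It rewrites every `w:rFonts` in the main document, styles, headers, and footers so that the `w:ascii`, `w:hAnsi`, and `w:cs` slots all name one font, and it drops theme font attributes in the process.

```python
import re
import zipfile
from xml.sax.saxutils import quoteattr

# Matches self-closing <w:rFonts .../> elements (assumes the "w:" prefix)
RFONTS = re.compile(r"<w:rFonts\b[^>]*/>")
# Parts to rewrite; numbering.xml is skipped so bullet symbol fonts survive
TARGET_PARTS = re.compile(r"word/(document|styles|header\d*|footer\d*)\.xml$")


def unify_fonts_xml(xml_text, font="Arial"):
    """Replace every w:rFonts element with one that names a single font."""
    q = quoteattr(font)  # quotes and escapes the font name for XML
    replacement = "<w:rFonts w:ascii={0} w:hAnsi={0} w:cs={0}/>".format(q)
    # Caution: this also flattens runs that deliberately use a symbol font.
    return RFONTS.sub(lambda match: replacement, xml_text)


def unify_fonts_docx(src_path, dst_path, font="Arial"):
    """Copy a DOCX, rewriting fonts in the targeted parts only."""
    with zipfile.ZipFile(src_path) as src:
        with zipfile.ZipFile(dst_path, "w", zipfile.ZIP_DEFLATED) as dst:
            for item in src.infolist():
                data = src.read(item.filename)
                if TARGET_PARTS.match(item.filename):
                    text = data.decode("utf-8")
                    data = unify_fonts_xml(text, font).encode("utf-8")
                dst.writestr(item, data)
```

Always write to a new file and open the result in Word to check it before adding it to a Studio project.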