IRIS PDF OCR Support for Studio – Chinese image-based PDF – PDF conversion not working properly

Hi!

I would like to test the capabilities of the IRIS OCR for image-based Chinese PDFs.

However, the result is not as expected (see original file):

 

Settings window showing 'Recognize PDF text' options with 'Problem characters only' selected and 'Use IRIS technology for optical character recognition' checked.


SDLXLIFF preview:

SDLXLIFF preview window displaying garbled and incorrect characters, indicating a failed OCR conversion of Chinese text.

Original PDF in Chinese (see attached):

Original PDF document in Chinese, text appears clear and legible, no visible errors.

Thanks for helping!

Best regards,
Manuel

zh (image based).pdf



[edited by: Trados AI at 3:14 PM (GMT 0) on 28 Feb 2024]
  • Hi Manuel,

    This behavior is intended, although it may not look like it at first.

    Here is why:

    The preview is generated using the source language of the first language pair listed in

    File > Options > Language Pairs

    In your case, the source language of that pair seems to have been English or another language that uses the Latin alphabet, while the document is Chinese. With the wrong source language, the characters were not recognized correctly.

    If, on the other hand, you open the preview during project creation or afterward in the project settings, there is usually only a single language pair present, so the correct source language is used automatically.

    Kind regards

    Peter Lehn | Support Engineer | SDL | Language Technologies Division
