PDF-XChange Editor OCR Image for Chinese is not working

skycats
User
Posts: 4
Joined: Fri Jan 26, 2018 3:51 am

PDF-XChange Editor OCR Image for Chinese is not working

Post by skycats »

PDF-XChange Editor
Version: 7.0.323.2
Download: Zip Installer(32/64 bit) And OCR Chinese Languages Pack

When I use OCR Image, it produces strange characters, like this:
1.png
This is what I get when I copy the recognized text:
2.png
Test image:
Test Image.png
Tracker Supp-Stefan
Site Admin
Posts: 17948
Joined: Mon Jan 12, 2009 8:07 am
Location: London

Re: PDF-XChange Editor OCR Image for Chinese is not working

Post by Tracker Supp-Stefan »

Hello skycats,

Thanks for your post and the sample file.
While the image quality is not ideal, I managed to get this result (for the first three lines of the main text):
本书的写柞缂起几年前找学习 WPFn 固为找是从 Windows Forms 开发转来做 wPF 开发的. 学习过程中
遇到帷多新概念 新特性, 其中包括 Data Binding. 路由鬟伴 命令~ 各种模板等- 我的工作风格足对于每个
新知识. 一定先把它理孵进伽 涮丑白再应用于项目中. 不然总感觉仗用起未不放心1 于是就对照已有的英

I can't quite tell if it's the correct result - but to me it looks almost as if it is.

I used "Medium" quality and "Preserve original content and add text layer on top".

Regards,
Stefan
skycats
User
Posts: 4
Joined: Fri Jan 26, 2018 3:51 am

Re: PDF-XChange Editor OCR Image for Chinese is not working

Post by skycats »

Thank you for your reply.
After seeing your results, I ran some tests.
I found that this is a bug.

I printed a PDF file using another tool; see
test.pdf
(the top part is real text, the bottom part is a screenshot of it)

When I use OCR Page, it works fine.
TIM截图20180127024630.png

我是测试文本0 我是测试文本O 我是测试文本〇
I am test text. I am test text. I am test text.
When I use OCR Image, it does not work.
TIM截图20180127024921.png

332%iflfliitfidio fi%ifilflifiilfio fiEifilflifiY$o
I am test text. I am test text. I am test text.
These strange characters look like a character encoding error. :D
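
One rough way to test the encoding-error guess (a sketch only, not anything PDF-XChange Editor does internally): in UTF-8, GBK and Big5, every Chinese character is stored with at least one byte of value 0x80 or higher, so Chinese text decoded with the wrong single-byte codec always contains non-ASCII characters. The helper name `could_be_misdecoded_chinese` is made up for this illustration, and the sample string is the OCR Image output as transcribed above.

```python
def could_be_misdecoded_chinese(text: str) -> bool:
    """Return True if `text` contains any non-ASCII character.

    Only such text could, in principle, be multibyte Chinese text
    that was decoded with the wrong single-byte codec.
    """
    return any(ord(ch) >= 0x80 for ch in text)


# OCR Image output copied from the post (assumed to be plain ASCII).
ocr_image_output = "332%iflfliitfidio fi%ifilflifiilfio fiEifilflifiY$o"

# Real mojibake for comparison: UTF-8 bytes wrongly decoded as Latin-1.
mojibake = "我是测试文本".encode("utf-8").decode("latin-1")

print(could_be_misdecoded_chinese(ocr_image_output))  # False
print(could_be_misdecoded_chinese(mojibake))          # True
```

Because the OCR Image output is plain ASCII, this check suggests the recognizer itself produced Latin letters, rather than correct Chinese text being mangled by a codec afterwards.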
TrackerSupp-Daniel
Site Admin
Posts: 8613
Joined: Wed Jan 03, 2018 6:52 pm

Re: PDF-XChange Editor OCR Image for Chinese is not working

Post by TrackerSupp-Daniel »

By any chance, are you using English as the OCR language? If so, you can download additional OCR languages here:
https://www.pdf-xchange.com/pdf-xchange-viewer-ocr
I believe this is the most likely cause, as my test with English as the selected language gave results similar to yours:
xxasaq?wm?m?Fméiia wm Eli‘J-?iAk Windows Forms afii?usk WPF ?ié?. Ermi?'?
i?f'HtlifTM-?r. ??lk’i. ?q’?ie‘; Data Binding. 33-113344'. ¢¢~ ?-‘?'?i‘iyf- ??l?'m?x??f?“?
1154mm. —-2;i4wazmmsm mullai?r?m—?m??v. mma?mtmn?km?ru, fiit??????‘li
xvi—NH" MSDNi?-‘??ib?‘i?i??. ?-????. ?xixk?ni??ik?r?i. rk?k?i?éal??. Lit
w—+mm.wmmgm?n??iizék?m?m???mhu?.k?????n?giim#&m.
?jhi—$f¥3??. fi?Tskvufriél-Eiuk?. ii$4$tr?z$¢mt?7Mll??liia?gf¥~—«iii/xii
:1: WPF».

Conversely, with the Chinese character pack installed:
????????????? WPFn ????? wmdows FOnns????? wPF ???. ?????
??????? ???, ???? Data Binding. ???? ??~ ?????- ???????????
???. ????????? ????????? ???????????? ?????????
???? MSDN????????? ?-???, ???????????? ????????? ??
????? ?????? ?????????????????? ???????????????
???????? ???????????? ????????????????????((???
? WPF».
Significantly fewer errors.
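
A crude way to compare such outputs without reading Chinese is to measure how much of the recognized text is actually made of CJK ideographs. The sketch below is an illustration only: `cjk_ratio` is a hypothetical helper, and it counts only the basic CJK Unified Ideographs block (U+4E00 to U+9FFF), ignoring whitespace.

```python
def cjk_ratio(text: str) -> float:
    """Fraction of non-whitespace characters that are CJK ideographs
    in the basic block U+4E00..U+9FFF. Returns 0.0 for empty input."""
    chars = [ch for ch in text if not ch.isspace()]
    if not chars:
        return 0.0
    cjk = sum(1 for ch in chars if "\u4e00" <= ch <= "\u9fff")
    return cjk / len(chars)


# Short samples in the spirit of the outputs posted in this thread.
print(cjk_ratio("我是测试文本"))        # 1.0
print(cjk_ratio("332%iflfliitfidio"))  # 0.0
print(cjk_ratio("我是 WPF"))            # 0.4
```

A ratio near zero on a page known to be Chinese is a hint that the wrong OCR language (or a faulty code path) was used; it says nothing about whether the recognized characters are the correct ones.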
Dan McIntyre - Support Technician
Tracker Software Products (Canada) LTD

skycats
User
Posts: 4
Joined: Fri Jan 26, 2018 3:51 am

Re: PDF-XChange Editor OCR Image for Chinese is not working

Post by skycats »

Please see post #3.
I have already installed the Chinese character pack.
You can download test.pdf to test with.
Try my steps:
(click an image to see the GIF)
action1.gif
action2.gif
Tracker Supp-Stefan
Site Admin
Posts: 17948
Joined: Mon Jan 12, 2009 8:07 am
Location: London

Re: PDF-XChange Editor OCR Image for Chinese is not working

Post by Tracker Supp-Stefan »

Hi skycats,

Thanks for the report.
I have now managed to reproduce the issue and have created a ticket in our internal system so that our developers can investigate and fix it:
#4218: Editor 323.2: OCR image does not work the same as Document -> OCR Pages...

In the meantime, please use Document -> OCR Pages as a workaround!

Regards,
Stefan
Sasha - Tracker Dev Team
User
Posts: 5522
Joined: Fri Nov 21, 2014 8:27 am

Re: PDF-XChange Editor OCR Image for Chinese is not working

Post by Sasha - Tracker Dev Team »

This has been fixed, and the fix will be available in the next release.

Cheers,
Alex