PDF-XChange Editor OCR Image for Chinese is not working

Discussion for the End User use of OCR in PDF-XChange Editor and Viewer

skycats
User
Posts: 4
Joined: Fri Jan 26, 2018 3:51 am

PDF-XChange Editor OCR Image for Chinese is not working

Post by skycats »

PDF-XChange Editor
Version: 7.0.323.2
Download: Zip Installer(32/64 bit) And OCR Chinese Languages Pack

When I use OCR Image, the recognized text comes out as strange characters, like this:
1.png
This is the result when I copy the text:
2.png
Test image:
Test Image.png
Tracker Supp-Stefan
Site Admin
Posts: 17820
Joined: Mon Jan 12, 2009 8:07 am
Location: London

Re: PDF-XChange Editor OCR Image for Chinese is not working

Post by Tracker Supp-Stefan »

Hello skycats,

Thanks for your post and the sample file.
While the image quality is not ideal, I managed to get this result (for the first three lines of the main text):
本书的写柞缂起几年前找学习 WPFn 固为找是从 Windows Forms 开发转来做 wPF 开发的. 学习过程中
遇到帷多新概念 新特性, 其中包括 Data Binding. 路由鬟伴 命令~ 各种模板等- 我的工作风格足对于每个
新知识. 一定先把它理孵进伽 涮丑白再应用于项目中. 不然总感觉仗用起未不放心1 于是就对照已有的英

I can't quite tell if it's the correct result - but to me it looks almost as if it is.

I used "Medium" quality and "Preserve original content and add text layer on top".

Regards,
Stefan
skycats
User
Posts: 4
Joined: Fri Jan 26, 2018 3:51 am

Re: PDF-XChange Editor OCR Image for Chinese is not working

Post by skycats »

Thank you for your reply.
After seeing your results, I ran some tests.
In the end, I found that this is a bug.

I printed a PDF file using another tool. See:
test.pdf
(The upper part of the page is real text; the lower part is a screenshot of it.)

When I use OCR Page, it works fine:
TIM截图20180127024630.png

Code:

我是测试文本0 我是测试文本O 我是测试文本〇
I am test text. I am test text. I am test text.
When I use OCR Image, it does not work:
TIM截图20180127024921.png

Code:

332%iflfliitfidio fi%ifilflifiilfio fiEifilflifiY$o
I am test text. I am test text. I am test text.
:D These strange characters look like a character encoding error.
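
One rough way to tell the two results apart without reading them is to measure how much of the copied text is actually Chinese. The sketch below is only an illustration and has nothing to do with how PDF-XChange Editor works internally: the function name `cjk_ratio` is made up for this post, and it assumes that counting characters in the basic CJK Unified Ideographs range (U+4E00 to U+9FFF) is a good enough signal. The two sample strings are the first lines of the OCR Page and OCR Image results quoted above.

```python
def cjk_ratio(text: str) -> float:
    """Return the fraction of non-whitespace characters that fall in the
    basic CJK Unified Ideographs range (U+4E00..U+9FFF).

    Assumption: this single range is enough for a rough check; it ignores
    extension blocks and symbols such as U+3007.
    """
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    cjk = sum(1 for c in chars if "\u4e00" <= c <= "\u9fff")
    return cjk / len(chars)


# First line of each result, copied from the post above.
ocr_page = "我是测试文本0 我是测试文本O 我是测试文本〇"
ocr_image = "332%iflfliitfidio fi%ifilflifiilfio fiEifilflifiY$o"

print(f"OCR Page:  {cjk_ratio(ocr_page):.2f}")
print(f"OCR Image: {cjk_ratio(ocr_image):.2f}")
```

A ratio near zero for a page that visibly contains Chinese suggests that the recognition did not use the Chinese language data, whatever the underlying cause turns out to be.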
TrackerSupp-Daniel
Site Admin
Posts: 8436
Joined: Wed Jan 03, 2018 6:52 pm

Re: PDF-XChange Editor OCR Image for Chinese is not working

Post by TrackerSupp-Daniel »

By any chance, is English selected as the OCR language when you run the OCR? If so, you can download additional OCR languages here:
https://www.pdf-xchange.com/pdf-xchange-viewer-ocr
I believe this is the most likely cause, as my test with English as the selected language produced results similar to yours:
xxasaq?wm?m?Fméiia wm Eli‘J-?iAk Windows Forms afii?usk WPF ?ié?. Ermi?'?
i?f'HtlifTM-?r. ??lk’i. ?q’?ie‘; Data Binding. 33-113344'. ¢¢~ ?-‘?'?i‘iyf- ??l?'m?x??f?“?
1154mm. —-2;i4wazmmsm mullai?r?m—?m??v. mma?mtmn?km?ru, fiit??????‘li
xvi—NH" MSDNi?-‘??ib?‘i?i??. ?-????. ?xixk?ni??ik?r?i. rk?k?i?éal??. Lit
w—+mm.wmmgm?n??iizék?m?m???mhu?.k?????n?giim#&m.
?jhi—$f¥3??. fi?Tskvufriél-Eiuk?. ii$4$tr?z$¢mt?7Mll??liia?gf¥~—«iii/xii
:1: WPF».

Conversely, with the Chinese character pack installed:
????????????? WPFn ????? wmdows FOnns????? wPF ???. ?????
??????? ???, ???? Data Binding. ???? ??~ ?????- ???????????
???. ????????? ????????? ???????????? ?????????
???? MSDN????????? ?-???, ???????????? ????????? ??
????? ?????? ?????????????????? ???????????????
???????? ???????????? ????????????????????((???
? WPF».
Significantly fewer errors.
Dan McIntyre - Support Technician
Tracker Software Products (Canada) LTD

skycats
User
Posts: 4
Joined: Fri Jan 26, 2018 3:51 am

Re: PDF-XChange Editor OCR Image for Chinese is not working

Post by skycats »

Please see post #3.
I have installed the Chinese character pack.
You can download test.pdf to test with.
Try my steps:
(click the images to see the GIFs)
action1.gif
action2.gif
Tracker Supp-Stefan
Site Admin
Posts: 17820
Joined: Mon Jan 12, 2009 8:07 am
Location: London

Re: PDF-XChange Editor OCR Image for Chinese is not working

Post by Tracker Supp-Stefan »

Hi skycats,

Thanks for the report.
I have now managed to reproduce the issue and have created a ticket in our internal system so that our developers can investigate and fix it:
#4218: Editor 323.2: OCR image does not work the same as Document -> OCR Pages...

In the meantime, please use Document -> OCR Pages as a workaround!

Regards,
Stefan
Sasha - Tracker Dev Team
User
Posts: 5522
Joined: Fri Nov 21, 2014 8:27 am

Re: PDF-XChange Editor OCR Image for Chinese is not working

Post by Sasha - Tracker Dev Team »

This has been fixed, and the fix will be available in the next release.

Cheers,
Alex