OCR for non-English / editor confused

slaviŠa neŠiĆ
User
Posts: 19
Joined: Fri Feb 19, 2016 8:34 pm

OCR for non-English / editor confused

Post by slaviŠa neŠiĆ »

Hi,
1) I created a new document with File / New Document / From Scanner, in color, at 300 dpi. It contains some non-English text (Serbian Cyrillic).
2) Then I used Document / Crop Pages to leave only a few paragraphs.
3) Then I clicked Document / OCR Pages with Language = Serbian.
4) After that, when I choose Edit Content and double-click a paragraph, typing text gives nothing: the letters, if any, are invisible, though the cursor moves. I am typing with the standard Serbian Cyrillic keyboard layout in Windows 8.1.

Also, Find (Ctrl+F) does not return any results when I search for text.

Do you have any ideas on how to overcome this? I attached the file.
Attachments
test.zip
(1.36 MiB) Downloaded 88 times
Radi - Tracker Supp
Site Admin
Posts: 600
Joined: Tue Mar 03, 2015 12:46 pm

Re: OCR for non-English / editor confused

Post by Radi - Tracker Supp »

Hello slaviŠa neŠiĆ,

Thank you for the post.

Please take a look at the following knowledge base article to find out how to edit the text in an OCR'd document:
How do I edit text in OCR'd document with the Editor

Please note that using a combination of Slavic languages might give you better OCR results. For example, if Serbian is not supported very well, using Russian or Bulgarian alongside Serbian should produce better character recognition.
With your example file I tried a combination of Serbian and Bulgarian, and the results were far better than with Serbian alone.

Regards,
Radi
Attachments
OCR.zip
(71.66 KiB) Downloaded 84 times
slaviŠa neŠiĆ
User
Posts: 19
Joined: Fri Feb 19, 2016 8:34 pm

Re: OCR for non-English / editor confused

Post by slaviŠa neŠiĆ »

Yes, that seems to do the magic, thank you. But the procedure does not leave any pictures behind. In my example, the picture on the right side of the document should remain, while the noisy part of the image on the left should be removed.
Is it possible to crop only the picture, keeping just its right part along with the text?

I see another way: I can set the transparency of the text layer to 0% so that it covers the left part of the image. Still, it would be useful to know whether the picture itself can be cropped, for other cases.
Will - Tracker Supp
Site Admin
Posts: 6815
Joined: Mon Oct 15, 2012 9:21 pm
Location: London, UK

Re: OCR for non-English / editor confused

Post by Will - Tracker Supp »

Hi slaviŠa neŠiĆ,

Thanks for the post, but I'm not sure that I understand what you're looking to do here. Can you please upload a sample file and send any screenshots that may help us to better understand?

Thanks,
If posting files to this forum, you must archive the files to a ZIP, RAR or 7z file or they will not be uploaded.
Thank you.

Best regards

Will Travaglini
Tracker Support (Europe)
Tracker Software Products Ltd.
http://www.tracker-software.com
slaviŠa neŠiĆ
User
Posts: 19
Joined: Fri Feb 19, 2016 8:34 pm

Re: OCR for non-English / editor confused

Post by slaviŠa neŠiĆ »

Let me be precise: how do I crop an image object on a PDF page? For example, the image in the zip file I posted above.
Radi - Tracker Supp
Site Admin
Posts: 600
Joined: Tue Mar 03, 2015 12:46 pm

Re: OCR for non-English / editor confused

Post by Radi - Tracker Supp »

Hi slaviŠa neŠiĆ,

Thanks for the post.

It is not possible to crop the image itself, but if you enable the 'Remove the content outside the crop box area' option in the Crop Pages tool dialog (see the attached screenshot), the result will be the same as if you had cropped the image.

Regards,
Radi
Attachments
Crop Pages.zip
(113.35 KiB) Downloaded 89 times
slaviŠa neŠiĆ
User
Posts: 19
Joined: Fri Feb 19, 2016 8:34 pm

Re: OCR for non-English / editor confused

Post by slaviŠa neŠiĆ »

Hi Radi,

Thank you anyway for the effort.
Best Regards,
-Slavisa
Radi - Tracker Supp
Site Admin
Posts: 600
Joined: Tue Mar 03, 2015 12:46 pm

Re: OCR for non-English / editor confused

Post by Radi - Tracker Supp »

Hi Slavisa,

You can also try to 'crop' the picture using the redaction tool. Just mark the unwanted parts of the picture on all four sides and apply the redaction. This will remove all marked content and preserve the page size.
If you would like to read more about the redaction tool, please read the following knowledge base article:
How to use Redaction

Regards,
Radi
slaviŠa neŠiĆ
User
Posts: 19
Joined: Fri Feb 19, 2016 8:34 pm

Re: OCR for non-English / editor confused

Post by slaviŠa neŠiĆ »

Thank you Radi.
Will - Tracker Supp
Site Admin
Posts: 6815
Joined: Mon Oct 15, 2012 9:21 pm
Location: London, UK

Re: OCR for non-English / editor confused

Post by Will - Tracker Supp »

:D