Correcting OCR errors?

Discussion for the End User use of OCR in PDF-XChange Editor and Viewer

Moderators: TrackerSupp-Daniel, Tracker Support, Paul - Tracker Supp, Vasyl-Tracker Dev Team, Chris - Tracker Supp, Sean - Tracker, Ivan - Tracker Software, Tracker Supp-Stefan

sashu
User
Posts: 55
Joined: Mon May 14, 2018 11:53 am

Re: Correcting OCR errors?

Post by sashu »

Thanks, Daniel. The timeline is not so important.

They can keep me in the loop. This may be useful for debugging: I don't remember having similar problems when I enhanced only the current page (the Enhanced dialog has such an option).

sashu
TrackerSupp-Daniel
Site Admin
Posts: 8427
Joined: Wed Jan 03, 2018 6:52 pm

Re: Correcting OCR errors?

Post by TrackerSupp-Daniel »

Hello Sashu,
Indeed, I referenced this thread in that ticket, so if the devs need additional input, I am sure they will reach out here to get it from you.
Have a good day!
Dan McIntyre - Support Technician
Tracker Software Products (Canada) LTD

+++++++++++++++++++++++++++++++++++
Our Web site domain and email address has changed as of 26/10/2023.
https://www.pdf-xchange.com
Support@pdf-xchange.com
sashu
User
Posts: 55
Joined: Mon May 14, 2018 11:53 am

Re: Correcting OCR errors?

Post by sashu »

Is it possible to synchronize the selection of an element in the Content pane tree with its blue selection in the document view? For example, in the snapshot, the text 'text generation' should be selected (synchronized) both in the Content pane tree and, in blue, in the document view. Currently, the element "text generation" is selected in the text, but not in the Content pane.
snapshot-15.06.2018.jpg
Tracker Supp-Stefan
Site Admin
Posts: 17800
Joined: Mon Jan 12, 2009 8:07 am
Location: London

Re: Correcting OCR errors?

Post by Tracker Supp-Stefan »

Hello Sashu,

If you select an element in the Content pane, the main document view will move to it so that the element and its position in the file are visible. The reverse (selecting text on the page and having the Content pane follow) is not possible, I am afraid.

Regards,
Stefan
TrackerSupp-Daniel
Site Admin
Posts: 8427
Joined: Wed Jan 03, 2018 6:52 pm

Re: Correcting OCR errors?

Post by TrackerSupp-Daniel »

Hello Sashu,
Note that the search will also find text in the bookmarks themselves, but only as long as the Include bookmarks option is active.
180615885.png
In cases where there are multiple results, the Next button will cycle through them, highlighting the bookmarks and the page content in the order they appear.
Dan McIntyre - Support Technician
Tracker Software Products (Canada) LTD

sashu
User
Posts: 55
Joined: Mon May 14, 2018 11:53 am

Re: Correcting OCR errors?

Post by sashu »

Great. Does this mean that I first have to create bookmarks for all text lines of the whole PDF if I want to search the text and the content simultaneously? How can I do this automatically?

Sashu
TrackerSupp-Daniel
Site Admin
Posts: 8427
Joined: Wed Jan 03, 2018 6:52 pm

Re: Correcting OCR errors?

Post by TrackerSupp-Daniel »

Hi Sashu,
I may be a bit confused here, so I apologize if this is not the answer you wanted.
My screenshot above shows the options of the Find dialog. If you tick all the Include options [page text, bookmarks, comments, form fields, external links], you should be able to search all of those types of text and content simultaneously.
Dan McIntyre - Support Technician
Tracker Software Products (Canada) LTD

sashu
User
Posts: 55
Joined: Mon May 14, 2018 11:53 am

Re: Correcting OCR errors?

Post by sashu »

Was this your answer to my question of Fri Jun 15, 2018 9:50 am? I thought you had found an answer, but you haven't?
TrackerSupp-Daniel
Site Admin
Posts: 8427
Joined: Wed Jan 03, 2018 6:52 pm

Re: Correcting OCR errors?

Post by TrackerSupp-Daniel »

What I meant was that I was unsure if I understood your last question.
You've said:
sashu wrote:Great. Does it mean that first I have to create bookmarks for all text lines holding text content of the whole pdf if I want to search simultaneously in text and ind in the content? How can I do it automatically?
I thought you were asking how to search the bookmarks and the text simultaneously, to which I answered:
TrackerSupp-Daniel wrote:In my above screenshot there is a display of the options for the Find dialog. If you have all the options ticked off, include [page text, bookmarks, comments, form fields, external links] you should, as implied, be able to search all of those types of text and content simultaneously.
If you did not mean how to search both simultaneously, please clarify, as I seem to have misunderstood your question.
Dan McIntyre - Support Technician
Tracker Software Products (Canada) LTD

sashu
User
Posts: 55
Joined: Mon May 14, 2018 11:53 am

Re: Correcting OCR errors?

Post by sashu »

It seems you misunderstood my question: I was primarily asking whether it is possible to search in the Content pane and in the PDF editor simultaneously. I thought it was strictly impossible, but maybe that is not the case.
sashu
Willy Van Nuffel
User
Posts: 2346
Joined: Wed Jan 18, 2006 12:10 pm

Re: Correcting OCR errors?

Post by Willy Van Nuffel »

Currently, you cannot search in, or via, the Content pane.

However, the Search function lets you search through the whole text content of a PDF.
Under Search > Options..., you can tick where you would like to search (as already mentioned above):
- page text
- bookmarks
- comments
- form fields
- external links
- attachments
- document info
sashu
User
Posts: 55
Joined: Mon May 14, 2018 11:53 am

Re: Correcting OCR errors?

Post by sashu »

Hi,

I have a few more questions:
1. Does optimized saving lose any visual information?
2. Can I rotate a page by a few degrees (not just +/- 90 degrees)?
3. Is it possible to edit a particular area of a page, for example, to brighten it?

Best, sashu
TrackerSupp-Daniel
Site Admin
Posts: 8427
Joined: Wed Jan 03, 2018 6:52 pm

Re: Correcting OCR errors?

Post by TrackerSupp-Daniel »

Hello Sashu,
as for your next questions:
1. This depends on the settings chosen. If you choose to unembed fonts or heavily compress the images in the PDF, then yes, there will be some noticeable visual changes. For example, if the originally used font is unavailable on the recipient's machine, a similar default font will be used for viewing.
2. No, currently we only offer rotation in 90-degree steps, as you mentioned.
3. Technically, no. However, you could try placing a transparent rectangle over the area to "highlight" it. This article details how to customize tool palettes: https://www.pdf-xchange.com/knowle ... nge-Editor

Regards,
Dan McIntyre - Support Technician
Tracker Software Products (Canada) LTD

sashu
User
Posts: 55
Joined: Mon May 14, 2018 11:53 am

Re: Correcting OCR errors?

Post by sashu »

Thanks, Daniel. My next questions:

1. I saw Image objects in the Content pane that represent scanned pages. Is it possible to replace these objects with other images, for example, with rotated ones?
2. I assume the OCR engine considers everything on a scanned page, including garbage, when performing OCR, and that this garbage can influence recognition. Is my assumption correct?

Best, sashu
TrackerSupp-Daniel
Site Admin
Posts: 8427
Joined: Wed Jan 03, 2018 6:52 pm

Re: Correcting OCR errors?

Post by TrackerSupp-Daniel »

Hello Sashu,

1. I am unsure what you mean by replacing objects with other, rotated images. Please clarify with an example.
2. Indeed, the OCR function considers everything on the page while processing. This can sometimes produce extra characters in the recognized text if the scan is not perfectly clear.
Dan McIntyre - Support Technician
Tracker Software Products (Canada) LTD

sashu
User
Posts: 55
Joined: Mon May 14, 2018 11:53 am

Re: Correcting OCR errors?

Post by sashu »

As you said, PDF-XChange can only rotate pages in 90-degree steps, so I came up with a workaround: rotating these images by a few degrees in a graphics editor. The only thing I am missing now is how to replace the original page image with the rotated one.

In the attached snapshot, you can see an image that is skewed by several degrees. This can be a source of OCR recognition errors, for example, if the font in the image is small. I have a graphics editor that can rotate images by an arbitrary angle; what I don't know is how to delete the skewed image and add the newly rotated one.
snapshot-20.06.2018.jpg
TrackerSupp-Daniel
Site Admin
Posts: 8427
Joined: Wed Jan 03, 2018 6:52 pm

Re: Correcting OCR errors?

Post by TrackerSupp-Daniel »

Hello Sashu,
Substituting an image is done by right-clicking the image with the Edit Content tool selected, and then selecting Replace:
180620943.png
Another way to rotate images is to drag the rotation handle that you can see coming out of the top of the image in this screenshot. This allows for slightly finer adjustment.

Beyond that, as long as the document is not too heavily skewed, note the deskew option offered by our Enhance Scanned Pages function.
180620944.png
I hope this helps!
Dan McIntyre - Support Technician
Tracker Software Products (Canada) LTD

sashu
User
Posts: 55
Joined: Mon May 14, 2018 11:53 am

Re: Correcting OCR errors?

Post by sashu »

Hello Daniel,

thank you for your suggestions. The first one is exactly what I was looking for. I already knew about the second suggestion and, frankly speaking, don't like it:
1) I don't know how the algorithm works and can't influence the processing;
2) the processing is invasive: the original image is modified by the enhancement, which may not be desirable;
3) the solution is not flexible: images often need more adjustments, for example, they not only have to be deskewed but also need more contrast.

Best, sashu
Tracker Supp-Stefan
Site Admin
Posts: 17800
Joined: Mon Jan 12, 2009 8:07 am
Location: London

Re: Correcting OCR errors?

Post by Tracker Supp-Stefan »

Hello Sashu,

Glad to hear you like the solution Dan offered!

Cheers,
Stefan
kmwittko
User
Posts: 1
Joined: Wed Aug 08, 2018 5:17 pm

Re: Correcting OCR errors?

Post by kmwittko »

This problem was mentioned on June 1, but I didn't understand the answer:
I have a PDF with several invisible text layers. How can I remove some of them?
Patrick-Tracker Supp
Site Admin
Posts: 1645
Joined: Thu Mar 27, 2014 6:14 pm
Location: Vancouver Island

Re: Correcting OCR errors?

Post by Patrick-Tracker Supp »

Hello,

Thank you for your post. To remove invisible text, or indeed any text, you can select the blocks in the Content pane (View > Panes > Content).

Some documents may have many kinds of content. You can select them by type:

Image

Once they are selected, simply press the Delete key to remove the text objects. Of course, this will remove all the text. Otherwise, you will need to choose the Edit Content tool, select the invisible text blocks individually (hold CTRL to select several at once), and press Delete to remove them.

I hope this helps!
If posting files to this forum, you must archive the files to a ZIP, RAR or 7z file or they will not be uploaded.
Thank you.

Cheers,

Patrick Charest
Tracker Support North America
TrackerSupp-Daniel
Site Admin
Posts: 8427
Joined: Wed Jan 03, 2018 6:52 pm

Re: Correcting OCR errors?

Post by TrackerSupp-Daniel »

I am pleased to say that I have just received confirmation that this issue (#4386: these files are negatively affected by enhanced OCR) will be resolved in the upcoming build 328, which we plan to release on December 10th.
Please let us know if you find anything out of place after the update!
Kind regards!
Dan McIntyre - Support Technician
Tracker Software Products (Canada) LTD

sashu
User
Posts: 55
Joined: Mon May 14, 2018 11:53 am

Re: Correcting OCR errors?

Post by sashu »

Great! If you tell me how I can register to be notified of your releases, I will of course verify it. :D
TrackerSupp-Daniel
Site Admin
Posts: 8427
Joined: Wed Jan 03, 2018 6:52 pm

Re: Correcting OCR errors?

Post by TrackerSupp-Daniel »

Hello Sashu,

I am afraid that we no longer send out release notifications. The best way to keep up to date is to set our updater to check for updates automatically at regular intervals.

Kind regards,
Dan McIntyre - Support Technician
Tracker Software Products (Canada) LTD

sashu
User
Posts: 55
Joined: Mon May 14, 2018 11:53 am

Re: Correcting OCR errors?

Post by sashu »

Daniel,

that is not possible either, since Tracker Update 7.0.325.1 reports that the file "TrackerUpdate.zip" can't be downloaded whenever I check for updates.

Best, sashu
Tracker Supp-Stefan
Site Admin
Posts: 17800
Joined: Mon Jan 12, 2009 8:07 am
Location: London

Re: Correcting OCR errors?

Post by Tracker Supp-Stefan »

Hello Sashu,

Some older builds of the updater tool itself will indeed not find the newer versions.
Please update to 327.1 manually; from then on, the check for updates should work properly.

Regards,
Stefan
sashu
User
Posts: 55
Joined: Mon May 14, 2018 11:53 am

Re: Correcting OCR errors?

Post by sashu »

Hello Stefan,

I manually installed 327.1 and could see the current version number in the About box. However, I couldn't set up the updater because of the known error message and had to reboot.

After rebooting the system, I could finally set up the updater. However, it couldn't automatically install PDF-XChange Pro, claiming that I didn't have administrator permissions, which is definitely not true. I then downloaded the Pro version manually and tried to install it. Although version 325.1 is registered on the system, this was not enough to install 327.1: when I start the Pro installation, I have to enter a serial number, which is not the serial number I used to install the Editor.

Cheers, sashu
TrackerSupp-Daniel
Site Admin
Posts: 8427
Joined: Wed Jan 03, 2018 6:52 pm

Re: Correcting OCR errors?

Post by TrackerSupp-Daniel »

Hello Sashu,

This most likely means that you do not have a license that covers all the PRO products installed. From your description, it sounds as if you have multiple packages installed, which can cause problems during updates because PRO overlaps with nearly all the others. If you are going to use the PRO suite, I suggest removing all Tracker Software products and then installing only PRO. If you only need PDF Tools or our Standard printer, these are available as separate downloads that do not overlap with the Editor or Editor Plus installers.

I hope this helps!
Dan McIntyre - Support Technician
Tracker Software Products (Canada) LTD
