Correcting OCR errors?

Posted: Sun Oct 26, 2014 8:23 pm
by plonz
Hi,

Is it possible to correct an OCR error in a PDF document, for a single word that was not recognized correctly, without re-running the OCR function on the whole PDF?

I'm using PDF-XChange Viewer 2.5.

Thanks

Re: Correcting OCR errors?

Posted: Tue Oct 28, 2014 5:29 pm
by Tracker Supp-Stefan
Hello plonz,

Not in the Viewer, but if you download the PDF-XChange Editor:
https://www.pdf-xchange.com/product ... nge-editor
you will be able to make such corrections. The OCR result is a layer of invisible text placed on top of the original image, and the Editor can modify this text. It becomes visible when you edit it, but you can then make it invisible again.
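
For anyone curious what "invisible text" means at the file level: in the PDF format, text visibility is governed by the text rendering mode operator `Tr`, where mode 0 is normal filled text and mode 3 is neither filled nor stroked, i.e. invisible. The sketch below writes a tiny content-stream fragment by hand to show the difference. The helper name, the font resource `/F1` and the coordinates are made up for illustration, and this is not a description of how the Editor works internally.

```python
def text_object(word, x, y, size=12, invisible=True):
    """Return a minimal PDF content-stream text object for one word.

    Hypothetical helper for illustration only. /F1 is an assumed font
    resource name; real OCR output uses its own fonts and positioning.
    """
    # Escape the characters that are special inside a PDF literal string.
    escaped = word.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    mode = 3 if invisible else 0  # 3 = invisible, 0 = fill (normal visible text)
    return "\n".join([
        "BT",                 # begin text object
        f"/F1 {size} Tf",     # select font and size
        f"{mode} Tr",         # set text rendering mode
        f"{x} {y} Td",        # move to the word's position
        f"({escaped}) Tj",    # show the text
        "ET",                 # end text object
    ])


hidden = text_object("Entscheider", 72, 700)
visible = text_object("Entscheider", 72, 700, invisible=False)
print(hidden)
print(visible)
```

The two fragments differ only in the `Tr` line, which is why a text layer can be switched between visible and invisible without changing the words themselves.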

Regards,
Stefan

Re: Correcting OCR errors?

Posted: Tue Apr 28, 2015 4:28 pm
by pdfcoder
I did try this feature...

But:
If I edit text, for example by copying it to an external text editor, modifying it, and pasting it back, searching for keywords no longer highlights the search hits correctly in the PDF display.

How can I display the text layer directly in the PDF Editor?

How can I correct OCR scan errors in the text layer while keeping the relation between the PDF display layer and the PDF text layer intact for searching purposes?

Thx...

Re: Correcting OCR errors?

Posted: Tue Apr 28, 2015 4:49 pm
by Patrick-Tracker Supp
Hello pdfcoder,

Thanks for the post. If you wish to edit a scanned PDF, you will need to use the Editor. See the steps below to get this working properly. Please note that this will only work well on pages that do not contain images the OCR cannot recognize.

OCR works by placing an invisible text layer on top of the existing image layer. Because this document is scanned and actually contains no text, you will need to OCR the document, then remove the underlying image. First, please go to Document --> OCR Pages.

Choose All pages (or whichever page range you prefer). When you are satisfied, click OK.

Once the document is OCR'd, you can edit the document, though it is a bit arduous. First, you will need to turn the invisible text placed by the OCR into visible text, then remove the underlying picture. You will then be able to Edit the text.

1) Select the parent of the text in the Content pane (View --> Other Panes --> Content).
2) Change the text color values through the Properties pane (View --> Other Panes --> Properties Pane).
3) Remove the underlying picture through the Content pane (View --> Other Panes --> Content): select the Image, then use the Delete key to remove it.

Now you are left with a document that contains only text objects.

I hope this helps!

Re: Correcting OCR errors?

Posted: Wed Apr 29, 2015 7:13 am
by pdfcoder
Hello Patrick,
thanks for the hint, I did find it.

BUT:
In my opinion it would be much more helpful if text could be edited in the content column on the left side ("Inhalt", marked in red).

When editing in the viewer window on the right, text changes break the layout because of different format settings. It's really stressful...

Re: Correcting OCR errors?

Posted: Wed Apr 29, 2015 11:07 am
by Tracker Supp-Stefan
Hello pdfcoder,

Thanks for the suggestion. We will keep it in mind for future improvements.

Regards,
Stefan

Re: Correcting OCR errors?

Posted: Thu May 17, 2018 8:23 am
by sashu
I deleted some images in my OCR document using the instructions in this thread. Is it possible not only to delete images, but also to edit text in the OCR document?

I also found that some recognized text fragments are nonsense and can be deleted, since they make no sense linguistically, such as "5 3,3.3". Does the PDF Editor have a tool that cleans up the text in OCR'd documents?

Re: Correcting OCR errors?

Posted: Thu May 17, 2018 8:30 am
by Tracker Supp-Stefan
Hello sashu,

Welcome to our forums.
There is no automated tool that will touch up or correct the OCR text, but with the Edit Content tool you can modify not only OCR text but also other base content text, if any exists.
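
If you can get the recognized text out of the PDF as plain strings (for example by copy and paste), a small script outside the Editor can at least point you to suspicious fragments to review by hand. The following is only a rough sketch of one possible heuristic, not a feature of the Editor: it flags fragments in which fewer than half of the non-space characters are letters. The function name and the 0.5 threshold are arbitrary assumptions.

```python
def looks_like_ocr_noise(fragment, min_letter_ratio=0.5):
    """Flag a fragment that contains too few letters to be a word or phrase.

    Hypothetical heuristic: the 0.5 threshold is an arbitrary assumption
    and should be tuned on your own documents.
    """
    chars = [c for c in fragment if not c.isspace()]
    if not chars:
        return True  # empty or whitespace-only fragments carry no text
    letters = sum(c.isalpha() for c in chars)
    return letters / len(chars) < min_letter_ratio


fragments = ["5 3,3.3", "für Entscheider", "2014", "Page 12"]
for fragment in fragments:
    verdict = "check" if looks_like_ocr_noise(fragment) else "ok"
    print(repr(fragment), "->", verdict)
```

Note that legitimate numbers such as "2014" are flagged as well, so this is better used to build a review list than to delete anything automatically.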

Regards,
Stefan

Re: Correcting OCR errors?

Posted: Thu May 17, 2018 9:19 am
by sashu
Great, thanks, Stefan. Do you mean I can modify OCR text in a pane of the PDF Editor, or is there a special standalone tool that does it?

Best, sashu

Re: Correcting OCR errors?

Posted: Thu May 17, 2018 10:43 am
by Tracker Supp-Stefan
Hello Sashu,

The OCR text layer, apart from being invisible, is the same as any other base content text in a PDF file. So it can be edited just like any other text (and the first step would normally be to make it visible ;) ).

Regards,
Stefan

Re: Correcting OCR errors?

Posted: Thu May 17, 2018 10:58 am
by sashu
Amazing, could you please teach me to edit text! ;-)

I can place the cursor in my document, but when I press keys on the keyboard nothing happens.

Best, sashu

Re: Correcting OCR errors?

Posted: Thu May 17, 2018 11:19 am
by Tracker Supp-Stefan
Hello Sashu,

I believe this KB article we have will be quite helpful:
How do I edit document text after OCR has been performed?

Regards,
Stefan

Re: Correcting OCR errors?

Posted: Thu May 17, 2018 11:58 am
by sashu
Yes, it helps a lot. However, I still can't edit the text: the text element that I select in the editor doesn't show the edit palette, although the text I copy from it (previously recognized by OCR) is correct. Unexpectedly, if I select the background image, the edit palette appears. I attach the PDF.

Best, sashu

Re: Correcting OCR errors?

Posted: Thu May 17, 2018 3:51 pm
by Tracker Supp-Stefan
Hello sashu,

If you click once, you are most likely selecting the object, and you see the properties for customizing the whole object.
Try double-clicking, and you should then see the properties for the text:
ocr_edit_text.png
Cheers,
Stefan

Re: Correcting OCR errors?

Posted: Fri May 18, 2018 6:39 am
by sashu
Thanks, Stefan, I forgot the magic effect of a double-click. ;-)

Now I am wondering whether it is possible to add new objects to the page, for example a text object containing the text "für Entscheider", which somehow got lost in my sample. I was also wondering whether it is possible to merge several objects, for instance by selecting them in the object pane and merging them?

Best, sashu

Re: Correcting OCR errors?

Posted: Fri May 18, 2018 11:49 am
by Tracker Supp-Stefan
Hello Sashu,

Next to Edit base content there is an "Add" button - using that you can add new base content text objects.

Grouping is not really a part of the PDF specification - so it's not possible to group objects.

Regards,
Stefan

Re: Correcting OCR errors?

Posted: Mon May 28, 2018 6:46 am
by sashu
Hello,

Is it possible to change the background of a PDF on all pages after OCR? Is it possible to adjust page sizes automatically in a PDF?

CU, sashu

Re: Correcting OCR errors?

Posted: Mon May 28, 2018 4:57 pm
by TrackerSupp-Daniel
Hello Sashu,

Yes, adding/changing a background on pages is available in the organize tab.
As for adjusting page sizes automatically, you can use the crop tool (on the Organize tab) to cut out blank space if that is what you mean. If you mean whilst printing, you can specify a page size in the print dialog and check the "Reduce to printer margins" option to make all pages fit onto one unified page size.
If that is not what you meant by adjusting page size, please clarify.

Re: Correcting OCR errors?

Posted: Mon May 28, 2018 5:36 pm
by sashu
Great. And is this cropping available dynamically or via JavaScript?

Re: Correcting OCR errors?

Posted: Mon May 28, 2018 5:56 pm
by TrackerSupp-Daniel
Hello sashu,
I am certain you could create a JavaScript to call the crop command. The JavaScript API reference is available here:
https://www.adobe.com/content/dam/acom/ ... erence.pdf
Otherwise, no, it is not available dynamically; you will have to call the function manually each time it is performed.

Re: Correcting OCR errors?

Posted: Tue May 29, 2018 11:29 am
by sashu
How can I remove a rectangle on the margin? I tried it with Crop Page Tool, but it doesn't work.
Best, Sashu

Re: Correcting OCR errors?

Posted: Tue May 29, 2018 9:07 pm
by TrackerSupp-Daniel
Hello sashu,

I will need some clarification on what exactly you mean by this... currently I am not sure how to assist with that question.
Could you provide screenshots or files depicting what it is you wish to do?

Re: Correcting OCR errors?

Posted: Wed May 30, 2018 4:14 am
by sashu
Yes, of course.

Re: Correcting OCR errors?

Posted: Wed May 30, 2018 7:22 am
by Will - Tracker Supp
Hi Sashu,

Thanks for that - When you used the Crop Pages Tool, did you specifically select the checkbox to remove content outside of the cropbox? If not, please do and try again. By default and as a general standard, cropping does not actually remove data from a file, but instead simply re-defines the visible area such that the content outside the new visible area is still stored within the document.
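
To illustrate the difference, here is a toy model in Python. It is not the Editor's API; the `Page` class, the `crop` function and the `remove_outside` flag are hypothetical names, and content items are simplified to single points. Cropping only sets the visible box, and content is discarded only when removal is explicitly requested.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    media_box: tuple                              # (x0, y0, x1, y1), full page area
    crop_box: tuple = None                        # visible area, None means "whole page"
    objects: list = field(default_factory=list)   # (name, x, y) content items

    def visible_box(self):
        return self.crop_box or self.media_box


def inside(box, x, y):
    x0, y0, x1, y1 = box
    return x0 <= x <= x1 and y0 <= y <= y1


def crop(page, box, remove_outside=False):
    """Set the crop box; only drop content outside it if explicitly asked."""
    page.crop_box = box
    if remove_outside:
        page.objects = [o for o in page.objects if inside(box, o[1], o[2])]


page = Page(media_box=(0, 0, 595, 842),
            objects=[("body text", 100, 400), ("margin rectangle", 10, 400)])

crop(page, (50, 50, 545, 792))
print(len(page.objects))             # both items are still stored in the page

crop(page, (50, 50, 545, 792), remove_outside=True)
print([o[0] for o in page.objects])  # only items inside the crop box remain
```

In this model the margin item disappears only on the second call, which mirrors the effect of the checkbox described above.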

Thanks,

Re: Correcting OCR errors?

Posted: Wed May 30, 2018 8:28 am
by sashu
Hi Will,

actually, I want the content inside the crop box to disappear and the content outside it to remain. I had already selected the "Remove the content outside the area" checkbox and clicked OK. The result was only the crop area, but I wanted everything except it. :(

Best, sashu

Re: Correcting OCR errors?

Posted: Wed May 30, 2018 9:26 am
by Will - Tracker Supp
Hi sashu,

Thanks for getting back to me - If you're looking to just remove that small rectangle of content, shown in your picture, then you'll need to use the Redaction Tool to remove it. The Crop Page(s) tool was not designed to be used that way. The Redaction Tool is found under the "Protect" tab of the UI. You must first mark the area for redaction, then apply the redaction. Please note that the default colour for applied redaction is black, so if you want it to be white, you'll need to change this in the tool's properties.

Thanks,

Re: Correcting OCR errors?

Posted: Fri Jun 01, 2018 6:25 am
by sashu
Hi,

I assume I ran the OCR enhancement twice on the same PDF, since I thought newly recognized text would overwrite the old text. However, it seems that the recognition results were added together, and the resulting PDF now stores two copies of the text for every passage. Does that make sense?

Best, sashu

Re: Correcting OCR errors?

Posted: Fri Jun 01, 2018 10:10 am
by Will - Tracker Supp
Hi Sashu,

That is to be expected and is how it is designed, I believe. Files should really be OCR'd in their entirety once. If you need/want to do it a second time, you'd need to remove the original OCR layer first, to avoid duplication. This is best done via the Content Pane, or using the Edit Content Tool.
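
As a rough picture of what "removing the original OCR layer" amounts to, the sketch below strips invisible text objects from a hand-written, uncompressed content stream. It is deliberately naive and only an illustration under stated assumptions: real content streams are usually compressed, the rendering mode can be set outside a `BT`/`ET` pair, and a text string that itself contains "BT" or "ET" would confuse the pattern. The function name and the sample stream are made up; in practice the Content Pane remains the way to do this.

```python
import re

# One text object: everything from BT up to the next ET (plus trailing whitespace).
TEXT_OBJECT = re.compile(r"\bBT\b.*?\bET\b\s*", re.DOTALL)
# Text rendering mode 3 (invisible), as typically used for an OCR text layer.
INVISIBLE = re.compile(r"(?<!\S)3\s+Tr\b")


def drop_invisible_text(stream):
    """Remove text objects that set rendering mode 3; keep everything else."""
    def keep_or_drop(match):
        return "" if INVISIBLE.search(match.group(0)) else match.group(0)
    return TEXT_OBJECT.sub(keep_or_drop, stream)


# Hand-written sample: a page image, the same OCR word written twice
# (as after running OCR twice), and one visible text object.
stream = (
    "q 595 0 0 842 0 0 cm /Im0 Do Q\n"
    "BT /F1 12 Tf 3 Tr 72 700 Td (Hello) Tj ET\n"
    "BT /F1 12 Tf 3 Tr 72 700 Td (Hello) Tj ET\n"
    "BT /F1 12 Tf 0 Tr 72 650 Td (Visible note) Tj ET\n"
)
cleaned = drop_invisible_text(stream)
print(cleaned)
```

After this step the sample keeps the image and the visible note, so a fresh OCR pass would add exactly one text layer instead of stacking a third copy.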

Cheers,

Re: Correcting OCR errors?

Posted: Fri Jun 01, 2018 10:31 am
by sashu
Thanks, Will. You mean that if I run OCR twice, the PDF will contain two text layers? And what about image corrections made during OCR through options such as "Text sharpening"? After saving the PDF, the images seem to have changed.

Anyway, I wanted to find the edit content tool, but I don't get the Tools-menu that is shown on this page:
https://help.pdf-xchange.com/pdfxe ... ntent+pane
I have the Form menu and immediately the Bookmarks menu without the Tools menu.

Br, sashu

Re: Correcting OCR errors?

Posted: Fri Jun 01, 2018 10:54 am
by Willy Van Nuffel
1) Each time you run OCR on a PDF via PDF-XChange Editor, an additional text layer is added.
Normally, it would be better to get:
- a warning in case there already is a text layer and
- a question if you would like to remove it AND still run the OCR process.

2) When you decide to apply compression or filters to scanned text or images, the image(s) indeed get modified (for better or worse).

3) To get the "Tools" entry back in the Menu bar, you will have to do a "reset" of the Menu bar, in the following way:
- right mouse click on the menu bar or on the toolbars
- click "Customize toolbars"
- in the dialog box, click (and so, select) the item called "Menu"
- click the Reset button in the toolbar of the dialog box
- confirm the reset by clicking Yes; you should see the missing item(s) coming back in the Menu bar
- click Close (to close the dialog box)

Regards.

Re: Correcting OCR errors?

Posted: Fri Jun 01, 2018 11:07 am
by sashu
I am afraid my PDF got worse. :(

I tried to get the Tools menu back, but it does not appear (even after resetting all toolbars):
snapshot-1.06.2018.jpg

Re: Correcting OCR errors?

Posted: Fri Jun 01, 2018 11:14 am
by Willy Van Nuffel
In that case, it is just the Ribbon User Interface (Ribbon UI) that is active.

At the right in the title bar of the PDF-XChange window, click the appropriate icon and make your choice to "Switch to Classic Toolbars".

Re: Correcting OCR errors?

Posted: Fri Jun 01, 2018 11:19 am
by sashu
Hallelujah. I wonder who likes this Ribbon-thing?

Re: Correcting OCR errors?

Posted: Fri Jun 01, 2018 12:24 pm
by Tracker Supp-Stefan
Hello Sashu,

I personally now like it more than the Classic one - even if there was some learning curve at the start.
In the classic UI we had some very long menus (e.g. Document) - and the tools are now spread over more categories, so the options are fewer in each ribbon, which makes it easier to work with them!

And we intend to keep the Classic UI for everyone that prefers this over the ribbon.

Regards,
Stefan

Re: Correcting OCR errors?

Posted: Fri Jun 01, 2018 1:03 pm
by sashu
Stefan, I see -- but this seems to be the problem of correct grouping of tools and not the Classic-ribbon preference. And anyhow, where can I find the Edit content pane in the ribbon design?
Br, sashu

Re: Correcting OCR errors?

Posted: Fri Jun 01, 2018 1:37 pm
by Willy Van Nuffel
In case you are looking for "Edit > All Content", you can find it in the Home ribbon - Edit icon.
In case you are looking for the "Content" pane, you can find it in the View ribbon - Panes icon.

Re: Correcting OCR errors?

Posted: Fri Jun 01, 2018 2:12 pm
by Tracker Supp-Stefan
Thanks for the help Willy!

Cheers,
Stefan

Re: Correcting OCR errors?

Posted: Mon Jun 04, 2018 7:50 am
by sashu
Hi,

Is it possible to edit recognized text in the tree? Of course, it is possible to delete a text node with the wrong content and add a text node with the correct content, but I couldn't find out how to add a text node to a particular parent node.

CU, sashu
snapshot-4.06.2018.jpg

Re: Correcting OCR errors?

Posted: Mon Jun 04, 2018 10:27 am
by Tracker Supp-Stefan
Hello sashu,

Currently I am afraid that it is not possible to make changes to the Content items from the content pane directly.
You can select the object from that pane however - and it should mark it on the page as well. You can then Edit it straight on the page.

Regards,
Stefan

Re: Correcting OCR errors?

Posted: Wed Jun 06, 2018 7:47 am
by sashu
Hi,

I've got another problem. After running Enhance Scanned Pages, all 343 PDF pages change and some content disappears. Might this be of interest to you?

Here is the snapshot:
snapshot-6.06.2018.jpg
Best, sashu

Re: Correcting OCR errors?

Posted: Wed Jun 06, 2018 8:49 pm
by Willy Van Nuffel
I think it might help for Tracker Support if you could also post a print screen of your "Enhance Scanned Pages" settings (dialog box).

Regards.

Re: Correcting OCR errors?

Posted: Thu Jun 07, 2018 12:05 am
by TrackerSupp-Daniel
Hi everyone,
This is certainly an interesting development. Willy is correct that the OCR or "Enhance Scanned Pages" dialog options would be helpful. Could I also ask for a before and after copy of these documents? If they contain sensitive information, you can send them via email to support@pdf-xchange.com
Be sure to include a link to this forum thread if you do send an email.

Hope to hear back soon!

Re: Correcting OCR errors?

Posted: Thu Jun 07, 2018 5:36 am
by sashu
Hi Willy,

everything Off and Text sharpening High

Best, Sashu
snapshot-7.06.2018.jpg

Re: Correcting OCR errors?

Posted: Thu Jun 07, 2018 10:28 pm
by TrackerSupp-Daniel
Hi Sashu,
Thank you for those settings,
We will still need a before and after copy of these documents. As before, if they contain sensitive information you can send them via email to support@pdf-xchange.com
Be sure to include a link to this forum thread if you do send an email.

Re: Correcting OCR errors?

Posted: Fri Jun 08, 2018 4:48 am
by sashu
Hi Daniel,

the files are 230MB and 370MB.

Best, sashu

Re: Correcting OCR errors?

Posted: Fri Jun 08, 2018 9:53 am
by Willy Van Nuffel
@sashu

In your reply on June 06, 2018 you posted images of pages where the "Enhance Scanned Pages" feature went wrong.

So, you do not have to send or post the whole PDF file.

For Tracker Software it should be sufficient if you could make:
- an extract of the pages concerned from the "original PDF" and also
- an extract of the same pages from the "resulting PDF".

You probably know that you can do that via the Document-menu > Extract Pages, indicating which pages you like to extract, saving these pages to one file, giving it a name and a destination folder. In this way creating a first PDF from what it is 'before' running the Enhance Scanned Pages and a second PDF from what it is 'after'.

Thanks for your collaboration.

Re: Correcting OCR errors?

Posted: Fri Jun 08, 2018 11:14 am
by sashu
Willy, tell me when you resolve the issue. Thanx

Re: Correcting OCR errors?

Posted: Fri Jun 08, 2018 5:31 pm
by TrackerSupp-Daniel
Thank you for uploading the files sashu,
Willy does not actually work for us, though he is a very knowledgeable forum user. Thus while we appreciate all the help he offers us and the other forum users, he does not have access to our useruploads server to help with this end of troubleshooting. We will nonetheless inform you once we have identified the issue with these files.

Re: Correcting OCR errors?

Posted: Fri Jun 08, 2018 5:48 pm
by sashu
Fine, thank you! :)

Re: Correcting OCR errors?

Posted: Fri Jun 08, 2018 8:12 pm
by TrackerSupp-Daniel
Hi sashu, I have created a ticket for this issue specifically for your files so that the dev team can directly investigate and better implement a fix.
#4386: These files are negatively affected by Enhanced OCR
I cannot guarantee a timeline at the moment, however I can promise we will work to have it improved.