Remove OCR

Forum for the PDF-XChange Editor - Free and Licensed Versions

Moderators: TrackerSupp-Daniel, Tracker Support, Paul - Tracker Supp, Vasyl-Tracker Dev Team, Chris - Tracker Supp, Sean - Tracker, Ivan - Tracker Software, Tracker Supp-Stefan

Feelda
User
Posts: 49
Joined: Wed Feb 03, 2021 12:04 pm

Remove OCR

Post by Feelda »

Hello,

I'd like to know how to delete only the OCR text of a document, and nothing else.

Thanks!
Willy Van Nuffel
User
Posts: 2347
Joined: Wed Jan 18, 2006 12:10 pm

Re: Remove OCR

Post by Willy Van Nuffel »

The best way I can think of to remove OCR text is the following:
- show the "Content" pane (View ribbon > Panes icon > Content)
- in the toolbar of the Content pane, click the "Options..." button
- click Select > Text
- press the Delete key on your keyboard, or click the Delete icon in the Content pane

That should be it.

Kind regards.

Willy.
Last edited by Willy Van Nuffel on Thu Jun 03, 2021 9:40 am, edited 1 time in total.
Tracker Supp-Stefan
Site Admin
Posts: 17818
Joined: Mon Jan 12, 2009 8:07 am
Location: London

Re: Remove OCR

Post by Tracker Supp-Stefan »

Hello Willy Van Nuffel,

Thanks for the help!
Indeed, there is no quick or automated way to distinguish OCR text from other base content text, so this has to be done through the Content pane.

Kind regards,
Stefan
Feelda
User
Posts: 49
Joined: Wed Feb 03, 2021 12:04 pm

Re: Remove OCR

Post by Feelda »

Hi, and thanks for your replies.

Would "Sanitize" followed by "Remove obscured content" work, or would it remove things other than just my OCR text?

Kind regards,
Tracker Supp-Stefan
Site Admin
Posts: 17818
Joined: Mon Jan 12, 2009 8:07 am
Location: London

Re: Remove OCR

Post by Tracker Supp-Stefan »

Hello Feelda,

The OCR text layer is not actually obscured: it sits on top of the image content on the page.
So I do not think Sanitize would help, and it could indeed remove more than just the OCR text.

Can you please explain in a bit more detail what you are trying to achieve - do you want to run a new OCR and as such want to remove an older one that is incorrect?

Kind regards,
Stefan
Feelda
User
Posts: 49
Joined: Wed Feb 03, 2021 12:04 pm

Re: Remove OCR

Post by Feelda »

Stefan,

"do you want to run a new OCR and as such want to remove an older one that is incorrect?"
That is one example, yes.

Or sometimes a document has OCR text that I want to get rid of before sending it to a client. That's what made me think of the "Sanitize" button, but I have a hard time understanding what each option does when I have to select what I want to sanitize.

Thanks!
Tracker Supp-Stefan
Site Admin
Posts: 17818
Joined: Mon Jan 12, 2009 8:07 am
Location: London

Re: Remove OCR

Post by Tracker Supp-Stefan »

Hello Feelda,

Thanks for the clarification.
I am afraid the Content pane method discussed above is what you will need to use to remove that OCR layer.

Kind regards,
Stefan
Feelda
User
Posts: 49
Joined: Wed Feb 03, 2021 12:04 pm

Re: Remove OCR

Post by Feelda »

Thank you for your answer, Stefan.
Tracker Supp-Stefan
Site Admin
Posts: 17818
Joined: Mon Jan 12, 2009 8:07 am
Location: London

Re: Remove OCR

Post by Tracker Supp-Stefan »

:)
MaxBuck
User
Posts: 5
Joined: Wed Jun 09, 2021 8:45 pm

Re: Remove OCR

Post by MaxBuck »

A simple solution is just to print that PDF to a new one using Microsoft Print to PDF. The newly "printed" PDF won't have the OCR information; each page will simply be an image.
Tracker Supp-Stefan
Site Admin
Posts: 17818
Joined: Mon Jan 12, 2009 8:07 am
Location: London

Re: Remove OCR

Post by Tracker Supp-Stefan »

Hello MaxBuck,

Yes, that could work, but it could also convert otherwise vector-based content to raster, which would inevitably mean a loss of quality!

Kind regards,
Stefan