How to separate pdfs with images from "normal" pdfs

PDF-X OCR SDK is a new product from us, intended to complement our existing PDF and imaging tools and to provide developers with an expanding set of professional tools for Optical Character Recognition tasks.


michipapa
User
Posts: 41
Joined: Tue Dec 08, 2009 10:44 pm

How to separate pdfs with images from "normal" pdfs

Post by michipapa »

Hi,
We use the Tracker OCR as a service on a server to process all incoming PDFs in our document management system.
How can I separate the PDFs with images from the "normal" PDFs to reduce the run time?

And what happens to "normal" PDFs if I put these files through the OCR process?

regards Michael
TrackerSupp-Daniel
Site Admin
Posts: 8436
Joined: Wed Jan 03, 2018 6:52 pm

Re: How to separate pdfs with images from "normal" pdfs

Post by TrackerSupp-Daniel »

Hello Michipapa,
Currently there is no method to separate PDFs based on their content.
There is a checkbox in the OCR function, "Skip pages that already contain text content items", which may help in your situation. Note, however, that this function will also skip pages that have both images and base content text on them, so it may not be a catch-all solution.

For "normal" PDFs that are processed with OCR: if the aforementioned checkbox is checked, they will not be affected and will add minimal time to the process queue. If the checkbox is not checked, you may find that you have a duplicate layer of invisible text on the document.
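As a rough illustration of the page-level rule described above, here is a minimal Python sketch. It is not SDK code: `PageInfo` and `pages_to_ocr` are hypothetical names invented for this example, and the has-text and has-images flags are assumed to come from some earlier inspection step.

```python
from dataclasses import dataclass


@dataclass
class PageInfo:
    """Hypothetical page summary; not a type from the OCR SDK."""
    number: int
    has_text: bool
    has_images: bool


def pages_to_ocr(pages, skip_pages_with_text=True):
    """Return the page numbers that would be sent to OCR.

    With the skip option on, any page that already has text items is
    skipped, including mixed pages that also contain images.
    """
    if not skip_pages_with_text:
        return [p.number for p in pages]
    return [p.number for p in pages if not p.has_text]


pages = [
    PageInfo(1, has_text=True, has_images=False),   # "normal" page
    PageInfo(2, has_text=False, has_images=True),   # scanned page
    PageInfo(3, has_text=True, has_images=True),    # mixed page
]

print(pages_to_ocr(pages))         # [2]
print(pages_to_ocr(pages, False))  # [1, 2, 3]
```

Page 3 shows the caveat: it is skipped even though it contains an image that might still hold unrecognized text.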

I hope this helps!

Edit:
I have just brought this to the Dev team, and we have decided to undertake the challenge. I cannot make any promises about a timeline for the function, but if you are ever looking for updates on the progress, please ask any member of our support staff about the ticket number below, and we will be able to assist.

RT #4474
Dan McIntyre - Support Technician
Tracker Software Products (Canada) LTD

+++++++++++++++++++++++++++++++++++
Our Web site domain and email address has changed as of 26/10/2023.
https://www.pdf-xchange.com
Support@pdf-xchange.com
michipapa
User
Posts: 41
Joined: Tue Dec 08, 2009 10:44 pm

Re: How to separate pdfs with images from "normal" pdfs

Post by michipapa »

Hi Daniel,

If you write
>There is a checkbox in the OCR function to "Skip pages that already contain text content items"

which function or parameter of your OCR SDK do you mean? I see this in the GUI of the PDF Editor but not in the OCR option list.

regards Michael
TrackerSupp-Daniel
Site Admin
Posts: 8436
Joined: Wed Jan 03, 2018 6:52 pm

Re: How to separate pdfs with images from "normal" pdfs

Post by TrackerSupp-Daniel »

Hello michipapa,

My sincerest apologies, I jumped on this a bit quickly and did not notice that it was an SDK issue.
While this option is available from the End User GUI, I do not believe it is available from the OCR SDK. That said, I've created another feature request for you, this time to add these functions to the SDK products.

#4475: FR: OCR SDK Add more scan options

Hopefully we can add these in soon, but until then, I do not have an interim solution for you. I've asked the dev team for more information on this, so should anything come up, or if they find a workaround to help you implement it, I am sure they will let you know.
Dan McIntyre - Support Technician
Tracker Software Products (Canada) LTD

Sasha - Tracker Dev Team
User
Posts: 5522
Joined: Fri Nov 21, 2014 8:27 am

Re: How to separate pdfs with images from "normal" pdfs

Post by Sasha - Tracker Dev Team »

Hello Michael,

If you want fine-grained control over the OCR logic, I recommend using it together with the Core API SDK. From what I see on this page, it should be included in the PRO SDK bundle:
https://www.pdf-xchange.com/produc ... ge-pro-sdk
Though I do not know exactly which license you have.
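Independent of any Tracker SDK, a pre-sorting step of the kind asked about in this thread could look roughly like the Python sketch below. It assumes the third-party pypdf library for text extraction (not the Core API or OCR SDK); `needs_ocr`, `triage` and the 20-character threshold are made up for illustration and would need tuning on real documents.

```python
def extract_page_texts(pdf_path):
    """Extract text per page using pypdf (third-party: pip install pypdf)."""
    from pypdf import PdfReader  # imported lazily; only needed for real files

    reader = PdfReader(pdf_path)
    return [page.extract_text() or "" for page in reader.pages]


def needs_ocr(page_texts, min_chars=20):
    """True if any page has fewer than min_chars non-whitespace characters.

    The threshold is an arbitrary assumption; genuinely blank pages
    will also be flagged by this heuristic.
    """
    return any(len("".join(text.split())) < min_chars for text in page_texts)


def triage(paths, extract=extract_page_texts):
    """Split paths into (send to OCR, skip) based on existing text content."""
    ocr_queue, skip = [], []
    for path in paths:
        (ocr_queue if needs_ocr(extract(path)) else skip).append(path)
    return ocr_queue, skip
```

Calling `triage(["a.pdf", "b.pdf"])` would return the files that appear to lack a text layer on at least one page, followed by the files that already have text on every page. Only the first list would then be handed to the OCR service.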

Cheers,
Alex