How to separate pdfs with images from "normal" pdfs

PDF-X OCR SDK is a new product from us, intended to complement our existing PDF and imaging tools and to give developers an expanding set of professional tools for optical character recognition tasks.

michipapa
User
Posts: 41
Joined: Tue Dec 08, 2009 10:44 pm

How to separate pdfs with images from "normal" pdfs

Post by michipapa » Fri Aug 31, 2018 8:44 am

Hi,
We use the Tracker OCR as a service on a server to process all incoming PDFs of our document management system.
How can I separate the PDFs with images from the "normal" PDFs to reduce the run time?

And what happens to "normal" PDFs if I put these files through the OCR process?

regards Michael

TrackerSupp-Daniel
Site Admin
Posts: 2420
Joined: Wed Jan 03, 2018 6:52 pm

Re: How to separate pdfs with images from "normal" pdfs

Post by TrackerSupp-Daniel » Fri Aug 31, 2018 4:07 pm

Hello Michipapa,
Currently there is no method to separate PDFs based on their content.
There is a checkbox in the OCR function, "Skip pages that already contain text content items", which may help in your situation. Note, however, that this option will also skip pages that contain both images and base content text, so it may not be a catch-all solution.

For "normal" PDFs that are processed with OCR, if the aforementioned checkbox is checked, they will not be affected and will add minimal time to the process queue. If the checkbox is not checked, you may find that you have a duplicate layer of invisible text on the document.

I hope this helps!

Edit:
I have just brought this to the Dev team, and we have decided to undertake the challenge. I cannot make any promises about a timeline for the function, but if you are ever looking for updates on the progress, please ask any member of our support staff about the ticket number below, and we will be able to assist.

RT #4474
Daniel McIntyre
Support Technician
Tracker Software Products (Canada) LTD

Sales: +1 (250) 324-1621
Fax: +1 (250) 324-1623

Re: How to separate pdfs with images from "normal" pdfs

Post by michipapa » Fri Aug 31, 2018 7:00 pm

Hi Daniel,

When you write
>There is a checkbox in the OCR function to "Skip pages that already contain text content items"

which function or parameter of your OCR SDK do you mean? I see this in the GUI of the PDF Editor but not in the OCR option list.

regards Michael

Re: How to separate pdfs with images from "normal" pdfs

Post by TrackerSupp-Daniel » Fri Aug 31, 2018 8:06 pm

Hello michipapa,

My sincerest apologies, I jumped on this a bit quickly and did not notice that it was an SDK issue.
While this option is available in the end-user GUI, I do not believe it is available in the OCR SDK. That being said, I've created another feature request for you, this time to add these options to the SDK products.

#4475: FR: OCR SDK Add more scan options

Hopefully we can add these in soon, but until then, I do not have an interim solution for you. I've asked the dev team for more information on this, so should anything come up, or if they find a workaround to help you implement it, I am sure they will let you know.
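
Until such an option exists in the SDK, one possible workaround is to pre-filter files before they reach the OCR service. The sketch below is not part of any Tracker SDK. It uses only the Python standard library, and the names `classify_pdf_bytes` and `classify_pdf` are hypothetical. It assumes that a PDF with at least one font dictionary contains real text, and that a PDF with image XObjects but no fonts is a pure scan. Both assumptions are heuristics and can be wrong, for example for encrypted files or streams that use filters other than Flate.

```python
import re
import zlib
from pathlib import Path

# Raw PDF markers. These are heuristics, not a real PDF parser.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)
IMAGE_RE = re.compile(rb"/Subtype\s*/Image\b")   # image XObject dictionary
FONT_RE = re.compile(rb"/Type\s*/Font\b")        # font dictionary (not /FontDescriptor)


def _searchable_bytes(data: bytes) -> bytes:
    """Return the raw file plus every stream that can be inflated with zlib.

    PDF 1.5+ files may keep dictionaries inside Flate-compressed object
    streams, so the markers are not always visible in the raw bytes.
    """
    parts = [data]
    for match in STREAM_RE.finditer(data):
        try:
            # decompressobj tolerates the line break that trails the data
            parts.append(zlib.decompressobj().decompress(match.group(1)))
        except zlib.error:
            continue  # not Flate data (e.g. a JPEG image); skip it
    return b"\n".join(parts)


def classify_pdf_bytes(data: bytes) -> str:
    """Classify a PDF as 'text', 'image_only', 'mixed' or 'unknown'."""
    haystack = _searchable_bytes(data)
    has_font = FONT_RE.search(haystack) is not None
    has_image = IMAGE_RE.search(haystack) is not None
    if has_font and has_image:
        return "mixed"
    if has_font:
        return "text"
    if has_image:
        return "image_only"
    return "unknown"  # nothing recognizable; treat conservatively


def classify_pdf(path: str) -> str:
    """Convenience wrapper that reads the whole file into memory."""
    return classify_pdf_bytes(Path(path).read_bytes())
```

A cautious routing rule would be to skip OCR only for files classified as "text" and to send "image_only", "mixed" and "unknown" files through the OCR service as before. Mixed files may still end up with a duplicate invisible text layer on pages that already had text.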

Sasha - Tracker Dev Team
User
Posts: 4220
Joined: Fri Nov 21, 2014 8:27 am

Re: How to separate pdfs with images from "normal" pdfs

Post by Sasha - Tracker Dev Team » Sat Sep 01, 2018 5:48 am

Hello Michael,

If you want fine-grained control over the OCR logic, I recommend using it together with the Core API SDK. Judging from this page, you should have it in the PRO SDK bundle:
https://www.tracker-software.com/produc ... ge-pro-sdk
Though I do not know exactly which license you have.

Cheers,
Alex