
Barcode finding and reading from PDF content?

Posted: Fri Jan 02, 2015 5:27 pm
by omascia
Are there known good ways to recognize barcodes within a PDF (or should I ask this in the OCR topic?), either through means offered by PDF-Tools SDK or through additional third-party tools? It would surely be easy to rasterize a page and then hand it to another SDK for processing. I'm just checking that I'm not about to invest in something I already have (or nearly have), and whether other users have experience doing this, and with which additional tool, if they're willing to share it of course.

Thanks a lot,

Re: Barcode finding and reading from PDF content?

Posted: Fri Jan 02, 2015 7:01 pm
by Paul - Tracker Supp
Hi omascia,

Typically, bar codes in a PDF are rendered using a barcode font or directly as an image. At present there is nothing in our SDK for reading bar codes; however, we are planning to add some functionality to locate bar codes in a PDF and extract some information from them. This is a long-term goal, though, and is not going to be ready for some time.

Presently you would need to use third party libraries or functions to actually read the bar codes.

I hope that helps.

Re: Barcode finding and reading from PDF content?

Posted: Fri Jan 02, 2015 9:05 pm
by omascia
Yes thanks, it helps to know I'm not wrong in seeking some additional tool for that need.

The barcodes I'd like to recognize will be found as images, or parts of images, within the PDF pages. These could be ordinary text PDFs with embedded images (built using PDF-Tools SDK) or PDFs coming from scanners or fax engines (which are then wholly image-based). I will simply follow this path for now: render the PDF page to a bitmap, then analyze the bitmap. After only a couple of hours of searching, it looks like there are a lot of commercial solutions available, though most are priced unreasonably high. I have also found some open-source code with compatible licensing terms, and some good academic papers on the techniques for finding barcodes in images. It is pretty much the same business as OCR, but the target is slightly different: the patterns to recognize are a lot simpler and there are far fewer of them. Not to mention that I do not need to recognize all the symbologies, just one or two of them, among the simplest to decode (EAN/GS1 7/8/12/13).
I just have to evaluate several of these solutions and then proceed.
So thanks for the confirmation I needed.
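
Part of what makes the EAN family simple to handle is that every code ends in a modulo-10 check digit, so a candidate decode can be sanity-checked cheaply. A minimal Python sketch of that check follows. The function names are illustrative and not taken from any SDK mentioned in this thread, and the accepted lengths (8, 12, 13) are an assumption covering EAN-8, UPC-A and EAN-13.

```python
def ean_check_digit(data_digits: str) -> int:
    """Return the modulo-10 check digit for the given data digits.

    Weights alternate 3, 1, 3, ... starting from the rightmost data digit.
    """
    if not (data_digits.isascii() and data_digits.isdigit()):
        raise ValueError("data digits must be ASCII digits 0-9")
    total = 0
    for position, ch in enumerate(reversed(data_digits)):
        weight = 3 if position % 2 == 0 else 1
        total += weight * int(ch)
    return (10 - total % 10) % 10


def is_valid_ean(code: str) -> bool:
    """Check a full code (data digits followed by the check digit)."""
    # Assumed lengths: EAN-8, UPC-A (12 digits) and EAN-13.
    if len(code) not in (8, 12, 13):
        return False
    if not (code.isascii() and code.isdigit()):
        return False
    return ean_check_digit(code[:-1]) == int(code[-1])
```

A decoder's raw output can be passed through `is_valid_ean()` to discard misreads before acting on the value.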

Re: Barcode finding and reading from PDF content?

Posted: Fri Jan 02, 2015 10:35 pm
by Paul - Tracker Supp
My pleasure omascia,

Do let us know if you need anything further from us here, and maybe keep us in the loop regarding a solution? I'm sure others here will appreciate hearing what you find out.

Sincerely

Re: Barcode finding and reading from PDF content?

Posted: Wed Jan 07, 2015 3:35 pm
by omascia
In this ongoing quest, I have successfully experimented with rendering the pages and then scanning them for the EAN8 barcodes I'm interested in. I am currently experimenting with the "zbar" project for this (quite nice, albeit clearly inferior to, and more limited than, some commercial tools I have quickly tested).

Now, to optimize the process, I intend to access the images within the PDF and analyze them directly, instead of rendering the page and analyzing the resulting bitmap. I'm focused on processing multi-page documents that have been scanned with EAN8 stickers on some pages (to mark new documents and tag their nature: invoice, order, letter, unknown, you get the idea), so there is clearly no point in spending the resources (and time) it takes to render each page for the sole purpose of processing the resulting bitmap.

I can access and extract images from the PDF (including as TIFF files, sorry for the obvious questions in my last topic).

How could I extract the actual embedded images in whatever format they are stored in the PDF, instead of having to save them to some specific format I'd choose (and thus possibly suffering an image conversion which *might* in some circumstances mean losing some quality and hurting my later detection and decoding of barcodes)?

My knowledge of PDF internals is limited. Are all images in there always resampled and re-encoded as a single type of image (at time of storage)? Or are we talking about a wide variety of formats (depending on whatever software did the scanning and constructed the PDF storing the scanned pages)?

To better understand the context of this question: my final goal is to get a grayscale (not indexed) 8bpp bitmap in memory to hand to the barcode scanning code. It feels convoluted to let PDF-Tools convert whatever is in the PDF to some common disk file format, then re-read it, map it to the required bitmap format, and only then proceed with it.
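
The target format described above (one byte per pixel, true grayscale rather than palette indices) can be illustrated in a few lines. This is a sketch only, in Python, assuming tightly packed rows with no stride padding and the common ITU-R BT.601 luma weights. The function names are hypothetical, and an SDK's own conversion may use different weights.

```python
def _luma(r: int, g: int, b: int) -> int:
    """Integer BT.601 luma, rounded to the nearest value in 0..255."""
    return (299 * r + 587 * g + 114 * b + 500) // 1000


def rgb24_to_gray8(rgb: bytes, width: int, height: int) -> bytes:
    """Convert packed 24-bit RGB pixels to 8bpp grayscale.

    Assumes rows are tightly packed (stride == width * 3).
    """
    if len(rgb) != width * height * 3:
        raise ValueError("buffer size does not match width * height * 3")
    out = bytearray(width * height)
    for i in range(width * height):
        out[i] = _luma(rgb[3 * i], rgb[3 * i + 1], rgb[3 * i + 2])
    return bytes(out)


def indexed_to_gray8(indices: bytes, palette) -> bytes:
    """Expand palette indices into real gray levels.

    `palette` is a sequence of (r, g, b) tuples. An indexed image is not
    usable as-is because its byte values are palette slots, not intensities.
    """
    lut = bytes(_luma(r, g, b) for r, g, b in palette)
    return bytes(lut[i] for i in indices)
```

The second function shows why "not indexed" matters: two pixels with neighbouring index values can map to completely different intensities.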

Re: Barcode finding and reading from PDF content?

Posted: Thu Jan 08, 2015 7:11 pm
by omascia
Answering my own question... :)
It didn't strike me at first that Image-XChange SDK has actually been included with PDF-XChange Pro SDK since versions 5.x.
That is *great* (old) news.
Do you want to know why I missed it? Its documentation is missing from the PDF-XChange Pro SDK download. I had to download the Image-XChange SDK kit to get to it. :)

Now to my task at hand: it actually revolves around the following, thanks to the Image-XChange SDK:

- browse the images using PXCp_ImageGetFromPage()
- get them as Image-XChange objects through PXCp_GetDocImageAsXCPage()
- check their encoding format using IMG_PageGetFormat()
- if needed, convert them to my needed grayscale 8 bpp format using IMG_PageConvertToFormat()
- get access to the array of bytes representing the pixels through IMG_PageLockBlock()
- do my stuff from there
- then cleanup resources, which involves IMG_PageUnlockBlock(), IMG_PageDestroy(), PXCp_ImageClearPageData(), and later PXCp_ImageClearAllData()
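
The sequence above boils down to: acquire, convert if needed, lock, process, then release in reverse order. The thread does not show the SDK's C signatures, so the sketch below does not call it. `FakeImage` and its methods are hypothetical Python stand-ins, used only to illustrate a cleanup order that also holds if the decoder raises.

```python
from contextlib import contextmanager


class FakeImage:
    """Hypothetical stand-in for an SDK image object; NOT the real API."""

    def __init__(self, fmt, pixels, log):
        self.fmt = fmt
        self.pixels = bytes(pixels)
        self.log = log  # records the order of calls for inspection

    def convert_to_gray8(self):
        # A real conversion would rewrite the pixels; this only tracks state.
        self.log.append("convert")
        self.fmt = "gray8"

    def lock(self):
        self.log.append("lock")
        return self.pixels

    def unlock(self):
        self.log.append("unlock")

    def destroy(self):
        self.log.append("destroy")


@contextmanager
def locked_pixels(image):
    """Yield the pixel buffer; always unlock, then destroy, the image."""
    try:
        if image.fmt != "gray8":
            image.convert_to_gray8()
        pixels = image.lock()
        try:
            yield pixels
        finally:
            image.unlock()
    finally:
        image.destroy()


def scan_images(images, decode):
    """Run `decode` on each image's pixel buffer and collect the results."""
    results = []
    for image in images:
        with locked_pixels(image) as pixels:
            results.append(decode(pixels))
    return results
```

In C the same ordering would typically be written with explicit cleanup on every exit path; the nesting of the two `finally` blocks mirrors that.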

It works *really* well.

Re: Barcode finding and reading from PDF content?

Posted: Fri Jan 09, 2015 8:17 pm
by Will - Tracker Supp
Hi omascia,

So am I right in thinking that you have everything you need? Or is there anything else that we can help you with :)

Re: Barcode finding and reading from PDF content?

Posted: Fri Jan 09, 2015 9:23 pm
by omascia
Will - Tracker Supp wrote: "So am I right in thinking that you have everything you need? Or is there anything else that we can help you with :)"
You are right! :wink:

Just keep in mind the idea of adding a barcode finding and decoding feature at some point in the future, either as a special feature of the OCR kit or separately. But I can now reach my goal, with very good performance and with few add-ons.
Keep up the good work. 8)

Re: Barcode finding and reading from PDF content?

Posted: Fri Jan 09, 2015 9:24 pm
by Patrick-Tracker Supp
:D

Re: Barcode finding and reading from PDF content?

Posted: Wed Jan 10, 2018 1:43 pm
by myhealthylawn
Hello,
Have there been any changes to the status of the bar codes discussion? I would like to be able to copy and print them.
It's a big deal when printing off my ballgame tickets, or going to a play.

Re: Barcode finding and reading from PDF content?

Posted: Wed Jan 10, 2018 1:47 pm
by Tracker Supp-Stefan
Hello myhealthylawn,

It will depend on the barcode (some are using a special font, some are just images, and some might be dynamically generated), so can you please send us a sample that we can look at?

Cheers,
Stefan

Re: Barcode finding and reading from PDF content?

Posted: Thu Feb 15, 2018 8:08 pm
by myhealthylawn
Can we just pick all the options you suggested and get them all to work? Your program is no use to me as the default if it cannot do what Adobe does.

Re: Barcode finding and reading from PDF content?

Posted: Thu Feb 15, 2018 8:55 pm
by Patrick-Tracker Supp
Hello myhealthylawn,

Could you please provide some examples of what you mean? I am afraid that it is not as simple as you might think, but we should be able to do everything Adobe can. If that is not the case, we will need to see one of these files so that we can investigate the issue, whatever it may be.
Thank you!

Re: Barcode finding and reading from PDF content?

Posted: Sun Feb 18, 2018 5:02 pm
by myhealthylawn
Attached are 2 images showing the bar codes, from the original ticket to Tracker software to Adobe.

Re: Barcode finding and reading from PDF content?

Posted: Mon Feb 19, 2018 1:18 pm
by Tracker Supp-Stefan
Hello myhealthylawn,

Thanks for sharing those screenshots.
We are aware of the different sub specifications of the PDF file format, and it's unlikely that a specific sub format is causing the issue.
Barcodes are quite specific on their own - so a sample file will definitely help.
Can you please send a copy of that expired ticket PDF to support@pdf-xchange.com and we will take a further look?

Regards,
Stefan