1) Is there really no elegant way to directly extract the values of barcodes contained in a .PDF by means of PDF-Tools? I would expect PDF-Tools to offer a way to extract those values, similar to the way we can extract text from a .PDF, perhaps with an option to extract only the values of e.g. QR-Codes...
My workaround is to convert a .PDF to a .PNG, save the file and hand that file over to ZXing to extract the values of the barcodes. Not really elegant... The following Command Line will convert my file:
However this leads to a second question:
2) How do I make the above command create a .PNG with a resolution of 300 dpi instead of the 150 dpi default? I assume there are additional parameters for 'pdfToImages', but I cannot find any hint about this in the online help.
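To illustrate why the resolution matters for barcode decoding, here is a minimal sketch of the arithmetic only. It does not call PDF-Tools or ZXing. The function names, the A4 page size (595 x 842 points) and the QR-Code dimensions (25.4 mm wide, 29 modules per side) are assumptions chosen for the example, not values from my documents.

```python
# Illustrative arithmetic only; all names and sample sizes are assumptions.
MM_PER_INCH = 25.4
POINTS_PER_INCH = 72.0  # PDF user space: 1 point = 1/72 inch


def page_pixels(width_pt, height_pt, dpi):
    """Return the (width, height) in pixels of a page rendered at `dpi`."""
    scale = dpi / POINTS_PER_INCH
    return round(width_pt * scale), round(height_pt * scale)


def pixels_per_module(code_width_mm, modules, dpi):
    """Return how many pixels one barcode module spans at `dpi`."""
    code_width_px = code_width_mm / MM_PER_INCH * dpi
    return code_width_px / modules


if __name__ == "__main__":
    for dpi in (150, 300):
        width, height = page_pixels(595, 842, dpi)  # assumed A4 page
        module = pixels_per_module(25.4, 29, dpi)   # assumed QR-Code size
        print(f"{dpi} dpi: {width} x {height} px, {module:.1f} px per module")
```

Under these assumptions, doubling the resolution doubles the number of pixels per module, which gives the decoder more margin with small or noisy scanned codes.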
Maybe I should have been more specific about my input: my .PDFs are either scanned or received by mail/downloaded. They are produced by many different authorities, so each has a different structure and look. The only thing they have in common is a QR-Code with strictly defined content.
There are no form fields in the .PDFs, so I do not see how 'Export Form Data' could be helpful. In fact, I ran a quick test and the generated .fdf did not contain any useful information...
Unfortunately if the barcode is not already a form field, we have no way to extract the value at the moment. I know that some OCR engines are capable of reading and converting barcodes, so I will ask our Dev team if it is possible for our OCR engine to offer this in the future, but for the moment I am afraid that the steps you are taking now are likely the best solution to this problem.
Thank you for your answer. I am not sure barcode decoding fits into the concept of an OCR engine. I think it's rather a separate treatment of images... I definitely would like to have it in a separate interface (= own <ToolID>).
Please have a look at my Question 2. My workaround will only work, if I can get a resolution of 300 dpi instead of the 150 dpi default! What parameter do I need to pass on the Command Line to 'pdfToImages' in order to get that resolution?
I'll append my solution to the barcode extraction here as an example as soon as I can complete it - but I need that question to be solved for that! The solution is based on an OpenSource library.
It most certainly is possible, and it is built in to the OCR engine. I just confirmed with the Dev team that our OCR already does this. The only caveat is that it does not create visible, searchable, or editable text; it adds a hidden "alternative text" item which is currently only visible to screen readers. As such, we are looking at a way to allow copying of that alternative text in the future.
Regarding the image DPI, there is no direct control for this (even via the command line). However, if you disable the "compression" options in the "Image to PDF" function, then the images will be placed within the document at their original resolution/DPI. Making a change in the UI and running the tool once will save those changes, and they will be remembered and used, if left unchanged, when running via the command line.
Daniel McIntyre - Support Technician
Tracker Software Products (Canada) LTD