OCR detection gone after printing through the PDF-XChange driver to PDF/A-1b

Post by baumunk »

Code:

PDFPrinter.SetAsDefaultPrinter();
PDFPrinter.Option["Save.ShowSaveDialog"] = "False";
PDFPrinter.Option["Save.File"] = pdfaFile;
PDFPrinter.Option["Saver.ShowProgress"] = "False";
PDFPrinter.Option["General.PageLayout"] = "ShowNone";
PDFPrinter.Option["General.HideUI"] = "True";
PDFPrinter.Option["General.FullScreenMode"] = "ShowNone";
PDFPrinter.Option["General.Specification"] = "-1";
PDFPrinter.Option["Save.RunApp"] = "False";
PDFPrinter.Option["Save.WhenExists"] = "Overwrite";
PDFPrinter.SetRegInfo(dec_key);

// Send the OCR'd PDF to the default printer (the driver) via the shell "print" verb.
// Note: FileName below replaces the pdfAppName passed to the constructor.
// Note: Verb is only honored when UseShellExecute is true
// (the default on .NET Framework, but not on .NET Core / .NET 5+).
var printJob = new System.Diagnostics.Process
{
    StartInfo = new ProcessStartInfo(pdfAppName)
    {
        FileName = pdfFile,
        Verb = "print",
        WindowStyle = System.Diagnostics.ProcessWindowStyle.Hidden,
        CreateNoWindow = true
    }
};
printJob.Start();

Files:

sample_pages_ocr.pdf: after OCR.

sample_pages_ocr_PDFA-1b.pdf: after conversion to PDF/A (the OCR text is gone).

Post by Paul - Tracker Supp »

Hi baumunk,

I am not sure how this applies to the SDK, but when I take your OCR'd PDF and save it as PDF/A-1b in the Editor, there is an option to "Rasterize unembedded fonts". If I turn that off, I get a PDF/A-1b where the text can be selected.
image.png
sample_pages_ocr_PDF-1b-Paul.pdf
Are you able to do a similar thing via your code? If not, let me know and I will ask one of the devs to take a look at this.
Best regards

Paul O'Rorke
Tracker Support North America
http://www.tracker-software.com

Post by baumunk »

Hello Paul,

I have not found anything about this in the SDK documentation:
https://help.pdf-xchange.com/pdfxdapi9sdk/
I only have these options (shown in the dialog):
Options.JPG
Please ask the developers how I can achieve this.
We need this.

With kind regards
Ernest Baumunk

Post by Paul - Tracker Supp »

I spoke to one of the dev team about this.

The reason your result does not include selectable text is that the OCR text in your original document is invisible. The Editor can select that text, and it keeps the text selectable when saving as PDF/A-1b. Printing invisible text, however, produces no output for that text, which is why it cannot be selected in the printed result.

The long and short of it is that reprinting is the root of the issue. You should use the Editor and/or the Editor SDK to convert to PDF/A-1b, not the printer.

This is a limitation of any printer driver, not just ours. You already have an OCR'd PDF. Why reprint it and lose data? It is better to convert the PDF to PDF/A without using the printer. If you have large numbers of PDFs that you need to convert to PDF/A, I suggest using PDF-Tools to batch the process.
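
To check whether a given file carries its OCR text as an invisible layer, as described above, a rough probe along the following lines may help. This is only a heuristic sketch, not part of any Tracker SDK: the class and method names (InvisibleTextProbe, ContainsInvisibleText) are hypothetical, and it assumes the OCR layer is written with PDF text rendering mode 3 (the "3 Tr" operator, which draws text with neither fill nor stroke) inside uncompressed or Flate-compressed streams. Files that use other filters, predictors, or encryption will not be detected.

```csharp
using System.IO;
using System.IO.Compression;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper: heuristically detects invisible (render mode 3) text in a PDF.
public static class InvisibleTextProbe
{
    // ISO-8859-1 maps every byte to exactly one char, so binary data survives the round trip.
    static readonly Encoding Latin1 = Encoding.GetEncoding("ISO-8859-1");
    static readonly Regex StreamBody =
        new Regex(@"stream\r?\n(.*?)endstream", RegexOptions.Singleline);
    // "3 Tr" sets text rendering mode 3; the lookbehind rejects operands such as "13 Tr".
    static readonly Regex InvisibleMode = new Regex(@"(?<![\d.])3\s+Tr\b");

    public static bool ContainsInvisibleText(byte[] pdfBytes)
    {
        string raw = Latin1.GetString(pdfBytes);
        foreach (Match m in StreamBody.Matches(raw))
        {
            string body = m.Groups[1].Value;
            // Uncompressed content stream: the operator is visible as-is.
            if (InvisibleMode.IsMatch(body)) return true;
            // Otherwise assume FlateDecode and try to inflate the stream.
            string inflated = TryInflate(Latin1.GetBytes(body));
            if (inflated != null && InvisibleMode.IsMatch(inflated)) return true;
        }
        return false;
    }

    static string TryInflate(byte[] zlibData)
    {
        if (zlibData.Length < 3) return null;
        try
        {
            // Skip the 2-byte zlib header; DeflateStream expects raw deflate data.
            using (var input = new MemoryStream(zlibData, 2, zlibData.Length - 2))
            using (var deflate = new DeflateStream(input, CompressionMode.Decompress))
            using (var output = new MemoryStream())
            {
                deflate.CopyTo(output);
                return Latin1.GetString(output.ToArray());
            }
        }
        catch (InvalidDataException)
        {
            return null; // Not Flate data (for example an image stream); ignore it.
        }
    }
}
```

Calling InvisibleTextProbe.ContainsInvisibleText(File.ReadAllBytes(pdfFile)) on the OCR'd source before choosing a conversion route would then indicate whether printing is likely to drop the text layer.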
image.png
Both the Editor and PDF-Tools can do this for you without resorting to reprinting, and so can the Editor SDK.

I hope that helps.
sample_pages_ocr_Editor.pdf
Best regards

Paul O'Rorke
Tracker Support North America
http://www.tracker-software.com

Post by baumunk »

Hello Mr. O'Rorke,

Do you mean the PDF-XChange Editor SDK:
https://www.pdf-xchange.com/product/pdf-xchange-editor-sdk

or the PDF-XChange Editor Simple SDK:
https://www.pdf-xchange.com/product/pdf-xchange-editor-simple-sdk

Best regards
Ernest Baumunk

Post by Sasha - Tracker Dev Team »

Hello baumunk,

PDF-XChange Editor SDK is the one that you should use.

Cheers,
Alex