OCR detection gone after it was sent to XChange PDF Driver to PDF/A-1B

Post by baumunk » Mon Feb 15, 2021 1:03 pm

Code:

// Configure the PDF-XChange printer for silent output to pdfaFile.
PDFPrinter.SetAsDefaultPrinter();
PDFPrinter.Option["Save.ShowSaveDialog"] = "False";
PDFPrinter.Option["Save.File"] = pdfaFile;
PDFPrinter.Option["Saver.ShowProgress"] = "False";
PDFPrinter.Option["General.PageLayout"] = "ShowNone";
PDFPrinter.Option["General.HideUI"] = "True";
PDFPrinter.Option["General.FullScreenMode"] = "ShowNone";
PDFPrinter.Option["General.Specification"] = "-1";
PDFPrinter.Option["Save.RunApp"] = "False";
PDFPrinter.Option["Save.WhenExists"] = "Overwrite";
PDFPrinter.SetRegInfo(dec_key);

// Print the already OCR'd PDF through the driver using the shell "print" verb.
var printJob = new System.Diagnostics.Process
{
    StartInfo = new ProcessStartInfo(pdfAppName)
    {
        // Note: this assignment replaces the pdfAppName passed to the constructor.
        FileName = pdfFile,
        Verb = "print",
        WindowStyle = System.Diagnostics.ProcessWindowStyle.Hidden,
        CreateNoWindow = true
    }
};
printJob.Start();

Files:

sample_pages_ocr.pdf: after OCR.

sample_pages_ocr_PDFA-1b.pdf: after conversion to PDF/A (the OCR text is gone).

Post by Paul - Tracker Supp » Tue Feb 16, 2021 5:18 pm

Hi baumunk,

I am not sure how this applies to the SDK, but when I take your OCR'd PDF and save it as PDF/A-1b, there is an option to "Rasterize unembedded fonts". If I turn that off, I get a PDF/A-1b where the text can be selected.
Attachment: sample_pages_ocr_PDF-1b-Paul.pdf
Are you able to do something similar via your code? If not, let me know and I will ask one of the devs to take a look at this.
_________________
If posting files to this forum - you must archive the files to a ZIP, RAR or 7z file or they will not be uploaded - thank you.

Best regards

Paul O'Rorke
Tracker Support North America
http://www.tracker-software.com

Post by baumunk » Wed Feb 17, 2021 7:17 am

Hello Paul,

I have not found anything about this in the SDK documentation:
https://help.tracker-software.com/pdfxdapi9sdk/
I only have these options (shown in a dialog):
Options.JPG
Please ask the developers how I can achieve this.
We need this.

With kind regards
Ernest Baumunk

Post by Paul - Tracker Supp » Wed Feb 17, 2021 9:13 pm

I spoke to one of the dev team about this.

The reason your result does not include the selectable text is that your original document contains invisible text from the OCR. The Editor can read that text, save the file as PDF/A-1b, and retain the selectability of the text. Printing invisible text produces no output for it, which is why it cannot be selected in the printed result.
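
For readers who want to confirm that a file is affected, here is a minimal C# sketch of the idea. It assumes the page content stream has already been decompressed into a string by some PDF library (that step is not shown), and it only looks for the "3 Tr" operator, which sets the PDF text rendering mode to 3 (neither fill nor stroke, i.e. invisible text). The class and method names are hypothetical and are not part of any Tracker SDK.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any Tracker SDK.
static class InvisibleTextCheck
{
    // Matches the text-rendering-mode operator "Tr" with the operand 3.
    // The lookbehind rejects operands such as "13" or "0.3".
    private static readonly Regex InvisibleMode =
        new Regex(@"(?<![\w.+-])3\s+Tr(?![A-Za-z])", RegexOptions.Compiled);

    // Returns true if the (already decoded) content stream switches to
    // rendering mode 3 at least once. This is a heuristic: it does not
    // parse the stream, so "3 Tr" inside a string literal also matches.
    public static bool UsesInvisibleText(string decodedContentStream)
    {
        return InvisibleMode.IsMatch(decodedContentStream);
    }

    static void Main()
    {
        // Assumed sample streams, written by hand for illustration only.
        string ocrLayer = "BT /F1 12 Tf 3 Tr 72 700 Td (scanned words) Tj ET";
        string normalText = "BT /F1 12 Tf 0 Tr 72 700 Td (visible words) Tj ET";

        Console.WriteLine(UsesInvisibleText(ocrLayer));   // prints "True"
        Console.WriteLine(UsesInvisibleText(normalText)); // prints "False"
    }
}
```

A printer driver only receives what is actually painted on the page, so text drawn in mode 3 produces no output and there is nothing to carry over into the reprinted file.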

The long and short of it is that reprinting is the root of the issue. You should use the Editor and/or the Editor SDK to convert to PDF/A-1b, not the printer.

This is a limitation of any printer driver, not just ours. You already have an OCR'd PDF. Why reprint it and lose data? It is better to convert the PDF to PDF/A without using the printer. If you have large numbers of PDFs that you need to convert to PDF/A, I suggest using PDF-Tools to batch the process.

Both the Editor and PDF-Tools can do this for you without resorting to reprinting, and so can the Editor SDK.

I hope that helps.
Attachment: sample_pages_ocr_Editor.pdf

Post by baumunk » Thu Feb 18, 2021 7:02 am

Hello Mr. O'Rorke,

Which one do you mean:
PDF-XChange Editor SDK
https://www.tracker-software.com/product/pdf-xchange-editor-sdk

or
PDF-XChange Editor Simple SDK
https://www.tracker-software.com/product/pdf-xchange-editor-simple-sdk

Best regards
Ernest Baumunk

Post by Sasha - Tracker Dev Team » Thu Feb 18, 2021 9:24 am

Hello baumunk,

PDF-XChange Editor SDK is the one that you should use.

Cheers,
Alex