OCR is autorotating pages 180 degrees

Posted: Mon Nov 12, 2018 11:14 pm
by pmazurk
We're in the middle of a project that involves creating searchable PDFs out of non-searchable PDFs. We have over 10,000 files to process. We purchased the PDF SDK that includes the OCR module.
Most of our output files are just as we expect them to be. We've had to manually rotate the input files with the PDF-XChange Editor, as they were scanned 90 degrees off. When we feed the correctly rotated files to the OCR engine, we get searchable PDFs.

A small number of the output files are being auto-rotated 180 degrees. The output OCR'd PDFs have pages that are 180 degrees off from their source files. Searchable text is gibberish, as you might expect.

What could be causing the OCR to flip these pages? I tried a few through the PDF Editor, and that also rotated the pages. We're well into this project and this is a disappointing surprise. Any help would be most appreciated.

Re: OCR is autorotating pages 180 degrees

Posted: Tue Nov 13, 2018 7:14 am
by Sasha - Tracker Dev Team
Hello pmazurk,

Currently we are searching for a solution to this problem.

Cheers,
Alex

Re: OCR is autorotating pages 180 degrees

Posted: Tue Nov 13, 2018 8:54 pm
by pmazurk
Is there a targeted release date? Clearly this is a fundamental issue.

Also - is there a workaround or a configuration change or a switch setting that would stop this? Should I not use Fast Autorotate?

Thanks -

Re: OCR is autorotating pages 180 degrees

Posted: Tue Nov 13, 2018 11:39 pm
by TrackerSupp-Daniel
Hello pmazurk,

We do not have a set release date for this fix in particular; apologies for the inconvenience. Please see my email for more information.

I will leave the Fast Autorotate question for my development colleagues to answer, as I am not a developer, and do not know the answer there.

Re: OCR is autorotating pages 180 degrees

Posted: Fri Nov 16, 2018 5:17 pm
by pmazurk
Is it possible that the OCR operation is creating a new image layer that is rotated while the original layer is not? We've observed that the OCR is actually correct, in that searching for a word finds it in a location that, while incorrect for the visible image layer, would be correct if the image layer were properly rotated.

Also - if this is the case, can we detect and remove the incorrectly rotated image layer?
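
The mismatch described above is what you would see if one layer were rotated 180 degrees relative to the other. As a rough illustration only, here is a minimal Python sketch of where a word's bounding box lands when a page is rotated 180 degrees about its centre. The page size and the box are made-up values, not taken from the actual files, and rotate_box_180 is a hypothetical helper, not part of the SDK.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in page units


def rotate_box_180(box: Box, page_width: float, page_height: float) -> Box:
    """Map a box through a 180-degree rotation about the page centre.

    Every point (x, y) moves to (page_width - x, page_height - y), so the
    old upper-right corner becomes the new lower-left corner.
    """
    x0, y0, x1, y1 = box
    return (page_width - x1, page_height - y1,
            page_width - x0, page_height - y0)


# Hypothetical values: a US Letter page in points and a word near the top left.
PAGE_W, PAGE_H = 612.0, 792.0
word_box = (72.0, 700.0, 200.0, 720.0)

rotated = rotate_box_180(word_box, PAGE_W, PAGE_H)
print(rotated)  # (412.0, 72.0, 540.0, 92.0)
```

If a search hit sits at the rotated position rather than over the visible word, that would be consistent with the text layer and the image layer disagreeing by 180 degrees.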

Re: OCR is autorotating pages 180 degrees

Posted: Sat Nov 17, 2018 6:55 am
by Sasha - Tracker Dev Team
Hello pmazurk,

Please check this thread out - hopefully this can help:
viewtopic.php?f=42&t=31745#p129064

Cheers,
Alex

Re: OCR is autorotating pages 180 degrees

Posted: Tue Nov 20, 2018 11:58 pm
by pmazurk
That worked! Turns out including the PDFXOCR_Funcs.OCR_ImageProcessingFlags.OCR_Image_SuppressOutput option is key. Docs are coming out of the converter as expected. I left the PDFXOCR_Funcs.OCR_ImageProcessingFlags.OCR_Image_FastAutorotate option in, and the pages are being deskewed but not over-rotated.
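
For readers who want to see the flag combination spelled out, here is a minimal sketch in Python of OR-ing the two options together. This is not SDK code: the enum below is a stand-in with placeholder bit values, and the real constants are the ones defined in PDFXOCR_Funcs.OCR_ImageProcessingFlags.

```python
from enum import IntFlag


class OcrImageProcessingFlags(IntFlag):
    """Stand-in for the SDK's flag enum. Bit values are placeholders."""
    NONE = 0
    FAST_AUTOROTATE = 1 << 0  # stands in for OCR_Image_FastAutorotate
    SUPPRESS_OUTPUT = 1 << 1  # stands in for OCR_Image_SuppressOutput


def build_flags(fast_autorotate: bool = True,
                suppress_output: bool = True) -> OcrImageProcessingFlags:
    """Combine the selected options into a single bit-flag value."""
    flags = OcrImageProcessingFlags.NONE
    if fast_autorotate:
        flags |= OcrImageProcessingFlags.FAST_AUTOROTATE
    if suppress_output:
        flags |= OcrImageProcessingFlags.SUPPRESS_OUTPUT
    return flags


# The combination reported to work in this thread: both options set.
flags = build_flags()
print(OcrImageProcessingFlags.SUPPRESS_OUTPUT in flags)  # True
```

In real code the combined value would be passed wherever the SDK's OCR call expects its image processing flags; that call is not shown here because its signature does not appear in this thread.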

Thanks for your help-

Re: OCR is autorotating pages 180 degrees

Posted: Wed Nov 21, 2018 8:54 am
by Sasha - Tracker Dev Team
:)