Selecting OCR'd Text Selects Entire Page

Discussion for the End User use of OCR in PDF-XChange Editor and Viewer

lng
User
Posts: 1
Joined: Thu Jan 19, 2012 8:12 pm

Selecting OCR'd Text Selects Entire Page

Post by lng »

Sometimes when I OCR a document I am not able to accurately select any specific portion of the text that was OCR'd. When I place the cursor in the document and move it ever so slightly (so as to select a few words), that very slight movement of the cursor selects the text in the entire page. That is, all the text on the page is highlighted in blue. I can't just select a portion of the text. If I paste the selected text into a Word document, the text that is pasted is a block of text that starts on the previous page (which I hadn't selected) and ends where I initially inserted the cursor. (The text on the previous page is not shown in blue as having been selected, but it has been selected because it is in the clipboard.)

Usually OCR'ing works fine, allowing me to select any portion of the text that was OCR'd, but sometimes the above happens.

P.S. I'm using PDF Output Type: Preserve Original Content and Add Text Layer.

Any suggestions? Thanks!!
Walter-Tracker Supp
User
Posts: 381
Joined: Mon Jun 13, 2011 5:10 pm

Re: Selecting OCR'd Text Selects Entire Page

Post by Walter-Tracker Supp »

This is a consequence of the current layout analysis routines which, while pretty good, are not always perfect (no OCR is, really). This results in content being placed in a slightly imperfect way on the page, which confuses the text selection routines.

You can definitely expect improvements in our upcoming release of the next viewer (our new OCR layout analyzer is much better, I have been testing it pretty extensively the last couple of months), but for this version I'm afraid you're stuck with this behaviour. The primary purpose of OCR in the current release is to allow searching - copy & paste of text is just a "bonus" feature, which works in most cases anyway :)