Content pane data after OCR: hierarchy and editing

Discussion for the End User use of OCR in PDF-XChange Editor and Viewer

Moderators: TrackerSupp-Daniel, Tracker Support, Paul - Tracker Supp, Vasyl-Tracker Dev Team, Chris - Tracker Supp, Sean - Tracker, Ivan - Tracker Software, Tracker Supp-Stefan

DIV
User
Posts: 252
Joined: Fri Jun 23, 2017 1:47 am


Post by DIV »

After completing OCR on a scanned document, the Content pane contains a hierarchy of many items comprising an image (or images) and numerous text objects.

Initially the text objects were arranged as words (with punctuation & spaces) grouped by line (one group for each line on the page/column).
However, when I edited the text (blind!) using the Edit Content button, the grouping was changed. My document had two columns, and all text within the column I edited became organised into one single group; the text in the other (unedited) column remained organised in many groups (one per line), and likewise the grouping of the (unedited) text in the header was unchanged. Note that my editing consisted of overwriting just two characters, being from two words on a single line.
I'm just curious: if one of those types of grouping is better than the other, then why not always use the superior grouping?

Besides editing blind, and rather than changing colours and hiding the image, is there a compelling reason not to be able to edit the text directly in the Content pane? (And have such a feature available even for non OCR'ed PDF files?)
From the user's perspective this could presumably operate something like the editing of bookmark names in the Bookmarks pane.
—DIV
Tracker Supp-Stefan
Site Admin
Posts: 17901
Joined: Mon Jan 12, 2009 8:07 am
Location: London

Re: Content pane data after OCR: hierarchy and editing

Post by Tracker Supp-Stefan »

Hello DIV,

The line-by-line text is created by the OCR engine; that's how it prefers to work. When you later use "Edit text elements as blocks", we apply the logic and grouping of that tool, and it has to group the base content objects as well.
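To make the two groupings concrete, here is a minimal Python sketch. It is only an illustration under assumptions: the `TextObject` and `Group` classes and both helper functions are hypothetical names, not part of PDF-XChange Editor or its SDK, and the real content tree holds much more than plain strings.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TextObject:
    text: str  # one word, space, or punctuation run (hypothetical model)


@dataclass
class Group:
    children: List[TextObject] = field(default_factory=list)


def group_by_line(lines: List[List[str]]) -> List[Group]:
    """One Group per line, matching the grouping described for OCR output."""
    return [Group([TextObject(w) for w in line]) for line in lines]


def merge_into_block(line_groups: List[Group]) -> List[Group]:
    """Collapse every line group of a column into a single Group,
    matching the grouping described after block editing."""
    merged = Group()
    for g in line_groups:
        merged.children.extend(g.children)
    return [merged]


column = [["Hello", " ", "world"], ["second", " ", "line"]]
ocr_groups = group_by_line(column)
edited_groups = merge_into_block(ocr_groups)
print(len(ocr_groups), len(edited_groups))
```

In this model the same text objects survive in both cases; only the level of grouping above them changes.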

As for why you can't edit text directly in the Content pane: it won't show you the text formatting, so I do not see this happening any time soon. Even if the majority of text will likely stay in a single font, size and colour, and you would like your edits to preserve the same, it is possible for each character to have a separate font, colour and size, so it's better if you see the edits in the main page rendering area while making them.
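The formatting point can be sketched in a few lines of Python. Again this is an assumption-laden illustration: the `Run` class, the helper functions, and the font names are hypothetical and do not reflect how PDF-XChange Editor stores text internally.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Run:
    text: str
    font: str    # illustrative font name
    size: float  # point size
    colour: str


def plain_text(runs: List[Run]) -> str:
    """What a plain tree label could show: characters only."""
    return "".join(r.text for r in runs)


def distinct_styles(runs: List[Run]) -> int:
    """Number of different (font, size, colour) combinations in the runs."""
    return len({(r.font, r.size, r.colour) for r in runs})


word = [
    Run("O", "Arial", 12.0, "black"),
    Run("C", "Times", 10.0, "red"),
    Run("R", "Arial", 12.0, "black"),
]
print(plain_text(word))
print(distinct_styles(word))
```

A plain-text label would show only the three characters, while the underlying runs carry two different styles that an edit made there could not display.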

Kind regards,
Stefan