OCR adds another (new) image layer
-
- User
- Posts: 874
- Joined: Tue Jun 26, 2012 1:50 pm
OCR adds another (new) image layer
Hello.
I wonder why OCR adds another (new) image layer on top of the text layer when the option to preserve the original content is used. When I OCR a (scanned) image, I end up with two identical image layers.
- Will - Tracker Supp
- Site Admin
- Posts: 6815
- Joined: Mon Oct 15, 2012 9:21 pm
- Location: London, UK
- Contact:
Re: OCR adds another (new) image layer
Hi Timur,
Thanks for the post - I see the same thing here and I'm not entirely sure why that is, but we're currently in the process of re-writing the OCR module completely. It should be released within the next build or two after 320, if all goes well, so there will be some fairly major improvements.
Cheers,
If posting files to this forum, you must archive the files to a ZIP, RAR or 7z file or they will not be uploaded.
Thank you.
Best regards
Will Travaglini
Tracker Support (Europe)
Tracker Software Products Ltd.
http://www.tracker-software.com
-
- User
- Posts: 874
- Joined: Tue Jun 26, 2012 1:50 pm
Re: OCR adds another (new) image layer
Hi Will,
Thanks for getting back to me. I will wait and see what the new OCR module brings.
- Patrick-Tracker Supp
- Site Admin
- Posts: 1645
- Joined: Thu Mar 27, 2014 6:14 pm
- Location: Vancouver Island
- Contact:
Re: OCR adds another (new) image layer
Cheers,
Patrick Charest
Tracker Support North America
-
- User
- Posts: 874
- Joined: Tue Jun 26, 2012 1:50 pm
Re: OCR adds another (new) image layer
While you are at it: please allow the Editor to be minimized while OCR is running. Currently only the OCR popup can be minimized, but not the main window.
- Tracker Supp-Stefan
- Site Admin
- Posts: 17906
- Joined: Mon Jan 12, 2009 8:07 am
- Location: London
- Contact:
Re: OCR adds another (new) image layer
Hello Timur,
Thanks for this suggestion. I will bring it up for discussion at the next meeting!
Regards,
Stefan
-
- User
- Posts: 874
- Joined: Tue Jun 26, 2012 1:50 pm
Re: OCR adds another (new) image layer
Out of curiosity: Do the upcoming changes to OCR also include better character recognition? Acrobat DC is kind of unbeatable in this department, even by specialized software like Omnipage and FineReader, but having more reliable OCR in XChange would be a nice bonus.
- Will - Tracker Supp
- Site Admin
- Posts: 6815
- Joined: Mon Oct 15, 2012 9:21 pm
- Location: London, UK
- Contact:
Re: OCR adds another (new) image layer
Hi Timur,
It should do, yes - the OCR re-write is going to be fairly heavy and comprehensive, but I can't say how major the improvement will be, or what specifically will be in the initial release of the re-write for now.
Cheers,
Re: OCR adds another (new) image layer
Xchange Editor V6 (build 321).
The output option to recreate a new document, or add a text layer (Document - OCR Pages) is not available in the workflow File - New Document (Image Post Processing).
- Tracker Supp-Stefan
- Site Admin
- Posts: 17906
- Joined: Mon Jan 12, 2009 8:07 am
- Location: London
- Contact:
Re: OCR adds another (new) image layer
Hello Mitch,
I just tested it - and the OCR is still there as expected. Please note that you need to click the "OCR" check box for the button to select the languages to become active.
Regards,
Stefan
-
- User
- Posts: 874
- Joined: Tue Jun 26, 2012 1:50 pm
Re: OCR adds another (new) image layer
Any news when the new OCR implementation will arrive? I just tried to OCR a document where the current OCR would turn separated (italic) words into one big word, regardless of accuracy settings.
- Will - Tracker Supp
- Site Admin
- Posts: 6815
- Joined: Mon Oct 15, 2012 9:21 pm
- Location: London, UK
- Contact:
Re: OCR adds another (new) image layer
Hi Timur,
The new OCR is slated for release with Version 7, so not until roughly September (date obviously subject to change, depending on content etc.).
Timur Born wrote: I just tried to OCR a document where the current OCR would turn separated (italic) words into one big word, regardless of accuracy settings.
Just tried to reproduce with a document created here and wasn't able to. Can you send a sample doc?
Cheers,
-
- User
- Posts: 874
- Joined: Tue Jun 26, 2012 1:50 pm
Re: OCR adds another (new) image layer
1. I sent you a file where Editor's OCR turns *every* line of every paragraph into single words without spaces.
2. This also is a good example of how "Copy as Rich Text" (of the original text already present) shows serious weaknesses. Most text pages are turned into empty pages with text being written one single character per line! This is why I even tried to OCR these files to begin with, even though they already provided text. Compare that to the copy & paste results of Adobe Reader and you will see that it is a night & day difference.
- Will - Tracker Supp
- Site Admin
- Posts: 6815
- Joined: Mon Oct 15, 2012 9:21 pm
- Location: London, UK
- Contact:
Re: OCR adds another (new) image layer
Hi Timur,
Thanks for that - it works perfectly for me here. The copy/paste results are identical between the Editor and Adobe for me (both before and after OCR), and the OCR results don't differ from the original. Can you advise on your OCR settings and/or walk me through step-by-step?
Also, do you see this on every single page, or on specific pages?
Thanks,
- Will - Tracker Supp
- Site Admin
- Posts: 6815
- Joined: Mon Oct 15, 2012 9:21 pm
- Location: London, UK
- Contact:
Re: OCR adds another (new) image layer
Just thought to test pasting into MS Word to see the rich text difference in a better light - I do actually see the copy/paste difference between the Editor and Adobe, so I'll pass that along (ticket RT-3978).
However, I still don't see the OCR issue that you mentioned.
-
- User
- Posts: 874
- Joined: Tue Jun 26, 2012 1:50 pm
Re: OCR adds another (new) image layer
Copy & Paste:
- If you select parts of page 4 and page 5 together, the resulting paste of page 5 gets turned into one-character lines. For my original test I used CTRL-A to copy all text, but this time I manually selected text on page 5, pages 4+5 and pages 5+6. The problem only happens when pages 4+5 are selected, not with the combination of pages 5+6. With the latter you instead get blank pages in between when the rich text is copied to Word.
- If you copy one or both columns of page 5 from Editor to Word as rich text, the formatting is all over the place. If you copy the same from Reader, you get a single coherent column in Word (Adobe Acrobat's full version offers a third option that keeps the original formatting intact, including the running text around the center image).
OCR:
Here is an original paragraph from page 5:
"You’re starting the Strange Aeons Adventure Path, but
what kind of character should you play? How much
should you develop your character’s backstory, knowing
that the characters can’t remember much of their past?"
Here is the OCR version (Language English, Accuracy medium):
"You’restartingtheStrangeAeonsAdventurePath,but
whatkindofcharactershouldyouplay?Howmuch
shouldyoudevelopyourcharacter’sbackstory,knowing
thatthecharacterscan’tremembermuchoftheirpast?"
- Will - Tracker Supp
- Site Admin
- Posts: 6815
- Joined: Mon Oct 15, 2012 9:21 pm
- Location: London, UK
- Contact:
Re: OCR adds another (new) image layer
Hi Timur,
Timur Born wrote: Copy & Paste:
- If you select parts of page 4 and page 5 together, the resulting paste of page 5 gets turned into one-character lines. For my original test I used CTRL-A to copy all text, but this time I manually selected text on page 5, pages 4+5 and pages 5+6. The problem only happens when pages 4+5 are selected, not with the combination of pages 5+6. With the latter you instead get blank pages in between when the rich text is copied to Word.
- If you copy one or both columns of page 5 from Editor to Word as rich text, the formatting is all over the place. If you copy the same from Reader, you get a single coherent column in Word (Adobe Acrobat's full version offers a third option that keeps the original formatting intact, including the running text around the center image).
I've added that to the ticket - I see exactly what you mean and will admit, the result isn't pretty.
Timur Born wrote: OCR:
Here is an original paragraph from page 5:
"You’re starting the Strange Aeons Adventure Path, but
what kind of character should you play? How much
should you develop your character’s backstory, knowing
that the characters can’t remember much of their past?"
Here is the OCR version (Language English, Accuracy medium):
"You’restartingtheStrangeAeonsAdventurePath,but
whatkindofcharactershouldyouplay?Howmuch
shouldyoudevelopyourcharacter’sbackstory,knowing
thatthecharacterscan’tremembermuchoftheirpast?"
I don't see that here - I've attached a Word doc. that shows my results: I get identical results with the Create New Searchable PDF and Preserve Original Content... options and I used medium accuracy & English. Is there something that I'm doing differently?
-
- User
- Posts: 874
- Joined: Tue Jun 26, 2012 1:50 pm
Re: OCR adds another (new) image layer
Looks like a case of different "Copy white space mode" settings between our setups. I suspect that you use proportional white space, because there are some places in your Word example where more than one white space separates two words. I use "Preserve original" myself, so that's that.
Distance words proportionally:
Code:
While many of the same options that make great
characters in any Adventure Path work well for this
campaign, a few class options are especially suited
to a campaign where the characters struggle against
Only one white space between words:
Code:
While many of the same options that make great
characters in any Adventure Path work well for this
campaign, a few class options are especially suited
to a campaign where the characters struggle against
Preserve original white spaces only:
Code:
Whilemanyofthesameoptionsthatmakegreat
charactersinanyAdventurePathworkwellforthis
campaign,afewclassoptionsareespeciallysuited
toacampaignwherethecharactersstruggleagainst
The latter is the one that reveals that OCR removes white spaces from the original.
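To make the difference between the three modes concrete, here is a minimal Python sketch of how such modes could behave. It is not the Editor's actual code: the function `copy_text`, the mode names, the `(text, gap_before, has_space_char)` tuples and the `space_width` value are all assumptions made up for illustration, modeled only on the option labels above.

```python
def copy_text(words, mode, space_width=3.0):
    """Join the words of one line according to a (hypothetical) white space mode.

    words: list of (text, gap_before, has_space_char) tuples, where
    gap_before is the horizontal gap to the previous word in page units and
    has_space_char says whether the file stores a real space character there.
    """
    parts = []
    for index, (text, gap_before, has_space_char) in enumerate(words):
        if index > 0:
            if mode == "proportional":
                # Wider gaps become several spaces, never fewer than one.
                parts.append(" " * max(1, round(gap_before / space_width)))
            elif mode == "single":
                # Always exactly one space between words.
                parts.append(" ")
            elif mode == "preserve":
                # Only spaces that are really stored in the file survive.
                parts.append(" " if has_space_char else "")
            else:
                raise ValueError(f"unknown mode: {mode}")
        parts.append(text)
    return "".join(parts)


# A text layer that positions words by gaps but stores no space characters,
# which is what the OCR output described above appears to look like.
ocr_words = [("While", 0.0, False), ("many", 3.0, False), ("of", 7.0, False)]

for mode in ("proportional", "single", "preserve"):
    print(mode, "->", repr(copy_text(ocr_words, mode)))
```

Under these assumptions only the "preserve" mode exposes the missing space characters, while the other two modes hide them by deriving spaces from the gaps.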
- Will - Tracker Supp
- Site Admin
- Posts: 6815
- Joined: Mon Oct 15, 2012 9:21 pm
- Location: London, UK
- Contact:
Re: OCR adds another (new) image layer
Thanks Timur, I was using a different "Copy white space mode" - sorry, I've been away for 2 weeks and am still getting back into the 'swing of things'.
I've reproduced this here and passed it along via a ticket (RT-3984).
Thanks,
-
- User
- Posts: 874
- Joined: Tue Jun 26, 2012 1:50 pm
Re: OCR adds another (new) image layer
Timur Born wrote: I wonder why OCR adds another (new) image layer on top of the text layer when the option to preserve the original content is used. When I OCR a (scanned) image, I end up with two identical image layers.
This seems to have been fixed in the meantime?!
Will - Tracker Support wrote: Thanks Timur, I was using a different "Copy white space mode" - sorry, I've been away for 2 weeks and am still getting back into the 'swing of things'. I've reproduced this here and passed it along via a ticket (RT-3984).
It's worth mentioning that "Edit Content" always uses a single white space regardless of what is set in the Copy Text options. I don't know if this is wanted behavior or not?!
- Will - Tracker Supp
- Site Admin
- Posts: 6815
- Joined: Mon Oct 15, 2012 9:21 pm
- Location: London, UK
- Contact:
Re: OCR adds another (new) image layer
Hi Timur,
Timur Born wrote: This seems to have been fixed in the meantime?!
Awesome, glad to hear that!
Timur Born wrote: It's worth mentioning that "Edit Content" always uses a single white space regardless of what is in the Copy Text options. I don't know if this is wanted behavior or not?!
I believe that this is by design. Copying text using the Select Text Tool is more difficult than it would visually appear, because spaces as such do not exist in PDF files. We have to determine what constitutes a space by looking at the gap between characters, rather than actually looking for a space character. But I believe that when text is being edited, we're temporarily able to act like a word processor (e.g. Notepad, WordPad, etc.) that includes space characters, so it's much easier for us to copy text and those options are no longer necessary. I'll need to double check to confirm that, but the developer responsible is on holiday so I won't hear back for a little while.
Thanks,
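The gap-based approach described above can be sketched roughly as follows. This is only an illustration in Python, not how the Editor actually does it: the `Glyph` class, the `join_glyphs` and `layout` helpers, and the `gap_ratio` threshold are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    char: str
    x: float      # left edge of the glyph, in page units
    width: float  # advance width of the glyph, in page units


def join_glyphs(glyphs, gap_ratio=0.25):
    """Rebuild a line of text from positioned glyphs.

    A single space is inserted wherever the horizontal gap between two
    consecutive glyphs exceeds gap_ratio times the average glyph width.
    """
    if not glyphs:
        return ""
    ordered = sorted(glyphs, key=lambda g: g.x)
    avg_width = sum(g.width for g in ordered) / len(ordered)
    threshold = gap_ratio * avg_width
    out = [ordered[0].char]
    for prev, cur in zip(ordered, ordered[1:]):
        gap = cur.x - (prev.x + prev.width)
        if gap > threshold:
            out.append(" ")
        out.append(cur.char)
    return "".join(out)


def layout(text, char_width=5.0, space_width=3.0):
    """Hypothetical helper: place characters left to right and leave a gap
    where the text has a space, without emitting any space glyph."""
    glyphs, x = [], 0.0
    for ch in text:
        if ch == " ":
            x += space_width
        else:
            glyphs.append(Glyph(ch, x, char_width))
            x += char_width
    return glyphs


line = layout("what kind of character")
print(join_glyphs(line))                 # threshold 1.25: gaps count as spaces
print(join_glyphs(line, gap_ratio=1.0))  # threshold 5.0: words run together
```

The second call shows how a threshold that is too strict for the actual gaps produces run-together words similar to the OCR output quoted earlier in this thread.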
-
- User
- Posts: 874
- Joined: Tue Jun 26, 2012 1:50 pm
Re: OCR adds another (new) image layer
He will still have to look into the "Preserve original white spaces" bug for the normal (non edit) text selection tool then. And the presence of this option seems to suggest that there are original white spaces in PDFs?! At least Adobe Reader has no problems copying these.
- Will - Tracker Supp
- Site Admin
- Posts: 6815
- Joined: Mon Oct 15, 2012 9:21 pm
- Location: London, UK
- Contact:
Re: OCR adds another (new) image layer
Timur Born wrote: He will still have to look into the "Preserve original white spaces" bug for the normal (non edit) text selection tool then.
Absolutely!
Timur Born wrote: And the presence of this option seems to suggest that there are original white spaces in PDFs?! At least Adobe Reader has no problems copying these.
Not exactly - the UI text is likely phrased to make more sense to users that wouldn't otherwise understand.
Cheers,