Search found 381 matches
- Thu Jan 30, 2014 5:17 pm
- Forum: PDF-XChange (End Users FR)
- Topic: Installing Ocr Arabic on PDF Xchange 3.0
- Replies: 13
- Views: 533
Re: Installing Ocr Arabic on PDF Xchange 3.0
Arabic uses a special OCR engine mode, and a bunch of additional support language files. This functionality has not been implemented in the viewer. Sorry.
- Mon Jan 13, 2014 11:25 pm
- Forum: PDF-XChange Editor
- Topic: Problem with OCR text layer alignment/orientation
- Replies: 2
- Views: 1420
Re: Problem with OCR text layer alignment/orientation
Do you run OCR with "Document->OCR Pages..." command, or from New Document -> From Images (or From Scanner) with the OCR post-processing enabled?
- Fri Jan 10, 2014 5:19 pm
- Forum: PDF-X OCR SDK
- Topic: Very large file size after the text recognition
- Replies: 23
- Views: 10617
Re: Very large file size after the text recognition
As I mentioned, we'll be providing better compression in the next major release. There are some constraints on the compression methods we use in the current release that can result in larger files in some cases. Meanwhile I am investigating the file provided most recently in this thread - there may ...
- Mon Dec 30, 2013 6:11 pm
- Forum: PDF-XChange Viewer (End Users)
- Topic: How do I specify Duplex from the command line?
- Replies: 6
- Views: 3195
Re: How do I specify Duplex from the command line?
Unfortunately there doesn't appear to be a simple way to duplicate printers in Windows 8 that I can find, and when I installed a second printer copy manually, Windows 8 "merged" it so that I only had one copy. It appears Windows 8 is trying to protect against "clutter" and I'm no...
- Tue Dec 24, 2013 8:18 pm
- Forum: PDF-XChange Viewer (End Users)
- Topic: Viewer 2.5.213.0 and 2.5.213.1 version information
- Replies: 22
- Views: 10472
Re: Viewer 2.5.213.0 and 2.5.213.1 version information
Thanks for your patience (and for the reminders!) -Walter Hi lmacri, Yes, it's almost two months since Viewer build 2.5.213.0 was issued, almost six weeks since 2.5.213.1, four weeks since I first asked about updating the Viewer's Version History page, and two weeks since I asked again, but as today...
- Wed Dec 04, 2013 11:17 pm
- Forum: PDF-X OCR SDK
- Topic: SDK to extract text, then search
- Replies: 12
- Views: 17585
Re: SDK to extract text, then search
Yes, we use the __stdcall calling convention. You do not need to purchase the product to try it; there are some limitations (e.g. watermarks if you create documents, limits on the number of pages you can OCR, etc) but you can try every feature out without purchasing a license.
-Walter
- Wed Dec 04, 2013 5:21 pm
- Forum: PDF-X OCR SDK
- Topic: SDK to extract text, then search
- Replies: 12
- Views: 17585
Re: SDK to extract text, then search
You can do this with the Pro Tools SDK, although it is not Active-X but rather a native C++ DLL with a flat C-style API. We have functions to extract existing text, and an OCR component that lets you perform OCR and create either a searchable PDF output, or extract text which you can save to a text file if...
- Mon Nov 25, 2013 7:01 pm
- Forum: PDF-X OCR SDK
- Topic: Low performance of the OCR_MakeSearchable method
- Replies: 2
- Views: 3575
Re: Low performance of the OCR_MakeSearchable method
I've looked at your document, and while I don't see nearly the poor performance you do, I do note that it takes longer than typical files. You will notice that pages 8 and 9 are the culprits, and this is because the layout of those pages is difficult for our engine to process, due to the complexity...
- Mon Nov 11, 2013 5:08 pm
- Forum: OCR- For the PDF-XChange Editor and Viewer
- Topic: Latin and Gothic Letters for OCR
- Replies: 15
- Views: 12708
Re: Latin and Gothic Letters for OCR
Croatian will be available on or before the next build, anticipated in about a month's time. Meanwhile you can use any other language we provide which uses the same diacritics, if applicable (I'm not familiar with Croatian myself), because the word dictionary coupling is weak. I will update this for...
- Thu Oct 31, 2013 10:32 pm
- Forum: PDF-XChange Editor
- Topic: Snapshots Always Paste at 72 DPI
- Replies: 1
- Views: 1316
Re: Snapshots Always Paste at 72 DPI
You are right, it copies at the correct DPI, but does not set the information in such a way that the clipboard retains the DPI setting.
It has been fixed in development as of right now and will be in the next release.
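To see why a lost DPI setting matters, here is a minimal Python sketch of the arithmetic. It assumes the receiving application falls back to 72 DPI when the clipboard image carries no resolution, as the topic title suggests; the pixel dimensions and the helper name `physical_size_inches` are made up for illustration.

```python
def physical_size_inches(width_px: int, height_px: int, dpi: float) -> tuple:
    """Physical size of a bitmap when its pixels are interpreted at `dpi`."""
    return (width_px / dpi, height_px / dpi)

# Hypothetical snapshot: 1200 x 900 pixels captured at 300 DPI.
captured = physical_size_inches(1200, 900, 300)

# Same pixels, but the target application assumes 72 DPI.
fallback = physical_size_inches(1200, 900, 72)

# Factor by which the pasted image appears enlarged.
scale = 300 / 72

print(captured, fallback, round(scale, 2))
```

The pixel data is identical in both cases; only the interpretation of its physical size changes.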
- Thu Oct 24, 2013 5:01 pm
- Forum: OCR- For the PDF-XChange Editor and Viewer
- Topic: Latin and Gothic Letters for OCR
- Replies: 15
- Views: 12708
Re: Latin and Gothic Letters for OCR
Not at the moment. We may release a tool to help with training in the future. However, if you feel ambitious you can email us at support@pdf-xchange.com and I can point you in the right direction, but can't provide detailed support for it - you'd be on your own.
- Tue Oct 22, 2013 6:19 pm
- Forum: OCR- For the PDF-XChange Editor and Viewer
- Topic: Latin and Gothic Letters for OCR
- Replies: 15
- Views: 12708
Re: Latin and Gothic Letters for OCR
Ludwig, I have attached the language pack to this post, because I guess it will still be a few days since our installer people are very busy with the new editor release. You will have to place them in your language directory yourself, and we cannot provide support for this since we will have a prope...
- Sat Oct 19, 2013 12:09 am
- Forum: OCR- For the PDF-XChange Editor and Viewer
- Topic: Latin and Gothic Letters for OCR
- Replies: 15
- Views: 12708
Re: Latin and Gothic Letters for OCR
Ludwig, I have prepared the Fraktur language pack and sent it to our installation guys. It may be a few days before it becomes available on the website but I thought I would update you to let you know that it will be very soon. It will work with both the viewer and the editor.
-Walter
- Wed Oct 16, 2013 4:53 pm
- Forum: OCR- For the PDF-XChange Editor and Viewer
- Topic: Latin and Gothic Letters for OCR
- Replies: 15
- Views: 12708
Re: Latin and Gothic Letters for OCR
We will add Slovakian, Swedish, and German "fraktur" language data in the final release of the editor. We will not have direct Latin support, though results using English (or even other Latin alphabet) language selection will be fairly good since the word dictionary weighting is fairly wea...
- Wed Oct 16, 2013 4:30 pm
- Forum: PDF-X OCR SDK
- Topic: Very large file size after the text recognition
- Replies: 23
- Views: 10617
Re: Very large file size after the text recognition
We are providing this kind of functionality in the new SDK which will be out after the editor is finalized.
-Walter
- Wed Oct 16, 2013 4:29 pm
- Forum: PDF-X OCR SDK
- Topic: Only 2 pages will be generated with full version
- Replies: 8
- Views: 5676
Re: Only 2 pages will be generated with full version
I'm afraid I am unable to reproduce this problem with either the 32-bit or 64-bit DLLs (directly from the link provided by Stefan to the current live DLLs). Your key will still be valid with them, so I'm not sure what's going on. Have you changed any of your code? Can you try the sample applications?...
- Fri Sep 20, 2013 10:01 pm
- Forum: PDF-XChange Viewer SDK
- Topic: What Version of the OCR Module is Included?
- Replies: 3
- Views: 1817
Re: What Version of the OCR Module is Included?
Oh, if you mean the Active-X Viewer component, it contains the latest OCR available for that range of products (Viewer and PDF-Tools SDK).
We have an improved underlying engine in the *Editor* product and upcoming related lines.
- Fri Sep 20, 2013 9:43 pm
- Forum: PDF-XChange Viewer SDK
- Topic: What Version of the OCR Module is Included?
- Replies: 3
- Views: 1817
Re: What Version of the OCR Module is Included?
You can check the version of the dll within Windows' file explorer. Just right-click the dll and select "properties" from the context menu, then select the "Details" tab. Version is indicated in the "File version" field. And yes, the SDK for download is always up to dat...
- Wed Sep 11, 2013 5:30 pm
- Forum: PDF-X OCR SDK
- Topic: OCR for mixed language documents
- Replies: 5
- Views: 4251
Re: OCR for mixed language documents
There is some limited support for recognition of mixed Chinese (traditional or simplified) with Latin script. I would recommend you try the free PDF-XChange Viewer (from our downloads page) with the Chinese language package and try it out on some sample documents, as it uses the same underlying OCR ...
- Tue Sep 10, 2013 8:13 pm
- Forum: PDF-XChange Viewer SDK
- Topic: Links to other PDF documents
- Replies: 2
- Views: 1582
Re: Links to other PDF documents
You say it crashes unless you add a sleep statement; what is the nature of the crash?
What is your OpenPdfFile() function?
- Thu Sep 05, 2013 6:44 pm
- Forum: OCR- For the PDF-XChange Editor and Viewer
- Topic: OCR - any way of accessing the text overlay as a .txt doc?
- Replies: 3
- Views: 4006
Re: OCR - any way of accessing the text overlay as a .txt do
OCR text is essentially the same as visible text, except that it is not rendered. You can extract text by selecting it with the mouse, and copying / pasting, or you can use the Viewer's javascript provisions. I have attached a simple script that extracts text from the current page and outputs it to ...
- Tue Sep 03, 2013 4:10 pm
- Forum: PDF-XChange Editor
- Topic: "Preserve Original Content and Add Text Layer" rotated
- Replies: 7
- Views: 3254
Re: "Preserve Original Content and Add Text Layer" rotated
Thank you - it will be fixed shortly.
-Walter
- Mon Aug 19, 2013 6:28 pm
- Forum: PDF-XChange Viewer SDK
- Topic: Anyway to programatically hide the bookmark icons
- Replies: 3
- Views: 2100
Re: Anyway to programatically hide the bookmark icons
There is a named command called "ToggleBookmarksPane" that will do what you want to do. You run commands by invoking the method DoVerb() of the ActiveX object. DoVerb() does a few things (see "Named Operations" in the manual), but for your purpose you will want to use it to invok...
- Fri Jun 07, 2013 4:25 pm
- Forum: PDF-X OCR SDK
- Topic: Reusing OCR results
- Replies: 8
- Views: 5667
Re: Reusing OCR results
The OCR SDK does not support this; OCR results persist after an OCR job is performed and until you free the OCR document object with OCR_Delete(), but they cannot be directly recovered (e.g. into a PXO_Page object) from an already OCR'd document on disk. You can extract text from documents using the...
- Thu Jun 06, 2013 8:52 pm
- Forum: PDF-X OCR SDK
- Topic: PDF library crash
- Replies: 78
- Views: 35054
Re: PDF library crash
This comes from a bug in the OCR engine which was throwing an unhandled exception. For the moment the best solution is for us to handle the exception internally and return an error code so you can handle the failure gracefully. I will provide a new build for download on the website shortly (version ...
- Thu Jun 06, 2013 8:15 pm
- Forum: PDF-X OCR SDK
- Topic: PDF library crash
- Replies: 78
- Views: 35054
Re: PDF library crash
Thanks, will investigate.
- Wed Jun 05, 2013 9:18 pm
- Forum: PDF-X OCR SDK
- Topic: PDF library crash
- Replies: 78
- Views: 35054
Re: PDF library crash
Can you give some details about the nature of the crash?
What DPI were you setting for OCR? What error or exception was returned?
-Walter
- Wed Jun 05, 2013 8:45 pm
- Forum: PDF-X OCR SDK
- Topic: PDF library crash
- Replies: 78
- Views: 35054
Re: PDF library crash
Many thanks
- Tue Jun 04, 2013 5:43 pm
- Forum: PDF-X OCR SDK
- Topic: Reusing OCR results
- Replies: 8
- Views: 5667
Re: Reusing OCR results
Also, the results of OCR_MakeSearchable() remain valid until the document is freed with OCR_Delete(). You can work with multiple documents by creating multiple input documents with OCR_Init() and OCR_Load()/OCR_LoadW(), e.g.: In pseudocode: // ocr first document PXODocument doc1; OCR_Init(..., doc1,...
- Tue Jun 04, 2013 4:27 pm
- Forum: PDF-X OCR SDK
- Topic: Reusing OCR results
- Replies: 8
- Views: 5667
Re: Reusing OCR results
Yes, you can re-use results; the function OCRp_Page() will return a pointer to page information that remains valid until explicitly freed with OCRp_FreePage(). So, in pseudocode: PXO_Page pages[40]; for (nPage in range(40)) OCRp_Page(doc, nPage, options, &pages[nPage], &settings); DoStuff(pa...
- Tue Jun 04, 2013 4:13 pm
- Forum: PDF-X OCR SDK
- Topic: Different outputs OCRp_PageText and OCRp_GetSymbolFromRegion
- Replies: 14
- Views: 7246
Re: Different outputs OCRp_PageText and OCRp_GetSymbolFromRe
The encoded text is in UTF-8, which is a variable-width encoding (1 byte, backwards compatible with ASCII, for ASCII chars; 2 to 4 bytes for non-ASCII Unicode). I'd make sure you're working with UTF-8 during your conversions.
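As a quick illustration of the variable width, here is a minimal Python sketch (not SDK code; the sample characters are arbitrary choices) that reports how many bytes each character occupies once encoded as UTF-8:

```python
def utf8_width(ch: str) -> int:
    """Return the number of bytes used to encode one character in UTF-8."""
    return len(ch.encode("utf-8"))

# One sample from each width class: ASCII, Latin-1 range, BMP, beyond BMP.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]
widths = {ch: utf8_width(ch) for ch in samples}

for ch, n in widths.items():
    # Print the code point rather than the character to avoid console
    # encoding problems.
    print(f"U+{ord(ch):04X} -> {n} byte(s)")
```

Code that assumes one byte per character will therefore misalign as soon as the OCR output contains anything outside ASCII.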
- Mon Jun 03, 2013 8:07 pm
- Forum: PDF-XChange Editor
- Topic: Editing "Text Under Image" PDF Files
- Replies: 11
- Views: 4157
Re: Editing "Text Under Image" PDF Files
BobM wrote: Walter - thanks for the feedback and clarification.
No problem!
Please note that the feedback I gave was more general; the problems you saw are directly related to the bug which has been resolved now (will be in the release available most likely by the end of today).
- Fri May 31, 2013 10:39 pm
- Forum: PDF-XChange Editor
- Topic: Editing "Text Under Image" PDF Files
- Replies: 11
- Views: 4157
Re: Editing "Text Under Image" PDF Files
The bug has been resolved and the fix will be present in the next release version of the editor and OCR Plugin (next week, I believe, although I'm not responsible for the release schedule so I may defer to someone else to weigh in on that one). However I would like to point out that your document "s...
- Fri May 31, 2013 4:14 pm
- Forum: PDF-XChange Editor
- Topic: Editing "Text Under Image" PDF Files
- Replies: 11
- Views: 4157
Re: Editing "Text Under Image" PDF Files
Thanks; this appears to be a bug in the handling of certain types of page layout, not directly related to OCR but definitely having a big impact as it results in incorrect page orientations being passed to the OCR routines. It will most likely be addressed in the next build (probably a week or so aw...
- Thu May 30, 2013 5:39 pm
- Forum: How to forum
- Topic: Proximity Searches
- Replies: 4
- Views: 4054
Re: Proximity Searches
When is version 3 going to be released? Will there ever be a python API for PDF exchange? I want to punch myself in the face when I read java. def prioritize(featureitem): if featureitem=="pythonAPI": return sys.maxint else: return normalrank(featureitem) def development_meeting(): ranks ...
- Thu May 30, 2013 4:04 pm
- Forum: PDF-XChange Editor
- Topic: Editing "Text Under Image" PDF Files
- Replies: 11
- Views: 4157
Re: Editing "Text Under Image" PDF Files
If you could provide an example document for us to examine we would appreciate it. In our testing, the OCR in the editor has much better layout analysis and generally gives higher quality results than the viewer. In particular, differentiation of text and image regions is much better and the overall...
- Fri May 17, 2013 7:18 pm
- Forum: PDF-X OCR SDK
- Topic: OCRTOOLS - OCR_MakeSearchable error -1039334424
- Replies: 6
- Views: 8521
Re: OCRTOOLS - OCR_MakeSearchable error -1039334424
Hi, Please try the latest DLL first - the current version is 1.0.13. There is a table of error codes in the SDK manual which I do not have at my disposal right now (I'm on a mobile device, away from work for the time being) - maybe this will point you in the right direction? You might also try our D...
- Wed May 08, 2013 4:18 pm
- Forum: OCR- For the PDF-XChange Editor and Viewer
- Topic: Searching OCRed text in Adobe Reader
- Replies: 15
- Views: 8814
Re: Searching OCRed text in Adobe Reader
Walter, Yes, very nice! There were just the 4 instances of "motley" detected as desired. Searches for other words also performed as expected. Also, no extraneous characters appeared in the searches - one of my original reasons for posting on this forum. Bottom line - time to get the PDF-X...
- Tue May 07, 2013 10:46 pm
- Forum: OCR- For the PDF-XChange Editor and Viewer
- Topic: Searching OCRed text in Adobe Reader
- Replies: 15
- Views: 8814
Re: Searching OCRed text in Adobe Reader
Here's the output using our new OCR layout engine from the Editor. Can you confirm that you can also find all 4 instances of "Motley" using your reader(s) of choice? If you select text of this one, and the OCR output from the Viewer (provided by you already), you should be able to clearly ...
- Tue May 07, 2013 10:40 pm
- Forum: OCR- For the PDF-XChange Editor and Viewer
- Topic: Searching OCRed text in Adobe Reader
- Replies: 15
- Views: 8814
Re: Searching OCRed text in Adobe Reader
Thanks for the files. I can find 3 instances of "Motley" in both files using Adobe's reader - I'm not sure why you see differences. As explained already, our Editor performs a better layout analysis on this particular document. I will attach the result in the next post. Not sure what else ...
- Tue May 07, 2013 5:29 pm
- Forum: OCR- For the PDF-XChange Editor and Viewer
- Topic: Searching OCRed text in Adobe Reader
- Replies: 15
- Views: 8814
Re: Searching OCRed text in Adobe Reader
Hi MarkinAZ, I did not get your attachments; make sure to zip them and attach the zip file. Our forum software removes some file extensions automatically to mitigate spamming. The differences in search output arise because words in a PDF document are not necessarily connected logically, the way they mi...
- Mon May 06, 2013 10:12 pm
- Forum: PDF-X OCR SDK
- Topic: Very large file size after the text recognition
- Replies: 23
- Views: 10617
Re: Very large file size after the text recognition
The option to output to an existing PDF is a feature of the Viewer but not directly available in the OCR SDK. The last release was primarily a bug fix. You can, however, access the text and position results and place them yourself in an existing PDF if you wish. Looking back over the thread I guess ...
- Mon May 06, 2013 9:37 pm
- Forum: OCR- For the PDF-XChange Editor and Viewer
- Topic: Searching OCRed text in Adobe Reader
- Replies: 15
- Views: 8814
Re: Searching OCRed text in Adobe Reader
Checking again, it looks like the document has two layers of invisible OCR text in it (one on top of the other). Maybe it had an extra OCR layer added by your scanning software, or perhaps it was OCR'd twice by using the "Preserve existing content" option in the viewer. This is probably wh...
- Mon May 06, 2013 6:15 pm
- Forum: OCR- For the PDF-XChange Editor and Viewer
- Topic: Searching OCRed text in Adobe Reader
- Replies: 15
- Views: 8814
Re: Searching OCRed text in Adobe Reader
Hi, Thanks very much for your samples. We appreciate this kind of feedback as it really helps us improve our products. I have looked into this and have found the following: First, I was not able to completely reproduce your problem - using the document you provided, in Adobe Reader XI (11.0.2) I was...
- Mon May 06, 2013 4:33 pm
- Forum: OCR- For the PDF-XChange Editor and Viewer
- Topic: Question about Bounding-Box Zoom
- Replies: 1
- Views: 2794
Re: Question about Bounding-Box Zoom
It cannot be done with the Active-X SDK. The editor SDK will allow this kind of function, but that is several months away right now. You could use the Tools SDK to do this, but you will have to write your own page content parser and do all the logic to do "hit tests" to check for mouse cli...
- Fri May 03, 2013 11:23 pm
- Forum: PDF-X OCR SDK
- Topic: Different outputs OCRp_PageText and OCRp_GetSymbolFromRegion
- Replies: 14
- Views: 7246
Re: Different outputs OCRp_PageText and OCRp_GetSymbolFromRe
I have checked with my version of your sample code here and indeed characters do match up between that from OCRp_PageText() and that taken from OCRp_GetSymbolFromRegion(). I suspect, as stated, that this all relates to how you handle the unicode strings and characters returned by these functions. Se...
- Fri May 03, 2013 11:02 pm
- Forum: PDF-X OCR SDK
- Topic: Different outputs OCRp_PageText and OCRp_GetSymbolFromRegion
- Replies: 14
- Views: 7246
Re: Different outputs OCRp_PageText and OCRp_GetSymbolFromRe
Just a note: a good way to work around this problem, if you don't want to tweak your code to deal with unicode or UTF8 handling, would be to apply specific whitelists that only contain ANSI / ASCII characters (e.g. "1234567890abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ~!@#$%^&*()&qu...
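One way to assemble an ASCII-only whitelist string of this kind is sketched below in Python. It only builds and checks the string; how the whitelist is handed to the OCR engine is not shown here, and `build_ascii_whitelist` is a hypothetical helper name, not part of the SDK.

```python
import string

def build_ascii_whitelist(include_punctuation: bool = True) -> str:
    """Build a whitelist of digits, ASCII letters and optional punctuation."""
    chars = string.digits + string.ascii_lowercase + string.ascii_uppercase
    if include_punctuation:
        chars += string.punctuation
    return chars

whitelist = build_ascii_whitelist()

# Sanity check: every whitelisted character is plain ASCII, so each one
# occupies exactly one byte in UTF-8.
all_single_byte = all(len(c.encode("utf-8")) == 1 for c in whitelist)
print(all_single_byte)
```

With recognition restricted to such a set, every returned character is a single byte in UTF-8, which sidesteps the multi-byte handling entirely at the cost of losing accented and non-Latin characters.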
- Fri May 03, 2013 10:44 pm
- Forum: PDF-X OCR SDK
- Topic: Different outputs OCRp_PageText and OCRp_GetSymbolFromRegion
- Replies: 14
- Views: 7246
Re: Different outputs OCRp_PageText and OCRp_GetSymbolFromRe
My suspicion is that this involves encoding of the text (unicode, or UTF-8, ANSI, etc). The text you receive from those functions is unicode text, and you must make sure to use unicode functions or do the correct conversion (e.g. to UTF-8). If you are outputting with ANSI text functions these character...
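The kind of mismatch suspected here can be reproduced in a few lines of Python. This is a sketch of the general failure mode, not of the SDK itself: it assumes the bytes are UTF-8 and that the "ANSI" code page is Windows-1252, and the sample word is arbitrary.

```python
# A word containing one non-ASCII character, encoded as UTF-8.
original = "caf\u00e9"
raw = original.encode("utf-8")      # 5 bytes: the last character takes 2

# Wrong: interpret the UTF-8 bytes with an ANSI code page (cp1252 assumed).
garbled = raw.decode("cp1252")

# Right: decode with the encoding the bytes were actually written in.
correct = raw.decode("utf-8")

print(len(raw), len(garbled), len(correct))
```

The garbled string is one character longer than the original because each byte of the two-byte sequence is shown as its own character, which is a typical sign that UTF-8 data is being read through an ANSI text function.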
- Fri May 03, 2013 10:19 pm
- Forum: PDF-X OCR SDK
- Topic: Different outputs OCRp_PageText and OCRp_GetSymbolFromRegion
- Replies: 14
- Views: 7246
Re: Different outputs OCRp_PageText and OCRp_GetSymbolFromRe
Thanks, am looking at them now.
-Walter
- Thu May 02, 2013 4:12 pm
- Forum: PDF-X OCR SDK
- Topic: Different outputs OCRp_PageText and OCRp_GetSymbolFromRegion
- Replies: 14
- Views: 7246
Re: Different outputs OCRp_PageText and OCRp_GetSymbolFromRe
Does OCRp_GetSymbolFromRegion() work in other cases for you, but not this one?
Can you provide a piece of sample code to reproduce this issue, as well as the input PDF?