PDF XChange Forum

Walter-Tracker Supp

Arabic uses a special OCR engine mode, and a bunch of additional support language files. This functionality has not been implemented in the viewer. Sorry.

Walter-Tracker Supp

Do you run OCR with the "Document -> OCR Pages..." command, or from "New Document -> From Images" (or "From Scanner") with OCR post-processing enabled?

Walter-Tracker Supp

As I mentioned, we'll be providing better compression in the next major release. There are some constraints on the compression methods we use in the current release that can result in larger files in some cases. Meanwhile I am investigating the file provided most recently in this thread - there may ...

Walter-Tracker Supp

Unfortunately there doesn't appear to be a simple way to duplicate printers in Windows 8 that I can find, and when I installed a second printer copy manually, Windows 8 "merged" it so that I only had one copy. It appears Windows 8 is trying to protect against "clutter" and I'm no...

Walter-Tracker Supp

Thanks for your patience (and for the reminders!) -Walter Hi lmacri, Yes, it's almost two months since Viewer build 2.5.213.0 was issued, almost six weeks since 2.5.213.1, four weeks since I first asked about updating the Viewer's Version History page, and two weeks since I asked again, but as today...

Walter-Tracker Supp

Yes, we use the __stdcall calling convention. You do not need to purchase the product to try it; there are some limitations (e.g. watermarks if you create documents, limits on the number of pages you can OCR, etc.) but you can try out every feature without purchasing a license.

-Walter

Walter-Tracker Supp

You can do this with the Pro Tools SDK, but it is not Active-X but rather a native C++ DLL with a flat C-style API. We have functions to extract existing text, and an OCR component that lets you perform OCR and create either a searchable PDF output, or extract text which you can save to a text file if...

Walter-Tracker Supp

I've looked at your document, and while I don't see nearly the poor performance you do, I do note that it takes longer than typical files. You will notice that pages 8 and 9 are the culprits, and this is because the layout of those pages is difficult for our engine to process, due to the complexity...

Walter-Tracker Supp

Croatian will be available on or before the next build, anticipated in about a month's time. Meanwhile you can use any other language we provide which uses the same diacritics, if applicable (I'm not familiar with Croatian myself), because the word dictionary coupling is weak. I will update this for...

Walter-Tracker Supp

You are right, it copies at the correct DPI, but does not set the information in such a way that the clipboard retains the DPI setting.

It has been fixed in development as of right now and will be in the next release.

Walter-Tracker Supp

Not at the moment. We may release a tool to help with training in the future. However, if you feel ambitious you can email us at support@pdf-xchange.com and I can point you in the right direction, but can't provide detailed support for it - you'd be on your own.

Walter-Tracker Supp

Ludwig, I have attached the language pack to this post, because I guess it will still be a few days since our installer people are very busy with the new editor release. You will have to place them in your language directory yourself, and we cannot provide support for this since we will have a prope...

Walter-Tracker Supp

Ludwig, I have prepared the Fraktur language pack and sent it to our installation guys. It may be a few days before it becomes available on the website but I thought I would update you to let you know that it will be very soon. It will work with both the viewer and the editor.

-Walter

Walter-Tracker Supp

We will add Slovakian, Swedish, and German "fraktur" language data in the final release of the editor. We will not have direct Latin support, though results using English (or even other Latin alphabet) language selection will be fairly good since the word dictionary weighting is fairly wea...

Walter-Tracker Supp

We are providing this kind of functionality in the new SDK which will be out after the editor is finalized.

-Walter

Walter-Tracker Supp

I'm afraid I am unable to reproduce this problem with either the 32-bit or 64-bit DLLs (directly from the link provided by Stefan to the current live DLLs). Your key will still be valid with them, so I'm not sure what's going on. Have you changed any of your code? Can you try the sample applications?...

Walter-Tracker Supp

Oh, if you mean the Active-X Viewer component, it contains the latest OCR available for that range of products (Viewer and PDF-Tools SDK).

We have an improved underlying engine in the *Editor* product and upcoming related lines.

Walter-Tracker Supp

You can check the version of the dll within Windows' file explorer. Just right-click the dll and select "properties" from the context menu, then select the "Details" tab. Version is indicated in the "File version" field. And yes, the SDK for download is always up to dat...

Walter-Tracker Supp

There is some limited support for recognition of mixed Chinese (traditional or simplified) with Latin script. I would recommend you try the free PDF-XChange Viewer (from our downloads page) with the Chinese language package and try it out on some sample documents, as it uses the same underlying OCR ...

Walter-Tracker Supp

You say it crashes unless you add a sleep statement; what is the nature of the crash?

What is your OpenPdfFile() function?

Walter-Tracker Supp

OCR text is essentially the same as visible text, except that it is not rendered. You can extract text by selecting it with the mouse, and copying / pasting, or you can use the Viewer's javascript provisions. I have attached a simple script that extracts text from the current page and outputs it to ...

Walter-Tracker Supp

Thank you - it will be fixed shortly.

-Walter

Walter-Tracker Supp

There is a named command called "ToggleBookmarksPane" that will do what you want to do. You run commands by invoking the method DoVerb() of the ActiveX object. DoVerb() does a few things (see "Named Operations" in the manual), but for your purpose you will want to use it to invok...

Walter-Tracker Supp

The OCR SDK does not support this; OCR results persist after an OCR job is performed and until you free the OCR document object with OCR_Delete(), but they cannot be directly recovered (e.g. into a PXO_Page object) from an already OCR'd document on disk. You can extract text from documents using the...

Walter-Tracker Supp

This comes from a bug in the OCR engine which was throwing an unhandled exception. For the moment the best solution is for us to handle the exception internally and return an error code so you can handle the failure gracefully. I will provide a new build for download on the website shortly (version ...

Walter-Tracker Supp

Thanks, will investigate.

Walter-Tracker Supp

Can you give some details about the nature of the crash?

What DPI were you setting for OCR? What error or exception was returned?

-Walter

Walter-Tracker Supp

Many thanks

Walter-Tracker Supp

Also, the results of OCR_MakeSearchable() remain valid until the document is freed with OCR_Delete(). You can work with multiple documents by creating multiple input documents with OCR_Init() and OCR_Load()/OCR_LoadW(), e.g., in pseudocode:

// ocr first document
PXODocument doc1;
OCR_Init(..., doc1,...

Walter-Tracker Supp

Yes, you can re-use results; the function OCRp_Page() will return a pointer to page information that remains valid until explicitly freed with OCRp_FreePage(). So, in pseudocode:

PXO_Page pages[40];
for (nPage in range(40)) OCRp_Page(doc, nPage, options, &pages[nPage], &settings);
DoStuff(pa...

Walter-Tracker Supp

The encoded text is in UTF-8, which is a variable-width encoding (1 byte for ASCII characters, which keeps it backwards compatible with ASCII, and 2 to 4 bytes for all other Unicode characters). I'd make sure you're working with UTF-8 during your conversions.
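To make the variable width concrete, here is a minimal Python sketch (not SDK code; the helper names utf8_len and is_valid_utf8 are made up for this illustration). It prints how many bytes a few sample characters take in UTF-8, and shows that cutting a multi-byte character in half leaves bytes that no longer decode, which is a common source of garbled output when a buffer is treated as one byte per character.

```python
def utf8_len(ch: str) -> int:
    """Number of bytes one character occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


def is_valid_utf8(data: bytes) -> bool:
    """True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# ASCII letter, Latin letter with diacritic, euro sign, emoji
for ch in ("A", "č", "€", "😀"):
    print(f"U+{ord(ch):04X} -> {utf8_len(ch)} byte(s)")

encoded = "č".encode("utf-8")      # two bytes
print(is_valid_utf8(encoded))      # the whole character decodes
print(is_valid_utf8(encoded[:1]))  # half a character does not
```

The practical point is to count and slice in characters after decoding, not in raw bytes.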

Walter-Tracker Supp

BobM wrote: Walter - thanks for the feedback and clarification.

No problem!

Please note that the feedback I gave was more general; the problems you saw are directly related to the bug which has been resolved now (will be in the release available most likely by the end of today).

Walter-Tracker Supp

The bug has been resolved and the fix will be present in the next release version of the editor and OCR Plugin (next week, I believe, although I'm not responsible for the release schedule so I may defer to someone else to weigh in on that one). However I would like to point out that your document "s...

Walter-Tracker Supp

Thanks; this appears to be a bug in the handling of certain types of page layout, not directly related to OCR but definitely having a big impact as it results in incorrect page orientations being passed to the OCR routines. It will most likely be addressed in the next build (probably a week or so aw...

Walter-Tracker Supp

When is version 3 going to be released? Will there ever be a python API for PDF exchange? I want to punch myself in the face when I read java.

def prioritize(featureitem):
    if featureitem=="pythonAPI":
        return sys.maxint
    else:
        return normalrank(featureitem)

def development_meeting():
    ranks ...

Walter-Tracker Supp

If you could provide an example document for us to examine we would appreciate it. In our testing, the OCR in the editor has much better layout analysis and generally gives higher quality results than the viewer. In particular, differentiation of text and image regions is much better and the overall...

Walter-Tracker Supp

Hi, Please try the latest DLL first - the current version is 1.0.13. There is a table of error codes in the SDK manual which I do not have at my disposal right now (I'm on a mobile device, away from work for the time being) - maybe this will point you in the right direction? You might also try our D...

Walter-Tracker Supp

Walter, Yes, very nice! There were just the 4 instances of "motley" detected as desired. Searches for other words also performed as expected. Also, no extraneous characters appeared in the searches - one of my original reasons for posting on this forum. Bottom line - time to get the PDF-X...

Walter-Tracker Supp

Here's the output using our new OCR layout engine from the Editor. Can you confirm that you can also find all 4 instances of "Motley" using your reader(s) of choice? If you select text of this one, and the OCR output from the Viewer (provided by you already), you should be able to clearly ...

Walter-Tracker Supp

Thanks for the files. I can find 3 instances of "Motley" in both files using Adobe's reader - I'm not sure why you see differences. As explained already, our Editor performs a better layout analysis on this particular document. I will attach the result in the next post. Not sure what else ...

Walter-Tracker Supp

Hi MarkinAZ, I did not get your attachments; make sure to zip them and attach the zip file. Our forum software removes some file extensions automatically to mitigate spamming. The differences in search output are because words in a PDF document are not necessarily connected logically, the way they mi...

Walter-Tracker Supp

The option to output to an existing PDF is a feature of the Viewer but not directly available in the OCR SDK. The last release was primarily a bug fix. You can, however, access the text and position results and place them yourself in an existing PDF if you wish. Looking back over the thread I guess ...

Walter-Tracker Supp

Checking again, it looks like the document has two layers of invisible OCR text in it (one on top of the other). Maybe it had an extra OCR layer added by your scanning software, or perhaps it was OCR'd twice by using the "Preserve existing content" option in the viewer. This is probably wh...

Walter-Tracker Supp

Hi, Thanks very much for your samples. We appreciate this kind of feedback as it really helps us improve our products. I have looked into this and have found the following: First, I was not able to completely reproduce your problem - using the document you provided, in Adobe Reader XI (11.0.2) I was...

Walter-Tracker Supp

It cannot be done with the Active-X SDK. The editor SDK will allow this kind of function, but that is several months away right now. You could use the Tools SDK to do this, but you will have to write your own page content parser and do all the logic to do "hit tests" to check for mouse cli...

Walter-Tracker Supp

I have checked with my version of your sample code here and indeed characters do match up between that from OCRp_PageText() and that taken from OCRp_GetSymbolFromRegion(). I suspect, as stated, that this all relates to how you handle the unicode strings and characters returned by these functions. Se...

Walter-Tracker Supp

Just a note: a good way to work around this problem, if you don't want to tweak your code to deal with Unicode or UTF-8 handling, would be to apply specific whitelists that only contain ANSI / ASCII characters (e.g. "1234567890abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ~!@#$%^&*()&qu...
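As a rough sketch of the whitelist idea in Python (not SDK code): the snippet below builds a whitelist string from the printable ASCII digits, letters, and punctuation, and reports which characters of a sample string fall outside it. The names ASCII_WHITELIST and outside_whitelist are invented for this example, and how a whitelist is actually handed to the OCR engine is not shown here.

```python
import string

# Digits, upper/lower-case letters, and punctuation: all plain ASCII.
ASCII_WHITELIST = string.digits + string.ascii_letters + string.punctuation


def outside_whitelist(text: str, whitelist: str = ASCII_WHITELIST) -> list:
    """Return the distinct non-space characters of text not in the whitelist."""
    allowed = set(whitelist)
    return sorted({c for c in text if c not in allowed and not c.isspace()})


print(len(ASCII_WHITELIST))
print(outside_whitelist("Motley č 5€"))
```

A check like this can also be run over existing OCR output to see whether any non-ASCII characters are present before deciding if the workaround is needed.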

Walter-Tracker Supp

My suspicion is that this involves encoding of the text (Unicode, or UTF-8, ANSI, etc). The text you receive from those functions is Unicode text, and you must make sure to use Unicode functions or do the correct conversion (e.g. to UTF-8). If you are outputting with ANSI text functions these character...
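A small Python sketch of the difference between a correct conversion and a lossy one (not SDK code). It assumes, purely for illustration, that the text arrives as a UTF-16LE buffer and uses code page 1252 as a stand-in for "ANSI"; the function names are hypothetical.

```python
def wide_to_utf8(raw: bytes) -> bytes:
    """Convert a UTF-16LE buffer (assumed input format) to UTF-8 bytes."""
    return raw.decode("utf-16-le").encode("utf-8")


def to_ansi_lossy(text: str) -> bytes:
    """Encode as cp1252; characters outside the code page become '?'."""
    return text.encode("cp1252", errors="replace")


text = "Čaj 5€"
raw = text.encode("utf-16-le")  # stand-in for the buffer a function returns

utf8 = wide_to_utf8(raw)
print(utf8.decode("utf-8") == text)  # round trip keeps every character
print(to_ansi_lossy(text))           # 'Č' is not in cp1252 and is replaced
```

The UTF-8 path preserves the text exactly, while the ANSI path silently substitutes any character the code page cannot represent.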

Walter-Tracker Supp

Thanks, am looking at them now.

-Walter

Walter-Tracker Supp

Does OCRp_GetSymbolFromRegion() work in other cases for you, but not this one?

Can you provide a piece of sample code to reproduce this issue, as well as the input PDF?

Search found 381 matches