 
Archie
User
Topic Author
Posts: 4
Joined: Tue Dec 03, 2013 9:48 pm

SDK to extract text, then search

Tue Dec 03, 2013 9:59 pm

We write software for the Logistics industry.
We have an "Edoc" system that keeps shipping documents as PDFs.
Our customers are asking for the ability to search the recently received PDFs (say 3,000) for things like a company name.
I am thinking about using the PDF-XChange Viewer SDK to do that automatically in the background, with no user input, as each "Edoc" PDF is created.
I expect to create, for each Edoc PDF, a pdf.txt file containing the text extracted from the PDF.

After that I need to write a program that searches the pdf.txt files for the desired string, e.g. a company name.
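
As a rough illustration of that search step, here is a minimal Python sketch. It assumes every Edoc PDF has a sidecar text file named like `invoice.pdf.txt` in a single folder. The function name `find_matches` and the folder layout are hypothetical, and the match is a plain case-insensitive substring test, not anything provided by a Tracker SDK.

```python
from pathlib import Path


def find_matches(folder, needle):
    """Return the sorted names of *.pdf.txt files in folder that contain needle.

    The comparison is a case-insensitive substring match.
    """
    needle = needle.casefold()
    hits = []
    for txt_path in Path(folder).glob("*.pdf.txt"):
        # errors="replace" keeps the scan going if the extracted text has odd bytes.
        text = txt_path.read_text(encoding="utf-8", errors="replace")
        if needle in text.casefold():
            hits.append(txt_path.name)
    return sorted(hits)
```

The only user input is the search string, e.g. `find_matches("C:/edocs", "Acme Freight")`. For a few thousand small text files a linear scan like this is usually adequate; a prebuilt index would only matter at much larger volumes.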

Question:
Does anyone know of an ActiveX module that I can use to do the search with as little user input as possible, just the string to search for?

Thanks
Archie
 
Paul - Tracker Supp
User
Posts: 4713
Joined: Wed Mar 25, 2009 10:37 pm
Location: Chemainus, Canada

Re: SDK to extract text, then search

Tue Dec 03, 2013 10:13 pm

Hi Archie,

Thanks for the post and welcome to the Tracker Forums.

I think you should look at the PDF-Tools SDK (http://tracker-software.com/product/pdf-tools-sdk). Assuming your PDFs are created as text-based rather than image-based PDFs, you should be able to search for the strings directly in the PDF, without the txt file in between.

If a PDF is image based, you would need to OCR it first and then search the text.

All the SDKs are fully functional, even in 'Trial Mode', and you can test every aspect of your program before committing to a purchase. The caveat is that, until licensed, anything you do with the SDK will result in watermarked PDFs. Once you are happy that you have the right solution, simply purchase a license, inject the serial keys and dev code we give you into your source code, recompile and go...

I hope that helps. Do be sure to let us know if you have further questions.

regards
_________________
If posting files to this forum - you must archive the files to a ZIP, RAR or 7z file or they will not be uploaded - thank you.

Best regards

Paul O'Rorke
Tracker Support North America
http://www.tracker-software.com
 
Archie
User
Topic Author
Posts: 4
Joined: Tue Dec 03, 2013 9:48 pm

Re: SDK to extract text, then search

Tue Dec 03, 2013 10:33 pm

Hi Paul

Thanks for the quick reply.
Some PDFs come from forms that have text but lots come from faxes attached to emails. They are image based.
For those I will need the OCR stuff.

I presume your OCR stuff will allow me to create a pdf.txt file with the OCR produced text and that you have ActiveX modules that my programs can call to do it.

The question for which I am looking for some guidance is related to the next step where I build an application program that asks the user for a search string and it goes off and searches all the pdf.txt files for the desired string.

Question:
Does anyone know of an ActiveX module that I can use to do the search with as little user input as possible, just the string to search for?

Thanks
 
John - Tracker Supp
Site Admin
Posts: 8192
Joined: Tue Jun 29, 2004 10:34 am
Location: Vancouver Island - Canada

Re: SDK to extract text, then search

Wed Dec 04, 2013 4:52 pm

Hi Archie,

Topic moved to the correct forum ...

Walter (our OCR specialist) will reply shortly regarding your question. As for licensing, you will need a PDF-XChange PRO SDK (not the PDF-Tools SDK as advised by Paul) to gain access to the Live OCR SDK functions.

HTH

Best regards
Tracker Support
http://www.tracker-software.com
 
Walter-Tracker Supp
User
Posts: 383
Joined: Mon Jun 13, 2011 5:10 pm

Re: SDK to extract text, then search

Wed Dec 04, 2013 5:21 pm

You can do this with the Pro Tools SDK, but it is not ActiveX; it is a native C++ DLL with a flat C-style API. We have functions to extract existing text, and an OCR component that lets you perform OCR and either create a searchable PDF as output or extract the text, which you can save to a text file if you wish.

We have wrappers for .NET and a few other languages, so you aren't restricted to C++, but it is not an ActiveX component.

We do have an ActiveX viewer component, but this is typically used to provide customized viewing (and annotating, etc.) capabilities within a custom application. You can't use it to automate text extraction.

-Walter
 
Archie
User
Topic Author
Posts: 4
Joined: Tue Dec 03, 2013 9:48 pm

Re: SDK to extract text, then search

Wed Dec 04, 2013 10:30 pm

We develop in a language called Visual DataFlex (VDF), which easily supports using ActiveX controls.

I went to the VDF forum and asked if your stuff would be usable.
The best reply I got was the following:
"Ask them how their exported functions are declared. If they use __STDCALL then you're all good. Each of their functions becomes an External_function statement in VDF."

My question, then, is the above. Are the exported functions declared using __STDCALL?

Thanks
 
Walter-Tracker Supp
User
Posts: 383
Joined: Mon Jun 13, 2011 5:10 pm

Re: SDK to extract text, then search

Wed Dec 04, 2013 11:17 pm

Yes, we use the __stdcall calling convention. You do not need to purchase the product to try it; there are some limitations (e.g. watermarks if you create documents, limits on the number of pages you can OCR, etc) but you can try every feature out without purchasing a license.

-Walter
 
Archie
User
Topic Author
Posts: 4
Joined: Tue Dec 03, 2013 9:48 pm

Re: SDK to extract text, then search

Wed Dec 04, 2013 11:22 pm

Good news Walter.
Thanks for the info and the quick response.
 
John - Tracker Supp
Site Admin
Posts: 8192
Joined: Tue Jun 29, 2004 10:34 am
Location: Vancouver Island - Canada

Re: SDK to extract text, then search

Thu Dec 05, 2013 3:23 am

Thanks Archie - do come back if you need any further info.

Best regards
Tracker Support
http://www.tracker-software.com
 
Peter2
User
Posts: 756
Joined: Mon Sep 13, 2010 10:09 am
Location: Switzerland

Re: SDK to extract text, then search

Tue Sep 15, 2015 10:13 am

This posting is nearly 2 years old, but I'm looking for a similar solution. So my question is:

Has something important changed since 2013? New features, new tools out-of-the-box, new SDKs??

Peter
Win 7 Prof German; PDF-X-Change Pro German
 
Will - Tracker Supp
Site Admin
Posts: 5881
Joined: Mon Oct 15, 2012 9:21 pm
Location: Chemainus, BC

Re: SDK to extract text, then search

Tue Sep 15, 2015 2:42 pm

Hi Peter,

Lots has changed since 2013, but nothing too dramatic in terms of the OCR's overall functionality. What, specifically, are you looking for?

Thanks,

Best regards

Will Travaglini
Tracker Support
http://www.tracker-software.com
 
Peter2
User
Posts: 756
Joined: Mon Sep 13, 2010 10:09 am
Location: Switzerland

Re: SDK to extract text, then search

Tue Sep 15, 2015 3:05 pm

Hi Will

This is the main thread, which includes the side discussion "Where is menu "text Props"?":
viewtopic.php?f=62&t=24215

The (half-baked) idea behind my question is that scanned drawings need:
- OCR
- finding the position (coordinates) of the newly recognized strings
- "transforming" (in a way that still needs to be found ...) the content and the position of the strings into a vector drawing.

This is why I'm thinking about "which text is where".
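
To make the "which text is where" step more concrete, here is a small Python sketch of the coordinate part only. It assumes the OCR step already yields each string with an anchor position in default PDF user space (units of 1/72 inch, origin at the bottom-left of the page) and that the drawing was scanned at a known scale such as 1:100. The names `PlacedText` and `to_drawing_units` are hypothetical, and page rotation or scan skew is ignored.

```python
from dataclasses import dataclass

POINTS_PER_INCH = 72.0  # default PDF user-space unit is 1/72 inch
MM_PER_INCH = 25.4


@dataclass
class PlacedText:
    text: str
    x: float  # horizontal position
    y: float  # vertical position


def to_drawing_units(item, scale_denominator=1.0):
    """Convert a text position from PDF points to model millimetres.

    scale_denominator is the N of a 1:N drawing scale (assumed to be known).
    """
    factor = MM_PER_INCH / POINTS_PER_INCH * scale_denominator
    return PlacedText(item.text, item.x * factor, item.y * factor)
```

For example, a string found 72 points from the left edge of a 1:100 drawing would land at about 2540 mm in model space. Writing the converted items into an actual vector format is a separate step that this sketch does not cover.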
Win 7 Prof German; PDF-X-Change Pro German
 
Ivan - Tracker Software
Site Admin
Posts: 3488
Joined: Thu Jul 08, 2004 10:36 pm
Location: Vancouver Island - Canada

Re: SDK to extract text, then search

Tue Sep 15, 2015 10:27 pm

You can use the Editor SDK or the Core API SDK to retrieve the text from a page and search in it.

Please take a look at the IPXC_PageText interface: http://sdkhelp.tracker-software.com/view/PXV:IPXC_PageText
Tracker Software (Project Director)