Provide a tool to extract to RTF or other structured format

Please post any requests or ideas you may have for new features for the end User Version of PDF-Tools here.

adrianus
User
Posts: 5
Joined: Sat Sep 25, 2004 2:11 am

Provide a tool to extract to RTF or other structured format

Post by adrianus »

It would be very useful to have a tool that extracts from PDF to a formatted document, e.g. RTF, the format used by OpenOffice, or HTML.

Extraction needs to retain the paragraph structure properly (e.g. the current text extraction seems to put a CR at the end of each line, which Word interprets as a new paragraph ...)

It would also be useful to have an option in text extraction to put a CR only at the end of each paragraph, so that Word or other editing tools can recognise paragraphs ...
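
As a rough illustration of the "CR only at end of para" option requested above, here is a minimal Python sketch that post-processes already-extracted text. It is not how PDF-Tools works. The function name reflow is hypothetical, and it assumes the extracted text separates paragraphs with at least one blank line, which may not hold for every PDF.

```python
import re


def reflow(text: str) -> str:
    """Join hard-wrapped lines so each paragraph ends with exactly one line break."""
    # Normalise CR and CRLF line endings to LF so the split below is uniform.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Assumption: one or more blank lines mark a paragraph boundary.
    blocks = re.split(r"\n\s*\n", text)
    paragraphs = []
    for block in blocks:
        words = block.split()  # collapses the per-line breaks inside a paragraph
        if words:
            paragraphs.append(" ".join(words))
    return "\n".join(paragraphs)


sample = "The quick\r\nbrown fox.\r\n\r\nSecond para\r\nhere.\r\n"
print(reflow(sample))
```

Words hyphenated across a line break are not rejoined by this sketch; that would need a separate rule.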
John - Tracker Supp
Site Admin
Posts: 5219
Joined: Tue Jun 29, 2004 10:34 am
Location: United Kingdom

Post by John - Tracker Supp »

Hi Adrianus,

Thanks for the suggestions. We can confirm that we are working towards this: we have a new parser in progress that will extract such information and, we hope, tables etc. too.

This may take a little time yet - but it is our objective to provide such functionality.

best
If posting files to this forum - you must archive the files to a ZIP, RAR or 7z file or they will not be uploaded - thank you.

Best regards
Tracker Support
http://www.tracker-software.com
guyghk
User
Posts: 21
Joined: Thu Jan 06, 2005 9:00 am

Extracting tables to Excel my priority

Post by guyghk »

If you could come up with something that extracted to Excel, I think that would be very popular.

There are a few products that claim to do it - I've just tested three with a view to getting one. However they are either:

- very limited in functionality, e.g. verypdf.com, which just extracts to txt and is not much better at preserving text positioning than the existing "Extract text from pdf" within PDF Tools;

- useless at dealing with complicated tables, e.g. abbyy.com;

- pretty good (though not 100%) at extracting tables, but over-priced, e.g. investintech.com at $90.

If PDF-XChange & PDF-Tools are anything to go by, I'd expect you to come up with a better product at a better price. I'd definitely pay for an upgrade to this.

Guy
John - Tracker Supp
Site Admin
Posts: 5219
Joined: Tue Jun 29, 2004 10:34 am
Location: United Kingdom

Post by John - Tracker Supp »

We are looking this spring at introducing a number of new end user PDF2.... extraction tools - thanks for your input - Word and Excel formats are top of the list.
guyghk
User
Posts: 21
Joined: Thu Jan 06, 2005 9:00 am

Post by guyghk »

Hello,

Is there any update on the timing of this? Not meaning to push but is it still in the pipeline?

Thanks,

Guy
John - Tracker Supp
Site Admin
Posts: 5219
Joined: Tue Jun 29, 2004 10:34 am
Location: United Kingdom

Post by John - Tracker Supp »

Yes, but still some weeks away.
adrianus
User
Posts: 5
Joined: Sat Sep 25, 2004 2:11 am

PDF to Word - further thoughts

Post by adrianus »

Based on experience with several PDF to Word tools, one gap in functionality is any smarts in dealing with paragraphs. None of the tools I've seen so far seems able to do anything clever about reconstructing paragraphs from the structure generated in the PDF file. Instead, each line in the paragraph is converted to Word as a separate paragraph.

Will you be trying to address this issue in your conversion tool eventually?
John - Tracker Supp
Site Admin
Posts: 5219
Joined: Tue Jun 29, 2004 10:34 am
Location: United Kingdom

Post by John - Tracker Supp »

Hi,

Have you tried the PDF to RTF extraction function in PDF-Tools?

https://www.pdf-xchange.com/home/pr ... /pdftools/

Does this do what you need? If not, please provide some sample PDF and RTF files (zipped) and we would be pleased to look into any issues found.

thanks
adrianus
User
Posts: 5
Joined: Sat Sep 25, 2004 2:11 am

PDF to Word - further thoughts

Post by adrianus »

I'm currently using PDF-XChange Pro - my driver is V3.6 Build 0102 (I couldn't find where the version of the tools is shown). The PDF to Word/RTF tool in my version can't do the sort of thing I was proposing. I will try to generate an example in due course - it may take a week or two, though, due to a few time pressures!

Regards
John - Tracker Supp
Site Admin
Posts: 5219
Joined: Tue Jun 29, 2004 10:34 am
Location: United Kingdom

Post by John - Tracker Supp »

Ok thanks - Look forward to it.

In the meantime - you may want to take a look at PDFTransformer from ABBYY - it will almost certainly offer you what you need and uses an entirely different process to achieve the conversion - very impressive.

They also have a deal with us to bundle PDF-XChange in with this - hence we have no objection to promoting another publisher's products on this occasion.

http://www.pdftransformer.com

HTH
adrianus
User
Posts: 5
Joined: Sat Sep 25, 2004 2:11 am

Re: PDF to Word - further thoughts

Post by adrianus »

I have two sample conversions ready as a zip file but I couldn't submit it as an attachment - I kept getting the message that the maximum size for all attachments was exceeded. The attachment itself is only 122 KB, so I have no idea what prevents submitting it. Perhaps you could give me an alternative way of submitting it to you ...

In one of the examples, just the first line of every paragraph ends up with a paragraph break in Word, which doesn't match how one would like to see the conversion done ...

In the other example, every line ends up with a paragraph break.

As a human reader, it's easy to see what should really happen to get a perfect conversion, but no doubt the format of the PDF file makes it harder for the software to work this out ... Nevertheless, it would be nice if it had an option for being a bit more clever ... e.g. detecting that a set of single-spaced lines followed by a wider spacing is really a paragraph, or alternatively that an indented line marks a paragraph start etc...
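
To make the idea concrete, here is a minimal Python sketch of the two heuristics suggested above (wider line spacing, or an indented first line, marks a new paragraph). Everything in it is an assumption for illustration: the Line record, the group_paragraphs function, and the gap_factor and indent_min thresholds are hypothetical, and it presumes some extractor has already supplied each line's text and position in points. It is not a description of how PDF-Tools or any other converter works.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Line:
    text: str
    top: float   # distance from the top of the page, in points (assumed input)
    left: float  # x position of the first character, in points (assumed input)


def group_paragraphs(lines, gap_factor=1.5, indent_min=6.0):
    """Group positioned lines (in reading order) into paragraph strings."""
    if not lines:
        return []
    # The typical line pitch is taken as the median gap between successive lines.
    gaps = [b.top - a.top for a, b in zip(lines, lines[1:])]
    typical_gap = median(gaps) if gaps else 0.0
    # The body margin is taken as the leftmost line start.
    margin = min(line.left for line in lines)
    paragraphs = [[lines[0].text]]
    for prev, cur in zip(lines, lines[1:]):
        wide_gap = (cur.top - prev.top) > gap_factor * typical_gap
        indented = (cur.left - margin) >= indent_min
        if wide_gap or indented:
            paragraphs.append([cur.text])    # start a new paragraph
        else:
            paragraphs[-1].append(cur.text)  # continue the current paragraph
    return [" ".join(p) for p in paragraphs]


page = [
    Line("First para line one", 100, 72),
    Line("line two.", 112, 72),
    Line("Second para starts", 136, 72),
    Line("and ends.", 148, 72),
    Line("Indented third para", 160, 90),
    Line("continues.", 172, 72),
]
for para in group_paragraphs(page):
    print(para)
```

The thresholds would need tuning per document, and multi-column pages, headings and tables would need separate handling.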

I'm currently using PDF-XChange Pro - my PDF-Tools are V3.6 Build 0102
John - Tracker Supp
Site Admin
Posts: 5219
Joined: Tue Jun 29, 2004 10:34 am
Location: United Kingdom

Post by John - Tracker Supp »

Hi,

Could you please email the files to usrfiles@tracker-software.com with a link in the body of the email back to this post and we will take a look.

Please zip the files sent.

Thanks

We will also increase your personal forum attachment limit, as I suspect you have exceeded your per-user limit across all posts combined.
adrianus
User
Posts: 5
Joined: Sat Sep 25, 2004 2:11 am

PDF Tools - further suggestions

Post by adrianus »

File emailed as requested. Every attempt to submit it here was with a zip file ...

Regards
John - Tracker Supp
Site Admin
Posts: 5219
Joined: Tue Jun 29, 2004 10:34 am
Location: United Kingdom

Post by John - Tracker Supp »

Thanks Andy,

We will analyse the files and advise when we have some progress.

Many thanks.