Feature request: Table extraction

christian kuhlmann
User
Posts: 30
Joined: Wed Sep 18, 2013 10:33 am

Post by christian kuhlmann » Sat Jun 23, 2018 3:56 pm

Hi,
Working with PDFs that contain engineering or business documents often means working with tables of data. Since PDF does not have a native table representation, extracting information from these tables is very tedious.
I am aware of the Export to Excel feature, but I find that its results often bear no resemblance to the original table, especially if the table contains invisible column/row grid lines or a mixture of visible and invisible grid lines.

To improve the export quality, a "table annotation" tool could be implemented. The user would set the number of rows and columns in the tool's properties and draw a rectangle over the area of the table that should be extracted.
The tool would then create a table overlay with the chosen row/column counts, and the user could drag the grid lines to coincide with the table in the document.
The Export to Excel feature would then use the table annotations in the document to divide the text into rows and columns. Even with this manual step, the results would be much easier to obtain than by copying and pasting each cell by hand.
Multi-column or multi-row cells are not essential for the tool, as cells can be merged later in Excel. The opposite, splitting cells that the current algorithm has merged, is much harder, which is why the export tool is currently not very helpful. Saving the table annotations to the document would not be required, but might be helpful if technically possible.
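
To make the idea a bit more concrete, here is a rough Python sketch of how such a grid could drive the export. It is only an illustration, not how PDF-XChange Editor works internally: the function names `even_grid` and `words_to_table` are made up, each word is assumed to be available as a (text, center x, center y) tuple in reading order, and y is assumed to increase downward from the top of the page.

```python
from bisect import bisect_right


def even_grid(x0, y0, x1, y1, rows, cols):
    """Evenly spaced grid edges for the drawn rectangle (hypothetical helper).

    Returns (xs, ys): cols + 1 column edges and rows + 1 row edges.
    The user would then drag individual edges to match the real table.
    """
    xs = [x0 + (x1 - x0) * i / cols for i in range(cols + 1)]
    ys = [y0 + (y1 - y0) * j / rows for j in range(rows + 1)]
    return xs, ys


def words_to_table(words, xs, ys):
    """Assign each word to the grid cell that contains its center point.

    words: iterable of (text, cx, cy) tuples, assumed to be in reading order.
    xs, ys: sorted grid edges, e.g. from even_grid() after user adjustment.
    Words whose center lies outside the grid are ignored.
    """
    n_rows, n_cols = len(ys) - 1, len(xs) - 1
    cells = [[[] for _ in range(n_cols)] for _ in range(n_rows)]
    for text, cx, cy in words:
        if not (xs[0] <= cx < xs[-1] and ys[0] <= cy < ys[-1]):
            continue
        col = bisect_right(xs, cx) - 1  # index of the last edge left of cx
        row = bisect_right(ys, cy) - 1  # index of the last edge above cy
        cells[row][col].append(text)
    # Join the words of each cell into one string per cell.
    return [[" ".join(cell) for cell in row] for row in cells]


# Example: a 2 x 2 table drawn over the area (0, 0) to (100, 40).
xs, ys = even_grid(0, 0, 100, 40, rows=2, cols=2)
words = [("Part", 10, 5), ("No.", 25, 5), ("Qty", 70, 5),
         ("A-17", 10, 30), ("4", 70, 30), ("footer", 10, 60)]
table = words_to_table(words, xs, ys)
```

Because the cell boundaries come from the user's grid rather than from detected lines, text is never merged across a boundary the user has placed, which is the case the current export handles poorly.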

Thanks for a great PDF editor.
Chris

Tracker Supp-Stefan
Site Admin
Posts: 14100
Joined: Mon Jan 12, 2009 8:07 am
Location: London

Re: Feature request: Table extraction

Post by Tracker Supp-Stefan » Mon Jun 25, 2018 1:46 pm

Hello Chris,

Many thanks for your post and suggestion.
I will pass it to the person working on the Export to Excel feature.

For the moment you can "force" the table recognition by drawing the needed lines yourself (I use rectangles that I then flatten).
So if your table is not recognized correctly, add visible separators and flatten them, and the conversion should get much better!

Regards,
Stefan
