Summarzing comments, extracting highlighted text

Forum for the PDF-XChange Editor - Free and Licensed Versions


vee
User
Posts: 14
Joined: Fri Dec 05, 2014 12:59 pm

Re: Summarzing comments, extracting highlighted text

Post by vee »

Salut Francois

I think I get what you’re saying from a search point of view... but that wasn’t what I was referring to.

Put search thinking to one side for a moment and consider the initial work-flow of assigning a tag. At that stage we don’t want a global list of tags to choose from, we want to make or choose subject-specific tags.

The work-flow is this: read, select text to highlight/annotate, assign a subject, assign one or more tags. We would do this in Editor first via the ‘subject’ field (with the possibility of a drop-down list of existing subjects in that document), and I’m proposing an additional field for tags (with a drop-down list of subject-specific tags, i.e. the tags that appear in the list depend on the subject selected in the ‘subject’ field).
The reason is this: suppose one’s working on three different projects - let’s say animal emotions and evolution, Bitcoin and personality disorders. We open a document about crypto-currency and start highlighting... we assign our subjects and assign tags per subject. The last thing we want at this stage is Editor to search through our whole directory of pdf’s and offer up global lists, because we don’t want irrelevant subjects and tags to choose from for personality disorders or animal emotions.
At a later stage, however, it might well be that we start seeing connections with animal emotions and personality disorders and so then having access to global lists to search, for subjects and tags from all the pdf’s we’ve studied, makes a lot of sense because it will help us extract patterns and connections as well as go directly to the information, via the search function you describe... but we don’t need that at the initial per-document reading and subject/tag assignment stage.
I think the global aspect you describe is very useful but there also has to be an initial document specific use of tags and that suggests having subject-linked tagging capability.
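
To make the subject-linked drop-down concrete, here is a minimal Python sketch. It assumes each annotation in a document is a simple record with a subject and a list of tags; the `Annotation` class and the `tags_for_subject` function are hypothetical names used only for illustration and are not part of Editor.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    """Hypothetical record for one highlight or comment in a single document."""
    text: str
    subject: str
    tags: list = field(default_factory=list)


def tags_for_subject(annotations, subject):
    """Return the sorted tags already used with `subject` in this document only."""
    return sorted({tag for a in annotations if a.subject == subject for tag in a.tags})


doc_annotations = [
    Annotation("Proof of work secures the ledger", "Bitcoin", ["mining", "security"]),
    Annotation("Block rewards halve periodically", "Bitcoin", ["mining", "supply"]),
    Annotation("Exchanges have been hacked", "Crypto-currency risks", ["security"]),
]

# Choices a tag drop-down could offer once 'Bitcoin' is picked as the subject.
bitcoin_tags = tags_for_subject(doc_annotations, "Bitcoin")
```

Because the list is built only from the open document's annotations, tags from unrelated projects never show up at the tagging stage.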

By the way, ASuLiB may well seem irrelevant, as you suggest, if all is implemented but actually I don’t think it would be. Research is about collecting information but the really interesting stuff starts when making connections, seeing patterns and forming new ideas. Having a fast and simple batch process to create a report of all subjects and tags - which are basically thoughts - is still highly useful for tracking the mental picture and information patterns. Not to mention the ability to know which document to go to to find the ideas one’s interested in. Don’t underestimate the usefulness of ASuLiB, it's a great idea!!
francois maurice
User
Posts: 100
Joined: Sat Sep 29, 2012 5:38 am

Re: Summarzing comments, extracting highlighted text

Post by francois maurice »

Thank you, Vee, for the clarification.

In my last post I mixed up two topics. In fact, I understood that Tracker would not implement a Tag Pane at first (maybe never). It would simply add another way to group annotations via the 'Comments Pane' and the 'Summarize Comments' feature.

I must admit that I still do not understand the distinction you make between topics and tags. I believed that tags were just a collection of topics that the software generates automatically. In fact, that's why I used the terms subject, keyword and tag as synonyms. From my point of view, and if I set aside the search aspect, we write subjects (keywords, tags) in the Subject property field, then we can group or search these subjects (keywords, tags). From your point of view, how would we use the Tag property field that you propose? Can we assign different subjects to the same tag?

But your idea of a drop-down list is very important to facilitate the choice of subjects and tags. And this gives me an idea for the Tag Pane (again!). Near the top of the Tag Pane, there may be an option to choose a subset of subjects to display in the drop-down list: display subjects in the active document; display the subjects of all open documents; display subjects from a folder based on another topic (for example, the name of a project); display all the subjects in a folder.

And I promise that I will not denigrate ASuLiB anymore. :wink:

François
vee
User
Posts: 14
Joined: Fri Dec 05, 2014 12:59 pm

Re: Summarzing comments, extracting highlighted text

Post by vee »

Hi Francois

I echo Paul above - what a very interesting discussion!

Yes, when talking of Xchange Editor, as it is now, then I agree with you, we are only talking about what we put in the subject field. In that context we can call what we put there keyword/subject/tag... it’s the same thing (as you say). ASuLiB then does something clever with this but it’s still based on keyword/subject/tag being synonymous, because it’s a single field.

However, if we leave Editor and ASuLiB where they are now and go forwards in time to where someone has implemented what you called (in your ASuLiB introduction thread) “an advanced annotation management system”, then we could have extra fields and would not, I suggest, want to use keyword/subject/tag synonymously because it could be more like topic, subject, tag and they could each add different layers of functionality that could be used to fine-tune selection and/or display.

Have a look at Citavi for example - it uses ‘category’ and ‘keyword’. This is in fact a further layer because Categories are a tree-structure whereas Keywords are just a list. Perhaps we could say it reflects topic-subject-tag.

The distinction is clearer if you check out MyBase which uses just ‘Labels’ (what we’re calling tags or keywords) but instead of listing them, it provides a tree-structure - which is what I’d call subject and tag.

Actually, although you say you don’t understand the distinction between topics and tags, I think we’re on the same page because what you describe in your latest idea would take that concept to a whole new level. Your idea (display subjects in the active document; display the subjects of all open documents; display subjects from a folder based on another topic (for example, the name of a project); display all the subjects in a folder) is that extra layer...plus another on top of that!

It would make a pdf editor behave like a far more complicated reference manager without all the file importing, cross connecting and bloat. We’d have a pdf Editor with an efficient annotation workflow plus great ways to categorise, view, summarize and use those annotations in either a micro or macro kind of way... really useful when working with multiple pdf’s and multiple projects. The more I think on it, it’s another excellent idea. Good stuff! Seems like we’re hitting the sweet-spot of what “an advanced annotation management system” should be. I do hope the Tracker team pick up on this!
francois maurice
User
Posts: 100
Joined: Sat Sep 29, 2012 5:38 am

Re: Summarzing comments, extracting highlighted text

Post by francois maurice »

Hi vee,

OK, I now understand why you make a distinction between tags and subjects (and keywords, categories, etc.). WordPress has a good generic name for all those different ways to classify an item (for us, comments/annotations): taxonomy. For Citavi, there would be two taxonomies: category and keyword. By default, WordPress also comes with two taxonomies: category and tag. As in Citavi, categories are hierarchical whereas tags are lists. Peter28 and standish-001 used the image of stacks or piles of cards in this thread. We just need this: “an advanced annotation management system” (to cite myself) to organize PDF annotations in different piles. To cite myself again, I also wrote in the ASuLiB introduction thread: "The management features of comments have hardly changed since the late 1990s." That says it all.

When I think about an advanced annotation management system, I always have in mind that the system must be able to handle thousands of PDF files and hundreds of thousands of annotations. I'm only at the beginning of my career as a researcher and I already have about 1,500 PDF files and a few thousand annotations. I imagine that in a few years I could have 5,000 or 10,000 PDF files and tens of thousands of annotations. If we think of a very large research project conducted by a team of researchers, the number of PDF files can easily reach several thousand and the number of annotations can reach the tens of thousands.

As you said: "I do hope the Tracker team pick up on this!"

François
RoGraf
User
Posts: 13
Joined: Tue Mar 20, 2018 10:39 am

Re: Summarzing comments, extracting highlighted text

Post by RoGraf »

Hi again,

how is the feature request "RT#4304: FR: Summarize comments' content adjustment" doing?
TrackerSupp-Daniel
Site Admin
Posts: 8436
Joined: Wed Jan 03, 2018 6:52 pm

Re: Summarzing comments, extracting highlighted text

Post by TrackerSupp-Daniel »

Hello RoGraf,

There is no new information since the last request; it is still in progress but currently incomplete.

Kind regards,
Dan McIntyre - Support Technician
Tracker Software Products (Canada) LTD

+++++++++++++++++++++++++++++++++++
Our Web site domain and email address has changed as of 26/10/2023.
https://www.pdf-xchange.com
Support@pdf-xchange.com
standish-001
User
Posts: 76
Joined: Sun Apr 01, 2018 2:45 pm

Re: Summarzing comments, extracting highlighted text

Post by standish-001 »

Hi,

With PDF XChange Editor as the foundation + a few different programs I've been able to Frankenstein together my own type of tagging system.

One day soon I'll post a video as a sort of an illustration of how I'm achieving it.

At the same time, I would encourage the good Tracker folks, who have weighed in with such kind attention - really listening, which is EXTRA APPRECIATED - to consider the idea of offering tagging and extracting highlights as a little mini-add-on, for purchase. A stand-alone purchase --> get the FREE reader when you buy a set of robust highlighting & commenting extraction tools.

Here is my thinking behind this. In my efforts to achieve something like we've all been discussing here vis-à-vis highlights with tags, I signed up and paid a US$5.00 annual subscription for: https://www.sumnotes.net/

Which is just a straight highlight cleaner-upper. That's all it does. Extracts PDF highlights with as little overhead as possible.

No tagging, to be sure. But, that such an app exists, and I paid for it, illustrates this might be a whole 'nother slice of the market for you. For students who have to read these rich classical texts, hundreds and hundreds of pages in length: don't ask them to pay for the WHOLE pdf software, ask instead for them to buy ONLY the highlighting + tagging tools, and even pay annually. (Sumnotes is annual.)

How does all that look?

Many thanks,

SW
francois maurice
User
Posts: 100
Joined: Sat Sep 29, 2012 5:38 am

Re: Summarzing comments, extracting highlighted text

Post by francois maurice »

Hi standish-001,

Thanks for your post.
I'm looking forward to your video.

As far as Editor's business model is concerned, I obviously cannot speak for Tracker.

If you are looking for a basic tagging system, you can try my ASuLiB script. I think it works with the free version of Editor:

viewtopic.php?f=62&t=30582

Let's hope that one day we will have real software for managing PDF comments.

François
Tracker Supp-Stefan
Site Admin
Posts: 17818
Joined: Mon Jan 12, 2009 8:07 am
Location: London

Re: Summarzing comments, extracting highlighted text

Post by Tracker Supp-Stefan »

Thanks SW and François for your posts!

We would also be quite interested in the video!

Regards,
Stefan
David.P
User
Posts: 1510
Joined: Thu Feb 28, 2008 8:16 pm

Re: Summarzing comments, extracting highlighted text

Post by David.P »

Pls. add me to the list of people interested in seeing this video.
David.P
PDF-XChange Pro
Tracker Supp-Stefan
Site Admin
Posts: 17818
Joined: Mon Jan 12, 2009 8:07 am
Location: London

Re: Summarzing comments, extracting highlighted text

Post by Tracker Supp-Stefan »

:D
Paul - Tracker Supp
Site Admin
Posts: 6829
Joined: Wed Mar 25, 2009 10:37 pm
Location: Chemainus, Canada

Re: Summarzing comments, extracting highlighted text

Post by Paul - Tracker Supp »

Hi guys,

I just had a discussion with the development team about the state of this. I am having difficulty making the case that this "Advanced Annotations Management" is in sufficient demand to justify the work that would be involved.

To be accurate, adding the ability to introduce an arbitrary delimiter in the Subject field to act as "Tag" definitions is not such a problem, and it could be used to search in the current document. However, the ability to search other documents is key to this feature's power, and that is where I am encountering resistance. It seems there is a lot more involved in that aspect of the request than I had anticipated.
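
As an illustration of the delimiter idea only, a minimal Python sketch follows. The semicolon delimiter, the dictionary layout and the function names `split_subject` and `find_by_tag` are all assumptions made for the example; nothing here reflects how Editor stores or searches Subject values.

```python
def split_subject(subject, delimiter=";"):
    """Split a Subject string into tags, dropping blanks and repeated tags."""
    tags = []
    for part in subject.split(delimiter):
        tag = part.strip()
        if tag and tag not in tags:
            tags.append(tag)
    return tags


def find_by_tag(annotations, tag, delimiter=";"):
    """Return the annotations of the current document whose Subject carries `tag`."""
    return [a for a in annotations if tag in split_subject(a["subject"], delimiter)]


# Hypothetical annotations from one open document.
annotations = [
    {"page": 3, "subject": "Bitcoin; mining"},
    {"page": 7, "subject": "Bitcoin;security ; mining"},
    {"page": 9, "subject": "personality disorders"},
]
mining_pages = [a["page"] for a in find_by_tag(annotations, "mining")]
```

Matching on whole tags after splitting, rather than on substrings of the Subject text, keeps a search for one tag from hitting longer tags that merely contain it.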

As such I have raised the priority of ticket RT#4632: Feature Request :: Editor :: Tags in Annotation Subjects

In order to further this I have gone through this thread again. There have been a few links to resources I may be able to use to make the case that this is not just the pet project of myself and a few hard-core Editor users, but I will need more. Most of what we spoke of here were examples of how one would like to see this done.

Anything you can find that supports this case would be helpful.

Additionally, it seems that the ability to choose whether to include the "Type", "Author", "Subject", and "Date" in the summary (#4304: FR: Summarize comments' content adjustment) would go a long way to making the results more usable, so I reached out to the Development team regarding this. The developer who has been assigned this task is currently working on improvements to our integration with Outlook. I have managed to get agreement that once this is complete he will look at this.

regards
Best regards

Paul O'Rorke
Tracker Support North America
http://www.tracker-software.com
francois maurice
User
Posts: 100
Joined: Sat Sep 29, 2012 5:38 am

Re: Summarzing comments, extracting highlighted text

Post by francois maurice »

Hi Paul,

Thank you for your efforts to convince the development team of the usefulness of an advanced annotation system. It is highly appreciated. It's hard to convince someone of a potential market. So, I will focus first on the potential market, then on the resistance you encounter about "the ability to search other documents", and finally on another resistance you may encounter: ideally, the PDF files have to be indexed for faster full-text search.

This market for an Advanced Annotations Management system has only one contender right now. As I said in the following thread, viewtopic.php?f=62&t=30374, Qiqqa is the closest thing I have seen to what could be called a full-fledged PDF annotator. It might be good to introduce Qiqqa to the developers. In itself, the tagging system is functional. So, why not use Qiqqa? Because Qiqqa has major flaws, which keep it from prevailing even in a market without competition.

The biggest flaw is that the main features are not PDF compliant. Annotations and tags (“subject” property) do not comply with the ISO standards for the PDF format. So, if Qiqqa goes bankrupt, you are in big trouble. And even if Qiqqa does not go bankrupt, it's impossible to work on a device that does not have Qiqqa installed, which means you cannot collaborate with colleagues who do not use Qiqqa. Another big mistake: Qiqqa tries to be a big program that manages the whole workflow of a research project, which is silly.

If I'm not mistaken, Tracker now focuses on the business community. An advanced annotation management system is intended for the scientific community, although I believe it may be useful to the business community. I am not a marketing specialist, but I think it would be better to offer a second program: let's name it PDF Xchange Annotator. In this way, Tracker would have two flagships. With two programs, Editor will not become too crowded. I'm not a programmer either, but I think both programs can share the same kernel. In fact, Annotator would be Editor minus some features that don't interest researchers, plus an advanced annotation management system (a plugin or not), which does interest researchers. Researchers are not interested in manipulating PDF files. They want to read articles, annotate articles, organize these annotations, and then easily find these annotations. Or it could be the same software in two versions, 'Editor' and 'Annotator', each with some features disabled.

To go into detail, the Tracker website could highlight help and tools for each community. Tracker could also saturate the primary and secondary school market by offering Annotator free to schools at those levels. This way you create a solid base of users.

Now, you say that "the ability to search other documents is key to this feature's power and that is where I am encountering resistance". First, you're right, this feature is paramount. In fact, without this feature, there is no advanced annotation management system. Second, I understand that you are experiencing resistance because it may be the most difficult feature to implement for an advanced annotation management system.

Qiqqa chose the path of centralization. To use Qiqqa, the program makes a copy of the PDF files in a directory under its control. I do not think this is the ideal solution. Users have PDF files everywhere on their PCs. They have already organized their files, more or less. They use reference management software and file sharing software. We cannot ask them to reorganize everything. Again, I'm not a programmer, but I think it's possible to implement a permanent scan. First of all, XChange Annotator is not interested in PDF files per se. The software is interested in annotations. Since each annotation has a unique identifier (something like d61c893a-582e-41fa-b919f9ce8c0ae5e5), it is pretty easy to build a database of those annotations and, in the same way, of all the tags written in the subject property field of these annotations. The database can also store the last known location of each PDF file, but this is only for convenience and faster retrieval of those files by Annotator. Thus, even if a PDF file changes location, Annotator can find its annotations again by scanning the hard disk. In this way, the user can do whatever he wants with his PDF files (even delete them, if the text of the annotations is saved in the database).
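
A minimal sketch of such an annotation database, using Python's built-in `sqlite3` module, could look like the following. The table layout, the `record_annotation` and `annotations_with_tag` helpers, the identifier `annot-0001` and the file paths are all hypothetical; reading identifiers and subjects out of real PDF files is left out entirely.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # a real tool would keep this in a file on disk
conn.executescript("""
    CREATE TABLE annotations (
        annot_id  TEXT PRIMARY KEY,   -- the annotation's unique identifier
        text      TEXT NOT NULL,      -- saved copy of the highlighted text
        last_path TEXT NOT NULL       -- last known location of the PDF file
    );
    CREATE TABLE annotation_tags (
        annot_id TEXT NOT NULL,
        tag      TEXT NOT NULL,
        PRIMARY KEY (annot_id, tag)
    );
""")


def record_annotation(annot_id, text, path, tags):
    """Insert an annotation, or refresh its text, path and tags if already known."""
    conn.execute("DELETE FROM annotation_tags WHERE annot_id = ?", (annot_id,))
    conn.execute(
        "INSERT OR REPLACE INTO annotations (annot_id, text, last_path) VALUES (?, ?, ?)",
        (annot_id, text, path),
    )
    conn.executemany(
        "INSERT INTO annotation_tags (annot_id, tag) VALUES (?, ?)",
        [(annot_id, tag) for tag in set(tags)],
    )


def annotations_with_tag(tag):
    """Return (annot_id, text, last_path) rows for every annotation carrying `tag`."""
    return conn.execute(
        "SELECT a.annot_id, a.text, a.last_path FROM annotations a "
        "JOIN annotation_tags t ON t.annot_id = a.annot_id "
        "WHERE t.tag = ? ORDER BY a.annot_id",
        (tag,),
    ).fetchall()


# A first scan finds the annotation in one folder...
record_annotation("annot-0001", "Proof of work secures the ledger",
                  "C:/papers/bitcoin.pdf", ["Bitcoin", "mining"])
# ...and a later scan finds the same identifier after the file was moved.
record_annotation("annot-0001", "Proof of work secures the ledger",
                  "D:/archive/bitcoin.pdf", ["Bitcoin", "mining"])
```

Keying the table on the annotation identifier rather than on the file path is what lets a rescan update the location instead of creating a duplicate entry.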

Now, what about the interface of Annotator? Editor focuses on PDF files. Its main window serves to display one or more PDF files. In Annotator, the main window or pane for annotations has to come to the fore. In other words, when you open Annotator, it is an annotations window or pane (not to be confused with the current Editor Comments pane) that should appear in Annotator's main area. From there, there are several tools for working with annotations, including the basic or main tool consisting of a list of tags in a pane, the Tag Pane. In the Tag Pane, we can select multiple tags to pin down a subset of annotations. Each time we select a tag, the annotations with that tag are displayed or loaded in the annotation window or pane, something like the results in the Editor Search pane but with the full annotation, not just a line (in fact, the first tag loads a bunch of annotations, and each subsequent tag makes some annotations disappear, since we are narrowing down). Once we have found what we were looking for, we can, if we need to, click on an annotation to display it in the PDF file (which the Editor Search pane already does), which opens in a regular window for viewing the entire PDF file. There are several other interesting details for the interface, but I think this is enough for the moment.
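
The narrowing behaviour described above (the first tag loads a batch, each further tag removes some) amounts to intersecting sets. Here is a minimal Python sketch, assuming a hypothetical in-memory index that maps each tag to the ids of the annotations carrying it; the `narrow` function and the sample ids are illustrative only.

```python
def narrow(index, selected_tags):
    """Return the ids of annotations that carry every selected tag.

    `index` maps each tag to the set of annotation ids that carry it.
    """
    if not selected_tags:
        return set()
    result = set(index.get(selected_tags[0], set()))  # the first tag loads a batch
    for tag in selected_tags[1:]:
        result &= index.get(tag, set())               # each further tag removes some
    return result


tag_index = {
    "Bitcoin": {"a1", "a2", "a3"},
    "mining": {"a1", "a3", "a7"},
    "security": {"a3", "a9"},
}
after_one = narrow(tag_index, ["Bitcoin"])
after_two = narrow(tag_index, ["Bitcoin", "mining"])
after_three = narrow(tag_index, ["Bitcoin", "mining", "security"])
```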

Finally, even if the next feature is not part of an annotation system, it is still the main external feature of any annotation system. Annotator could index PDF files on the hard drive to make it much faster to do full text search. There are several tools for extracting text from PDF files. Zotero, a reference management software, uses Xpdf (https://www.xpdfreader.com/).

I hope these few comments can help you convince developers of the potential of an annotation management system.

À bientôt,

François
francois maurice
User
Posts: 100
Joined: Sat Sep 29, 2012 5:38 am

Re: Summarzing comments, extracting highlighted text

Post by francois maurice »

Hi everyone,

In order to help Paul convince Tracker to implement an Advanced Annotation Management (AAM) system, I propose to structure the discussion around these three themes: marketing, annotating, indexing.

To keep the focus, I also propose to act as if the Advanced Annotation Management system was intended for the scientific community, the education sector, the community of journalists, etc. In fact, Annotator is for anyone who uses PDF files to do research but is currently working with a hybrid system of PDF files and paper documents.

Marketing

  • Above all, we need to get an idea of the approximate total number of potential users around the world
    • We would need a volunteer to find that number.
    • Here is a partial list of the user categories for which we need this number.
      • Students of higher education
      • Professors in higher education
      • University researchers
      • Public Sector Researchers
      • Private Sector Researchers
      • Journalists
      • etc.
  • It is important for Tracker to speak directly to those communities
    • A good idea would be to create a sub-site dedicated to those communities.
    • We can find demo videos
    • For example, videos that suggest an optimal way to use Annotator on one screen, on two screens and on three screens.
    • A sub-section could be found for primary and secondary school teachers.
    • The idea is to show students how to organize a small research project using PDFs and annotations.
  • Putting forward the idea that researchers will no longer have to work with a hybrid and monstrous system that requires manipulating both PDF files and paper. The motto is All PDF. The slogan is Finally, computer-aided research is here (CAR).
Annotating

  • Making sure the AAM has the power to handle thousands of PDFs and tens of thousands of annotations.
  • The interface must be designed to handle annotations.
    • There are two main changes
      1. The ribbon should highlight features needed to manage annotations. In other words, most features for manipulating PDF files must be hidden or disabled.
      2. The main innovation would be the development of a window or pane to display annotations
        • With Qiqqa, you have to produce a report to access the annotations of a set of PDF files.
        • The Annotation window would be the equivalent of this report, but it would be constantly available since it would be the main Annotator window.
        • Unlike Editor's Comments pane (which displays the highlighted text if this feature is enabled) and unlike the Editor's Search pane (which only displays one line of text), this window should display annotation images of highlighted text. This is necessary in order to see special characters, for example equations, without having to open the PDF file.
        • Of course, if you click on the image of an annotation, the file opens in a normal PDF display window at the location of this annotation. This is the main reason to work with at least two screens. A third screen can be useful to have the Search Pane always available. Annotation window, PDF window and search pane is all you need to do serious research without having to print anything. In fact, a fourth screen can be useful to have a working document readily available (Word, Writer, LaTeX, etc). The idea of working with multiple screens is to simulate working on a table. The mind needs to have a lot of information in front of the eyes. If we work with a single screen, then we are forced to switch from one window to another, which affects concentration.
        • The Annotation window displays all annotation images, one after the other, based on the tags selected in the Tag Pane.
        • Those selected annotations can be sorted or grouped by type, color, annotation's author, modification date, creation date, etc.
Indexing

  • Although Editor's search engine is very fast, it cannot compete at the moment with indexing software such as DocFetcher or Recoll. The problem with these indexing programs is that they do not open a PDF file at the location of a search result, which the Editor Search pane does... but the Editor Search pane is not fast enough for thousands of PDF files!!!
À bientôt,

François
francois maurice
User
Posts: 100
Joined: Sat Sep 29, 2012 5:38 am

Re: Summarzing comments, extracting highlighted text

Post by francois maurice »

Hi,

Other ideas.

Marketing

  • No new ideas

Annotating

  • The Annotation window has to display annotation images in "dynamic" sub-windows, something similar to the Editor Search pane and Comments pane.
  • The idea is to display each sub-window with the same height, but be able to adjust the height of each sub-window to see a larger portion of an annotation image or all the annotation image.
  • Possibility to add tags directly from the annotation image
  • Possibility to select annotation images for batch jobs, such as adding a tag to the selected annotation images.
  • For the Tag Pane, a sub-section to create projects. A project is only a name. Say I have an article named "Article 1". Under this project name, I can drag and drop tags from the main tag section.
Indexing

  • No new ideas
François
Paul - Tracker Supp
Site Admin
Posts: 6829
Joined: Wed Mar 25, 2009 10:37 pm
Location: Chemainus, Canada

Re: Summarzing comments, extracting highlighted text

Post by Paul - Tracker Supp »

Hi François, thank you so much for the thought you have put into this.

I like the way you break this up into "Marketing", "Annotating", and "Indexing". This is helpful to identify where I/we can best focus attention.

Marketing.
You are right when you say that this proposed change is more than just a feature, it is essentially its own product, and with the best will in the world, that sadly may be its downfall. As you point out, there are a huge number of features already in the PDF-XChange Editor that are not really needed/applicable to your use case (research in general; I wouldn't want to limit this to any specific area of research, and I can for example see an AAM being useful/popular in the legal world). As has been reiterated before, we must strike a balance between the needs of the many and the needs of the few; feature bloat is always a concern, as is a potentially confusing product lineup. We are not going to make much headway on this if it is presented as its own product. That much is certain.

IF this is to gain any traction then the best way I see it happening is as a plugin.

Advantages of a plugin:
  • avoids "feature bloat" by limiting these features to only those who pay for and install the plugin.
  • When installed, a plugin could include an option to load a "Research Optimized UI" that users could modify in the same way we do now for all the Toolbars. We wouldn't need to change the base product for that and users could customize and save their own tweaked version of said "Research Optimized UI"
  • Keeps Tracker's product line up unchanged.
I like the suggestion to get some numbers on "user categories". It is a start, however, even if we determine that the market according to these numbers is in the millions world wide, we would need to see a high proportion of these buy in order to recoup the development costs. We estimate that we have more than 400 million users world wide, from a possible (It depends whose figures you believe) 2 billion PCs - that's only 1 in 500. This is an area where we need to do some research. (Pun intended)

Annotating
Here I defer to your experience. I am so far from being a professional researcher myself that it's not funny. There is lots in here for consideration and time to get into these details if this ever gets beyond the discussion stage.

Indexing
It may be possible to hook into our shell extensions iFilter for this. It can search the text-based content of PDFs. How do you find this works for you with thousands of PDFs? Is it comparable to DocFetcher or Recoll?

Again, there is no promise to deliver anything at this point, and to be honest, the more this discussion progresses the more work it looks like it will be, and the market not as large as I had hoped. To that end I will do a little personal research into possible numbers in those user categories you came up with.

If, at the end of the day, Tracker still determines that it is not a cost-effective project, there may still be an option to build such a plugin without using Tracker's development resources, but that is a discussion best left for later.

Let's see what we can find out in terms of potential users/sales assuming your annotating and indexing proposals were implemented.

regards

Paul
Best regards

Paul O'Rorke
Tracker Support North America
http://www.tracker-software.com
francois maurice
User
Posts: 100
Joined: Sat Sep 29, 2012 5:38 am

Re: Summarzing comments, extracting highlighted text

Post by francois maurice »

Hi Paul,

I agree with you. Let's focus our efforts on a plugin with a small set of useful features.

What is the general aim of the Tag plugin? In my opinion, the Tag plugin must have the features needed to stop using paper in general when doing research with PDF files, to avoid what I call the hybrid system of PDF files and paper documents. This is the bare minimum, otherwise it defeats the purpose.

Annotating
With this goal in mind, I propose the following idea to implement the advanced annotation management system (AAM), an idea that I have already outlined briefly in a previous post.

We can see the Tag Pane as a specialized Search Pane. Behind the scene, the Tag Pane can use the same search engine as the Search Pane.

Currently, the Search Pane offers the opportunity to search in different places: Page Text (full text search), Bookmarks, Comments, etc.

The Tag Pane would not offer these choices. By default and silently, it would look for tags in the subject field property when clicking on a tag in the Tag Pane. The search results, the annotations, can be displayed as in the Search Pane (when selecting Comments as a place to search).

This would be the basic implementation: Tags are displayed in the Tag Pane and annotations are displayed at the bottom of the Tag Pane according to the selected Tags. Is this an idea that seems realistic from a developer point of view?

Here is a partial list of possible essential features that could be found in the Tag Pane or the Tag Ribbon:

  • Of course, we need a way to tag an annotation. After highlighting text, it would be convenient to be able to tag the annotation from the Tag Pane, without having to go through the Properties Pane of an annotation.
  • As mentioned in a previous post by vee, the field for writing tags should be a drop-down list that displays the tags already in use.
  • Renaming a tag (which updates, of course, all annotations with this tag). This can be found in a contextual menu by right-clicking on a tag in the Tag Pane.
  • Deleting a tag (with a warning !!)
  • Possibility to select multiple tags to drill down to a subset of annotations
  • Those tags not present in this subset can be greyed out
  • etc.
But all this is useless if the AMM does not work on the entire hard drive with tag indexing.

That said, we have to distinguish between indexing for full-text search and tag indexing. Those are two different things.

Even if Tracker does not want to incorporate full-text indexing into the Search Pane, it is important that the AMM has an index for tags to quickly locate PDFs and annotations on the entire hard drive.
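
To make the distinction concrete, here is a minimal sketch of what a tag index could look like, as opposed to a full-text index. It is only an illustration and says nothing about how Editor works internally: the `Annotation` record and the `TagIndex` class are hypothetical names, and the sketch assumes the tags have already been read from each annotation's Subject field. The index maps each tag to the annotations that carry it, so selecting several tags is a set intersection, and the tags that remain in the narrowed subset are the ones that would not be greyed out.

```python
from collections import defaultdict
from typing import NamedTuple


class Annotation(NamedTuple):
    """Hypothetical record for one tagged annotation found on disk."""
    pdf_path: str
    page: int
    text: str
    tags: frozenset


class TagIndex:
    """Inverted index: tag -> set of annotations carrying that tag."""

    def __init__(self):
        self._by_tag = defaultdict(set)

    def add(self, annotation):
        # Register the annotation under each of its tags.
        for tag in annotation.tags:
            self._by_tag[tag].add(annotation)

    def find(self, *tags):
        """Return the annotations that carry every one of the given tags."""
        if not tags:
            return set()
        matches = [self._by_tag.get(tag, set()) for tag in tags]
        return set.intersection(*matches)

    def remaining_tags(self, *tags):
        """Tags still present in the narrowed subset (the rest could be greyed out)."""
        return {tag for annotation in self.find(*tags) for tag in annotation.tags}


# Example usage with made-up data.
index = TagIndex()
index.add(Annotation("bitcoin.pdf", 3, "Proof of work", frozenset({"bitcoin", "mining"})))
index.add(Annotation("bitcoin.pdf", 7, "Cold wallets", frozenset({"bitcoin", "security"})))
for hit in sorted(index.find("bitcoin", "mining")):
    print(hit.pdf_path, hit.page, hit.text)
```

Because the lookup never touches page text, an index like this stays small even for thousands of files, which is why it can be treated separately from full-text indexing.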


Indexing

iFilter is really fast, as fast as DocFetcher and Recoll: almost instant results. So there is no problem dealing with thousands of files. But there is a problem with those three indexing tools.

iFilter displays only files, without any means to go to the right location in those files. You have to open each file in Editor and redo the search.

DocFetcher goes a step further. If you click on a result, a plain text version of the PDF is displayed. From there, we can display each occurrence of the search expression one at a time. The problem is that DocFetcher cannot open the PDF at the place where the expression is, as the Search Pane in Editor can. At least we have the context for each occurrence, which allows us, after opening the PDF, to find a particular occurrence. But it is awkward.

Recoll goes further still. It can display the results in a way similar to the Search Pane, but it is a two-step process. First, Recoll displays a line for each PDF found. After that, you have to open a window to see all the occurrences of the expression in each file. Thanks to Recoll, when you click on an occurrence, the PDF opens at the right page. But, and there is a but, the expression is not highlighted on the page, whereas Editor does highlight it. So you lose time trying to spot the expression on the page. Generally, we end up redoing the search for the expression on that page with Editor. It is awkward.

For several months, I have used only Recoll for full-text search.

I confess I do not understand why the Search Pane does not use iFilter when we do a full-text search. Surely because of technical problems! Otherwise, the Search Pane is almost perfect (ah, yes, it lacks a regular expression search feature, but I think there is a ticket open for this). It can search the entire drive (search in C:\ with sub-folder search activated). Results are displayed by file, then by occurrence within each file. When you click an occurrence, the PDF opens in the right place with the expression highlighted on the page.

Marketing

  • Maybe standish-001's idea is good. The Tag plugin should work with the free Editor version
  • As I said before, people interested in an AMM are generally not interested in other features for handling PDFs.
  • So, it's unlikely that these people will buy a program as advanced as Editor just so they can buy a plugin later.
  • It is better to sell the plugin at a higher price for those not having a licence for the paid version of Editor.
  • It can be a good idea to have a page dedicated to the plugin. The idea is to mimic a standalone software. From that page, we can download free Editor bundled with the plugin already installed and activated, with "Research Optimized UI", which means to bring to the forefront the Tag Pane, the Search Pane, the Comments Pane, etc. and Ribbons optimized for search and annotations.
  • The bundle can be called XChange Annotator
  • In fact, if I had the resources, and if it was possible to have an agreement with Tracker, I would develop the plugin with its own website, something you mention in your last post if Tracker is not interested.
Now, if I summarize, it would be enough to add a Tag Pane, optimized Ribbons, and possibly iFilter hooks to the Search Pane.

François
Paul - Tracker Supp
Site Admin
Posts: 6829
Joined: Wed Mar 25, 2009 10:37 pm
Location: Chemainus, Canada

Re: Summarzing comments, extracting highlighted text

Post by Paul - Tracker Supp »

I really appreciate the thought and dedication you have all put into this. Unfortunately, I am increasingly in agreement with the development team: there is a lot of work to be done in a full-blown AAM feature and, to be honest, the potential revenue, while difficult to predict accurately, does not look as promising as I had hoped.

If this were to be done as a plugin, a third-party developer would need to make it, as the decision has been made that we won't be spending resources on one. Also, any plugin would only work with a licensed version of the Editor, not an unlicensed or "free" Editor. That is true for any and all third-party plugins. The plugins you already see are all created internally and all provided with the Editor.

What I have managed to get agreement on is a simplified version of tagging, using a semi-colon as a separator in the Subject of the annotation. It should include:
  • Tags presented like Gmail when composing: a drop list that gets filtered as you type, with each tag presented as an object with a border and a "remove tag" x in the corner.
  • Selecting multiple tags progressively narrows the results
  • Collections of target search folders
  • Indexing the search for improved performance
Sounds good to me, if a compromise. The downside? The priority of such a feature will remain low until/unless something unexpectedly compelling precipitates a change in this decision. It could literally be years coming. Nothing we've discussed here has convinced the powers that be that the return is certain to be worth the effort. And I have to bow to their experience, both in understanding the true scope of the work and in choosing the most demanded features.
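
For illustration only, the semicolon convention described above could be handled along the following lines. This is a sketch under stated assumptions, not Tracker's implementation: the function names `parse_tags`, `suggest` and `narrow` are hypothetical, Subject values are assumed to be plain strings, and matching is assumed to be case-insensitive.

```python
def parse_tags(subject):
    """Split a Subject string on semicolons into a list of clean tags."""
    return [part.strip() for part in subject.split(";") if part.strip()]


def suggest(prefix, known_tags):
    """Filter the drop list as the user types (case-insensitive prefix match)."""
    wanted = prefix.strip().lower()
    return sorted(tag for tag in known_tags if tag.lower().startswith(wanted))


def narrow(subjects, selected_tags):
    """Keep only the Subject strings whose tags include every selected tag."""
    wanted = {tag.lower() for tag in selected_tags}
    return [
        subject
        for subject in subjects
        if wanted <= {tag.lower() for tag in parse_tags(subject)}
    ]


# Example usage with made-up Subject values.
subjects = ["bitcoin; mining", "bitcoin; security", "emotions; fear"]
print(suggest("bi", {"bitcoin", "mining", "security"}))
print(narrow(subjects, ["bitcoin"]))
print(narrow(subjects, ["bitcoin", "mining"]))
```

Each additional selected tag can only shrink the result list, which is the "progressively narrows" behaviour in the second bullet.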

We do not take customer requests lightly and have, literally, more than 500 unactioned formal "Feature Requests", and many of these will never see the light of day. So, interestingly, we are now in the process of trying to separate the wheat from the chaff, so to speak. To that end, you may in time see a section on the website where Feature Requests can be voted on; if AAM gets the votes, this may change.

Until such a thing happens and we see the appropriate demand for this I am afraid no amount of discussion here is going to change that business decision.

I did my best, I thought this would be less complex and more popular. I am not always right, though I'd prefer nobody tell my wife that...

Feel free to add comments to the discussion, they can be referenced if and when the time comes that we action this, but in the meanwhile we are essentially going to step away from it. Keep an eye out, in the coming months, for a way to vote on your desired new features. I can post here when it happens if you like.

Regrettably

Paul
Best regards

Paul O'Rorke
Tracker Support North America
http://www.tracker-software.com
francois maurice
User
Posts: 100
Joined: Sat Sep 29, 2012 5:38 am

Re: Summarzing comments, extracting highlighted text

Post by francois maurice »

Thanks a lot Paul,

I understand the situation.

Sorry gang, you are stuck with me for the development of a basic annotation management system.

I will therefore, in the coming months, continue the development of ASuLiB. You can join the discussion at the ASuLiB discussion thread:

viewtopic.php?f=62&t=30582

I will post a first update this summer.

May the adventure continue!

François
Peter28
User
Posts: 14
Joined: Sun Nov 23, 2014 7:14 pm

Re: Summarzing comments, extracting highlighted text

Post by Peter28 »

Dear Patrick, Daniel and the rest of the Tracker team,

I started this discussion in Nov 2014 with the (simple) feature request to extract highlighted text from a (big) PDF file (e.g. a book) with one click into an editable format (not PDF), without any additional information such as pages or authors (viewtopic.php?f=62&t=22846#p88209). It was a feature request for a need (efficiently summarizing the knowledge in a PDF file into a compact format for future reference) for which, apart from Sumnotes (https://www.sumnotes.net/viewer.php), which is unfortunately an online service, no one on the market had a good solution. It is a feature that, I claim, every student on this planet (or anybody else who wants to make their knowledge management system more efficient) needs. Since my original post, many users have supported my request. Two and a half years later, in May 2017, Vee (viewtopic.php?f=62&t=22846#p113193) voiced her frustration at the lack of progress, which was balm for my soul (thank you, Vee!).
Now 4.5 years have passed and where do we stand? Honestly, I lack the words to express my frustration here. Also, when I read some of your replies, I keep asking myself what your (product) strategy is. How do you intend to differentiate/compete with the competition?
In my case…
a) I can't use PDF XChange Editor Pro to extract highlighted text in the way I need to
b) I can't annotate pdfs by hand because of the (terrible) time lag of your application with my stylus/pen (I had to buy PDF Annotator because of this)
c) I use Abbyy's FineReader for OCR. (I don't believe that PDF XChange Editor's OCR will ever be able to compete with the quality and efficiency of FineReader's OCR).

Taking PDF XChange Editor's other features into account, I don't see many differences/advantages compared to other PDF software.

Thus, please tell me why I (or anyone else) should buy (a new version of) PDF XChange Editor Pro?

Thanks for listening,

Peter
Paul - Tracker Supp
Site Admin
Posts: 6829
Joined: Wed Mar 25, 2009 10:37 pm
Location: Chemainus, Canada

Re: Summarzing comments, extracting highlighted text

Post by Paul - Tracker Supp »

Hi Peter,

thanks for the input. Indeed this has been a very long and interesting discussion. Progress is being made. RT#4304 is closed, the work done and ready for the next build. Here is a preview of what the Summarize Comments dialogue will look like in the next build:
image.png
Your point
"Now 4.5 years have passed and where do we stand? Honestly, I lack the words to express my frustration here. Also, when I read some of your replies, I keep asking myself what your (product) strategy is. How do you intend to differentiate/compete with the competition?"
is a good one, and I would respond by saying that our strategy is to develop the product(s) in the most cost effective manner possible. That means the returns on the development costs must be apparent. With the best will in the world, and I mean this because I personally want to see Advanced Annotation Management, while there are 9 very vocal users contributing to the discussion, the work involved in creating a comprehensive and usable solution is significant and the potential return questionable. I take your point that there are 300,000 Liberal Arts students each year:
"According to one bit of research, around 300,000 students graduated in 2015 with degrees in Liberal Arts. Most of those had to highlight LONG PDF documents and...the highlights and comments are, for all intents and purposes, unmanageable. Seems like you've got a huge market."
however I would have to say that when compared to the number of Engineers wanting 3d, or improved OCR which some sources put in the tens of millions, this number seems less encouraging. Those are examples of areas where we have focused our energies recently.

I mention the Enhanced OCR specifically as it is one of the areas you mention you use another product. We are actively working on this and in my opinion, it gets better every build.

Regarding the lag in the stylus, we are actively working on that also. Having discussed this with the development team, we would appreciate feedback on our progress. Anyone who wants to test the progress, please write to support@pdf-xchange.com and request the Stylus Test Portable version of the Editor.

Yes, these features are slow in coming, there is good reason for it. At the end of the day we are still in the place where the cost to implement some features outweighs the expected return.

So to answer your question directly, I believe the software represents outstanding value for money and has a broad appeal to many types of PDF users due to its ease of use, plethora of features, excellent performance, and (in my opinion) great support.

Tags in annotations are still likely to happen, even if still some time away. With respect I like to think we are navigating a reasonable compromise between the needs of the many and the needs of the few. At least that is the intention.

I hope that helps.
Paul - Tracker Supp
Site Admin
Posts: 6829
Joined: Wed Mar 25, 2009 10:37 pm
Location: Chemainus, Canada

Re: Summarzing comments, extracting highlighted text

Post by Paul - Tracker Supp »

Hi again Peter,

we are making progress on the Stylus issue. May I ask you to test a pre-release "Portable" build for the stylus performance and give us feedback? Please contact us by email at support@pdf-xchange.com to get a download link.

Being a "Portable" version you need not install it, just extract the zip file to a folder where you have read access and launch the PDFXEdit.exe executable inside.

We are keen to hear if this iteration shows significant improvement.

regards
Peter28
User
Posts: 14
Joined: Sun Nov 23, 2014 7:14 pm

Re: Summarzing comments, extracting highlighted text

Post by Peter28 »

Dear Paul,

It has indeed been a very long discussion. While I wish that my requests had been implemented, or implemented more quickly, I have to say that your quick and considered replies always made me feel listened to. I thank you for that.
I am aware that I have a limited view with my request. You guys have the big picture and should act accordingly, which I assume you do. Of course, you have to make sure that you get a proper return on the development costs. I understand.

I haven't done any market potential calculation concerning my request (btw, the "… 300,000 Liberal Arts student…" quote doesn't come from me). However, I believe the market potential might be much higher than you think. I, for instance, am not a student. I am a 45-year-old engineer. My feature request is based on my wish to build up an efficient personal knowledge management system. I read a lot. The problem I have is remembering everything I read and making it quickly accessible for future reference. In the past, when I read a book, I made (handwritten) notes, which is a lot of (copying) work. Now, I scan the book (to PDF) before I read it, which also takes a lot of time. Then I read and highlight the (book) PDF on my computer or tablet. Then I (manually) copy-paste the highlighted text, which again takes a lot of work, into Word, where I edit it into a bullet-point summary. Or I (manually) copy-paste the highlighted text into a "knowledge category" mind map, which also contains knowledge from other sources. A software tool that would allow me to extract my highlighted text (with one click after I'm finished with the book), and if possible in a way (see my post viewtopic.php?f=62&t=22846#p88271) that makes it easier for me to structure it afterwards (into a summary), would help me a great deal.
So far, I couldn't find software for my use case that would make me happy. Dedicated knowledge management software such as Citavi didn't convince me. And it looks like I am not the only one with that need. The wish to extract highlighted text (not the added comments) from a PDF is quite old already (10 years?); you can google it. We all have to keep learning and, ideally, retain the knowledge we read, somehow. There should be enough market potential out there for software that helps people become better and more efficient, I believe.
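
The last step of the workflow described above (turning highlighted passages into a bare bullet-point summary, with no page numbers, authors or comments) can be sketched in a few lines. This is only an illustration of the desired output, not an existing Editor feature: reading the highlights out of the PDF is left out, the `summarize` function is a hypothetical name, and each highlight is assumed to be a dictionary with hypothetical `page`, `order` and `text` keys.

```python
import re


def summarize(highlights):
    """Turn highlighted passages into a plain-text bullet list in reading order."""
    # Sort by page, then by position on the page (both keys are assumptions).
    ordered = sorted(highlights, key=lambda h: (h["page"], h["order"]))
    lines = []
    for highlight in ordered:
        # Collapse line breaks and repeated spaces left over from the PDF layout.
        text = re.sub(r"\s+", " ", highlight["text"]).strip()
        if text:
            lines.append("- " + text)
    return "\n".join(lines)


# Example usage with made-up highlights.
highlights = [
    {"page": 12, "order": 1, "text": "Spaced repetition\nimproves recall."},
    {"page": 3, "order": 2, "text": "Notes should be   atomic."},
    {"page": 3, "order": 1, "text": "Read with a question in mind."},
]
print(summarize(highlights))
```

The result is plain text, so it can be pasted into Word or a mind map and restructured from there.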

Best regards,

Peter


PS:
Concerning your stylus test request, I'm glad to help. I'll send you an e-mail to support@pdf-xchange.com.
Paul - Tracker Supp
Site Admin
Posts: 6829
Joined: Wed Mar 25, 2009 10:37 pm
Location: Chemainus, Canada

Re: Summarzing comments, extracting highlighted text

Post by Paul - Tracker Supp »

Hi Peter,

All this is, to my mind, great stuff. Identifying where a potential market is for software may be more black magic than science; however, I believe this discussion is extremely helpful and very productive, even if the end result is not what you (or I, for that matter) expected. To that end, this discussion can potentially bear fruit hitherto unexpected...

Regarding testing the stylus tool, we have already determined that the test build requires more work, so we are making a new one. If you write to support@pdf-xchange.com again and mention this fact, we will notify you by email when the next one is ready for testing.

regards
Puffolino
User
Posts: 317
Joined: Wed Feb 09, 2011 1:06 pm

Re: Summarzing comments, extracting highlighted text

Post by Puffolino »

Hi, I'd like to ask if some small improvements could be done for the great summarizing feature:

1. when a new comment is edited (for instance when it has just been created), the text content does not appear in the summarize list.
2. I would like to change the format/size of the comment information text and of the comment itself individually
3. when no comment information is selected (which I often do), the pointing lines do not look right (and have different distances to the isolated icons)

The following example file shows four comments...
* the third has been created and filled with a lot of text before I executed "Summarizing comments..."
* the first pointing line has a larger distance to the icon than the following lines (the pointing line should lead to the center of the icon)
* when no comment info (type, author,...) is printed, the icon itself could be lowered to the separation line
PDF_VE (Kommentare).pdf
Paul - Tracker Supp
Site Admin
Posts: 6829
Joined: Wed Mar 25, 2009 10:37 pm
Location: Chemainus, Canada

Re: Summarzing comments, extracting highlighted text

Post by Paul - Tracker Supp »

Hi Guys,

I am going to have to admit that this feature request is not enjoying high priority. We have a huge amount on our plate and there are just so many other things that are seen as needing the available resources.

The discussion is monitored, hopefully this will eventuate, but for now this is on the back burner I am afraid.