I am pleased to introduce ASuLiB, a rough but modern and effective way
to handle multiple topics associated with annotations.
ASuLiB originates from a post that I published on January 20, 2018.
At the time, I was reporting a glaring lack of functionality to handle annotations,
especially if we wanted to associate several topics to an annotation. Here's
the link to that post:
Tracker was interested enough to create a feature request ticket in his internal system.
Here's the number and the title of this request if you want to participate and give some
ideas about an advanced and modern annotation management system:
- #4209: Editor FR: 'Annotations' plug-in with extended features!
If Tracker has no objection, I will dedicate this post to collecting your comments about ASuLiB.
I will also release updates, especially to fix imperfections and bugs.
In what follows, I will use topic, subject, tag and keyword as synonyms.
OVERVIEW
You may have wondered why it was impossible to associate
multiple topics with annotations. It is rare in fact that
an annotation refers to only one subject. Whether we come
from a corporate or academic milieu, when we are working
on a report or a paper we have to juggle several topics. To my
knowledge, PDF software that closely follows PDF standards does
not offer this ability to handle multiple topics at once and
one annotation at a time, and software that offers such a
possibility, such as Qiqqa, does not follow PDF standards.
In order to be clear, in XChange Editor, you create a subject by opening
the properties pane of an annotation. You can open the properties pane by
right-clicking on an annotation and selecting "Properties...". The first
property in the properties pane is the subject property. You type the
desired subjects inside the subject field.
The management features of comments have hardly changed
since the late 1990s. Comment management is based on fairly good
collaboration tools but on a primitive way to get an overview of
comments, the famous "Summarize Comments". This latter feature
manages comments by sorting on the subject field and not by
searching on multiple topics.
For example, you have the following two subject property fields from
2 different annotations (annotations can be of any type since subject
property is a property common to all types of annotation). The first
field is filled with 3 subjects and the second field with 2 subjects.
Subjects are separated by semicolons:
- First Annotation: universal proposition; Popper; testability
- Second Annotation: testability; scientific truth
(you may think of an annotation as an index card with multiple subjects
written on it; yes, I'm old but I see that we can mimic, in a better way,
the good old index card system!)
If you use the Summarize Comments feature and sort by subject, you will get
a new PDF file with a list of comments. The second comment up there
will be first in that list and the first comment will be second.
So sorting treats the subject field as a single string. There is no way to
get only those comments with, say, 'scientific truth' or only those
comments with 'scientific truth' AND 'Popper', etc.
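To make the missing feature concrete, here is a minimal sketch of a multiple-topic (AND) search over subject fields. It is not part of ASuLiB. The sample data reuse the two annotations above, and the names `splitSubjects` and `findByTopics` are hypothetical, chosen for illustration only.

```javascript
// Hypothetical sample data: the two subject fields from the example above
var annotations = [
  { id: "first",  subject: "universal proposition; Popper; testability" },
  { id: "second", subject: "testability; scientific truth" }
];

// Split a subject field on ";" and trim each topic
function splitSubjects(field) {
  var parts = field.split(";");
  var topics = [];
  for (var i = 0; i < parts.length; i++) {
    var topic = parts[i].trim();
    if (topic.length > 0) topics.push(topic);
  }
  return topics;
}

// Return the ids of the annotations that carry ALL the wanted topics
function findByTopics(list, wanted) {
  var found = [];
  for (var i = 0; i < list.length; i++) {
    var topics = splitSubjects(list[i].subject);
    var hasAll = true;
    for (var j = 0; j < wanted.length; j++) {
      if (topics.indexOf(wanted[j]) === -1) hasAll = false;
    }
    if (hasAll) found.push(list[i].id);
  }
  return found;
}

var onlyTruth = findByTopics(annotations, ["scientific truth"]);
var testabilityAndPopper = findByTopics(annotations, ["testability", "Popper"]);
var truthAndPopper = findByTopics(annotations, ["scientific truth", "Popper"]);
```

With these two sample annotations, `onlyTruth` contains only the second annotation, `testabilityAndPopper` only the first, and `truthAndPopper` is empty because no single annotation carries both topics.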
It is also impossible to get a list of all subjects from all annotations
of all PDF files in a given folder, which would be the bare minimum. For
example, it is impossible to get the following list from the above
subject field properties:
- Popper
- scientific truth
- testability
- universal proposition
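As an illustration, the list above can be derived from the two subject fields with a few lines of JavaScript. This is a simplified sketch of the idea rather than the ASuLiB code itself, and `buildSubjectList` is a hypothetical name.

```javascript
// The two subject fields from the example above
var subjectFields = [
  "universal proposition; Popper; testability",
  "testability; scientific truth"
];

// Build a list of unique subjects, sorted alphabetically without regard to case
function buildSubjectList(fields) {
  var seen = {};
  var list = [];
  for (var i = 0; i < fields.length; i++) {
    var parts = fields[i].split(";");
    for (var j = 0; j < parts.length; j++) {
      var topic = parts[j].trim();
      // Skip empty entries and subjects already collected
      if (topic.length === 0) continue;
      if (Object.prototype.hasOwnProperty.call(seen, topic)) continue;
      seen[topic] = true;
      list.push(topic);
    }
  }
  list.sort(function (a, b) {
    var x = a.toLowerCase();
    var y = b.toLowerCase();
    if (x < y) return -1;
    if (x > y) return 1;
    return 0;
  });
  return list;
}

var subjectList = buildSubjectList(subjectFields);
```

Here `subjectList` holds the four subjects in the same order as the list above, with "testability" appearing only once.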
Moreover, it is impossible to make a "Search and Replace Subject" across all
subjects from all annotations of all PDF files in a given folder.
This is a most unpleasant situation since the absence of an advanced
annotation management system means that we have to work both on
screen and in print. If we had an advanced annotation management
system, we would not have to print PDF files or comment summaries.
So, since we do not have such a system for now, I programmed this
script so that at least we would have a list of all our topics. This list
at least makes it possible to track subjects. I also programmed some basic statistics,
such as the total number of annotations, the total number of PDF files
processed, and so on.
I did not try to optimize the script but the performance of the script
is very good.
I have a folder containing 598 PDF files for a total of 3.5 GB
(yes, I put all my PDF files in one folder and use Zotero, a free
reference manager, to manage these files). A first fresh run took
2 minutes 10 seconds and a second run took 1 minute (maybe because
the files were already cached in memory).
Here are the stats for my PDF folder:
Total number of PDF files processed : 598
Number of PDF files with annotation : 140
Number of PDF files without annotation : 458
Total number of annotations : 5757
Number of annotations with subject : 3930
Number of annotations without subject : 1827
Number of unique subjects (without duplicates): 291
Number of subjects occurrences (with duplicates): 4453
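The last two figures differ because a subject used on several annotations counts once as a unique subject but once per use as an occurrence. Likewise, occurrences (4453) exceed annotations with a subject (3930) because one annotation can carry several subjects. The sketch below shows one way to compute both numbers in a single pass. It uses a lookup object instead of the nested loops of the script further down, and `countSubjects` is a hypothetical name.

```javascript
// Count unique subjects and total occurrences in a flat list of subjects
function countSubjects(subjects) {
  var counts = {};
  var unique = 0;
  var occurrences = 0;
  for (var i = 0; i < subjects.length; i++) {
    var key = subjects[i];
    if (!Object.prototype.hasOwnProperty.call(counts, key)) {
      counts[key] = 0;
      unique++;
    }
    counts[key]++;
    occurrences++;
  }
  return { counts: counts, unique: unique, occurrences: occurrences };
}

// Hypothetical input: the subjects of the two annotations from the overview
var stats = countSubjects([
  "universal proposition", "Popper", "testability",
  "testability", "scientific truth"
]);
```

For this small input, `stats.unique` is 4 and `stats.occurrences` is 5, since "testability" is used twice.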
There are three pieces of code below, after the installation section.
- The first code is for the main JavaScript file.
- The second code is for the VBScript file.
- The third code is for the batch file.
The VBScript file and the batch file must be modified.
The process can seem tedious but it is simple. The VBScript file
is modified only once and the batch file is modified again only if
you add more folder locations after the first setup.
But before looking at the installation instructions, let's see what the
list generated by ASuLiB can be used for.
UTILIZATION
After generating the list of topics using ASuLiB, you can
use this list to perform different tasks.
A first important task is to clean the subjects. Since we do not have
a system for updating the list in real time (so we need to generate the
list from time to time), we can enter subjects in slightly different forms.
For example, you may have entered the following subject in two different
forms: anti-realism and antirealism. The list will then reveal both forms. You
can then correct the entries with the form you have chosen.
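Spotting such variants by eye gets harder as the list grows. As a rough sketch, and only as an assumption about how one might automate it, the subjects can be grouped by a normalized key (lowercase, with spaces, hyphens and underscores removed). ASuLiB does not do this. The names `normalizeSubject` and `findVariants` are hypothetical.

```javascript
// Reduce a subject to a comparison key: lowercase, no spaces, hyphens or underscores
function normalizeSubject(subject) {
  return subject.toLowerCase().replace(/[\s\-_]+/g, "");
}

// Group subjects that share the same key and return the groups
// that contain more than one spelling
function findVariants(subjects) {
  var groups = {};
  var keys = [];
  for (var i = 0; i < subjects.length; i++) {
    var key = normalizeSubject(subjects[i]);
    if (!Object.prototype.hasOwnProperty.call(groups, key)) {
      groups[key] = [];
      keys.push(key);
    }
    if (groups[key].indexOf(subjects[i]) === -1) groups[key].push(subjects[i]);
  }
  var variants = [];
  for (var k = 0; k < keys.length; k++) {
    if (groups[keys[k]].length > 1) variants.push(groups[keys[k]]);
  }
  return variants;
}

// Hypothetical subject list containing three spellings of the same topic
var variants = findVariants([
  "anti-realism", "antirealism", "Popper", "testability", "Anti-Realism"
]);
```

Here `variants` contains a single group with the three spellings of "anti-realism". Such a key only catches differences in case, spacing and hyphenation, not misspellings.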
How to find these forms? Here, XChange Editor has a quite powerful search
feature (I say quite powerful because it could be more powerful). If you press
Ctrl-Shift-F, you open the Advanced Search Panel.
Click on the "Advanced Criteria" radio button. In the "Find text with:" section,
you can specify topics to search for. In the "WHERE would you like to search?" section,
you can specify the path of a PDF file folder. Finally, click on "Options ...".
A menu will open, divided into 7 sections.
To search comments only, rather than the full text for example, you need to uncheck
all options in section 2 except "Include Comments". Finally, in order to be able to search
for topics that are not adjacent, click on "Proximity" and then on "Words from the Same Paragraph".
Let's go back to our example. Let's say you chose to use the form "Anti-Realism". Then,
in the "all of these words" field of the "Find text with:" section, enter "Antirealism" to find it
among all your annotations and be able to fix it. Choose your folder, then click "Search...".
XChange Editor displays the results of the search in the bottom section. Each file that
contains annotations with a topic "Antirealism" is displayed. On the
left is a small arrow that reveals the annotations found for each file. Click on this small
arrow and then click on an annotation. Miracle! The file opens almost instantaneously directly
on the annotation.
If the properties panel is not yet open, open it by right-clicking on the annotation and
selecting "Properties" at the bottom. The first property in the panel is "Subject". You can make
your correction then press "Enter" and Ctrl-S.
You'll probably guess that another way to use the list is to search for annotations (index cards !!)
on one topic or on multiple topics at once. With the advanced search panel, the
search possibilities are immense. Here's a tip: To identify a project, such as a writing
project, find a code name for that project and enter that code name as the subject for
all annotations that are relevant to that project.
You begin to understand that an advanced annotation management system would offer
even more powerful features.
Ready to take a step towards the future? Then follow the instructions below.
INSTALLATION
A) Overview
As implemented at this time, ASuLiB relies on 4 files:
1) a JavaScript file, the main file, which generates the list
of subjects and statistics;
2) a batch file that generates a list of PDF file names
to process;
3) a VBScript file whose purpose is to make sure that the
batch file works in silence (in the background);
4) a text file which contains the PDF filenames to be processed.
ASuLiB as such is saved in a JavaScript file named
"ASBL-Annotation_Subjects_List_Builder.js". This file
has to be put in a specific folder so that XChange Editor reads
the file at startup.
Since I am not able to program in JavaScript in such a way that
I can get the filenames of all the PDF files to be processed in a given folder,
we must generate a .txt file that contains those filenames. The text
file is named "_aslbFilenames.txt" and is generated automatically by
the batch file.
The generation of the text file is done automatically at Windows startup
by the batch file. The batch file is named "_aslbFilenames.bat". In the
batch file you specify the folder(s) containing the PDF files
to be processed. So you have to put the batch file in the Windows
Startup folder. Don't worry about this process at startup. It is ultra fast,
less than one second.
You must also put the VBScript file in the Windows Startup folder.
Once everything is in place, you start XChange Editor. Editor reads the
JavaScript file. You should see a toolbar or ribbon named
"Add-on Tools" (if you don't see it, activate it by right clicking
anywhere in a toolbar or a ribbon and select "Add-on Tools"). There is
a new button (with the default piece of puzzle icon) called
"ASuLiB - List".
If you click on "ASuLiB - List", a dialog asks you to choose a folder.
Once the folder is selected the process of generating the subject
list will start.
You know the process is complete when the report, in PDF format,
appears on the screen in XChange Editor. During the operation,
a progress bar appears on the screen. If you do not have many files,
the bar will appear and disappear very quickly, to the point that you
may not notice it.
B) Detailed instructions
1) Copy the first script into a text file and save it in the following folder:
C:\Program Files\Tracker Software\PDF Editor\JavaScripts
with the following name:
"ASBL-Annotation_Subjects_List_Builder.js"
Warning: If the folder "JavaScripts" does not exist, create it
(beware, the name is case sensitive, so you have to write "JavaScripts").
2) Copy the VBScript into a text file and save it in the following
Windows startup folder:
C:\Users\YOUR_USER_NAME\AppData\Roaming\Microsoft\Windows\Start Menu\Programs\Startup
Replace "YOUR_USER_NAME" with your user name.
Use the following name for the VBScript file:
"_aslbFilenames.vbs"
Now, copy the batch script into a text file and save it in any folder you want
(it does not matter where the batch file is).
Use the following name for the batch file:
"_aslbFilenames.bat"
3) Open the VBScript file "_aslbFilenames.vbs" in a text editor by
right-clicking on the file and then choosing a text editor like Notepad.
(Personally I use Notepad++, which colors the code
if you choose the right programming language in the Language menu.)
Change the path to point to the folder where you saved the batch file.
4) Open the batch file "_aslbFilenames.bat" in a text editor.
In the batch file, you can specify as many paths as you want.
In other words, you can create as many paths as you have PDF folders.
Each path specification has two lines, as follows:
cd /D C:\THE\PATH\TO\A\PDF\FOLDER\TO\BE\PROCESSED
dir /B> _aslbFilenames.txt
You have to specify a path in the first line and there is nothing
to change in the second line.
The command "cd" stands for "change directory". So the batch file goes to
the specified folder. The parameter /D ensures that you can change the
drive and not only the folder.
The command "dir /B>" is a composite command. The "dir" command
generates a directory listing.
The "/B" parameter selects the bare format: names only, without sizes, dates
or any other information. Subfolder names are listed too, but the script
skips every name that does not end with ".pdf" or ".PDF".
The ">" operator redirects the output to the text file
"_aslbFilenames.txt" instead of the default console output.
So both lines together generate a text file with all the names
found in the folder and save it in the same PDF folder.
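To show what the script later does with this text file, here is a minimal sketch that turns a "dir /B" listing into a list of PDF filenames. The sample listing and the name `pdfNamesFromListing` are hypothetical. Unlike the script below, which accepts only ".pdf" and ".PDF", this sketch accepts the extension in any letter case.

```javascript
// Keep only the names that end with ".pdf" (in any letter case)
// from the content of a listing produced by "dir /B"
function pdfNamesFromListing(listing) {
  var lines = listing.split("\r\n");
  var names = [];
  for (var i = 0; i < lines.length; i++) {
    var name = lines[i];
    if (name.slice(-4).toLowerCase() === ".pdf") names.push(name);
  }
  return names;
}

// Hypothetical listing: two PDF files, one subfolder and the list file itself
var sampleListing = "paper one.pdf\r\nNotes\r\nREPORT.PDF\r\n_aslbFilenames.txt\r\n";
var pdfNames = pdfNamesFromListing(sampleListing);
```

With this sample, `pdfNames` holds "paper one.pdf" and "REPORT.PDF"; the subfolder, the list file and the empty last line are ignored.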
5) Now you can restart Windows so that the VBScript file and the batch file
generate all your text files named "_aslbFilenames.txt", one per
specified folder.
6) Then start PDF-XChange Editor
Editor reads the JavaScript file. You should see a toolbar or
ribbon named "Add-on Tools" (if you don't see it, activate
it by right clicking anywhere in a toolbar or a ribbon and
select "Add-on Tools"). There is a new button (with the default
piece of puzzle icon) called "ASuLiB - List".
If you click on "ASuLiB - List", a dialog asks you to choose a folder.
Go to a PDF folder and click Open.
Once the folder is selected the process of generating the subject
list will start.
You will notice that the dialog filename field is already filled with
the text file filename "_aslbFilenames.txt". Don't worry if you
click on a PDF file and the filename changes. The code catches this
and switches back to "_aslbFilenames.txt" in the background. So just
click Open. (In fact, it is a dialog to choose a single PDF file,
but I tweaked the code so the filename field is already filled with
the filename "_aslbFilenames.txt". You can click on a PDF file if
you want to make sure the folder is activated, then click Open.)
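The filename swap described above comes down to cutting the chosen path at its last slash and appending the list filename. Here is a minimal sketch of that step, using the same "lastIndexOf" and "slice" calls as the script below. The function name `listFilePath` and the sample path are hypothetical.

```javascript
// Whatever file the user picked, keep its folder and point to the list file.
// The path uses forward slashes, as in the device-independent paths
// handled by the script below.
function listFilePath(chosenPath) {
  var lastSlashPosition = chosenPath.lastIndexOf("/");
  var folder = chosenPath.slice(0, lastSlashPosition + 1);
  return folder + "_aslbFilenames.txt";
}

// Hypothetical path of a PDF file clicked by mistake in the dialog
var resolved = listFilePath("/C/Users/me/Documents/PDF/article.pdf");
```

Here `resolved` is "/C/Users/me/Documents/PDF/_aslbFilenames.txt", so the selection of a PDF file does no harm.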
You know the process is complete when the report, in PDF format,
appears on the screen in XChange Editor. During the operation,
a progress bar appears on the screen. If you do not have many files,
the bar will appear and disappear very quickly, to the point that you
may not notice it.
1) First code filename: ASBL-Annotation_Subjects_List_Builder.js
/* ASuLiB 1.1 (Annotation Subjects List Builder 1.1)
François Maurice, 2018
Free to use as it is
I am not responsible for the use of the ASuLiB script
or any damage it may cause
This script was not tested on any PDF software other than
PDF-XChange Editor
Tested using PDF-XChange Editor Pro 7.0.324.3 on Windows 10.0.16299
I do not commit myself to implement new features
but I will do my best to correct any imperfections and bugs
To be honest, if no existing software develops an advanced
and modern annotation management system, I will probably,
within a year or two, develop a feature to search and
replace topics
*/
/* ACKNOWLEDGMENTS
I want to thank the Tracker team for showing great patience
and for giving me hints on how to solve my problem.
I also want to thank all the programmers who leave bits of scripts
everywhere on the Internet. Without these scripts I would not have
been able to program ASuLiB as quickly.
Finally, I used as main reference the following books:
JavaScript: The Definitive Guide (2011) by David Flanagan
Developing Acrobat Applications Using JavaScript (2006)
from Adobe Systems
JavaScript for Acrobat API Reference (2007)
from Adobe Systems
*/
// BEGINNING OF THE SCRIPT
// TRUSTED FUNCTIONS
// Method "app.openDoc" necessitates security privileges
aslbTrustedOpenDoc = app.trustedFunction(
function (path, hiddenfile)
{
app.beginPriv();
//Get a file from folder
var myOpenDoc = app.openDoc({cPath: path, bHidden: hiddenfile});
app.endPriv();
return myOpenDoc;
});
// WRITE ANNOTATION SUBJECTS TO A PDF REPORT (with additional stats)
// All the code is enclosed in a pseudo-function which will be called by
// a button from the GUI
function aslbList()
{
delete global.startBatch;
if ( typeof global.startBatch == "undefined" )
{
global.startBatch = true;
// When we begin, we create a blank doc in the viewer to hold the
// attachment.
global.myContainer = app.newDoc();
// Ask the user to choose a folder where to find the text file
// "_aslbFilenames.txt"
// The filename "_aslbFilenames.txt" is already filled by default
var oRetn = app.browseForDoc({cFilenameInit: "_aslbFilenames.txt"});
//var oRetn = app.browseForDoc();
// Stop here if the user cancelled the dialog; otherwise reading
// oRetn.cPath further down would raise an error
if ( typeof oRetn == "undefined" )
{
console.println("User cancelled!");
return;
}
console.println(oRetn.cPath);
// The browseForDoc method returns a full path which can cause a problem
// if the user clicks on a PDF file while choosing the folder where to
// find the text file "_aslbFilenames.txt"
// So this snippet removes the filename from the file path
var lastSlashPosition = oRetn.cPath.lastIndexOf("/");
var stringPath = oRetn.cPath.slice(0, lastSlashPosition + 1);
console.println(lastSlashPosition);
console.println(stringPath);
// Import a file as an attachment to hold all the filenames from a folder
// Since parameter cDIPath is specified, the user is not prompted to
// locate the file
// The file was created via DOS prompt with the name "_aslbFilenames.txt"
global.myContainer.importDataObject({
cName: "_aslbFilenames.txt",
cDIPath: stringPath + "_aslbFilenames.txt"
});
// Get the file stream object of the embedded text file
var aslbFilenamesContent = global.myContainer.getDataObjectContents("_aslbFilenames.txt");
// Convert to a string
var asblFilenamesContentString = util.stringFromStream(aslbFilenamesContent, "utf-16");
// Print the content of _aslbfilenames.txt to the console
console.println(asblFilenamesContentString);
// Initializing variables
var dataLine = "";
var annotsSubject = "";
var annotsSubjectSplit = "";
var myOpenDoc = [];
var totalNumberFiles = 0;
var totalNumberAnnots = 0;
var countEmptyAnnotsFile = 0;
var listAnnotsFile = "";
var listEmptyAnnotsFile = "";
var countEmptySubjects = 0;
var countSubjectOccurrences = 0;
}
try
{
// CREATING THE PATH ARRAY
// Split the string "asblFilenamesContentString" into an array
asblFilenamesContentSplit = asblFilenamesContentString.split("\r\n");
// OPEN EACH FILE AND EXTRACT SUBJECTS
// Add a progress bar
var numPDFFiles = asblFilenamesContentSplit.length
var t = app.thermometer; // Acquire a thermometer object
t.duration = numPDFFiles;
t.begin();
for (var k=0; k < asblFilenamesContentSplit.length; k++)
{
// Skip filenames which are not .pdf or .PDF
if (asblFilenamesContentSplit[k].slice(-4) != ".pdf" &&
asblFilenamesContentSplit[k].slice(-4) != ".PDF") {continue};
// Call trustedFunction js file
myOpenDoc = aslbTrustedOpenDoc(stringPath + asblFilenamesContentSplit[k], true);
//myOpenDoc = aslbTrustedOpenDoc(stringPath + asblFilenamesContentSplit[k]);
// Display the progress bar
t.value = k;
t.text = "Processing PDF file " + (k + 1) + ": "
+ myOpenDoc.documentFileName;
if (t.cancelled) break; // Break if the operation is canceled
// Wait until all comments have been scanned.
myOpenDoc.syncAnnotScan();
// Sort through the comments, adding tab-delimited lines to data string.
var annots = myOpenDoc.getAnnots({});
if ( annots != null )
{
for (var i=0; i < annots.length; i++)
{
totalNumberAnnots++;
// Skipping annotations without subjects
if (annots[i].subject.length == 0) {
countEmptySubjects++;
continue;
};
// Collect the string from every annotation's subject property
annotsSubject = annots[i].subject;
// Split the string into an array so that multiple subjects from one
// annotation become an independent string
// If you use another subject delimiter instead of ";", change it here
annotsSubjectSplit = annotsSubject.split(";");
// Leading and trailing whitespace is removed
// Every subject from every annotation is put on a separate line
for (var j=0; j < annotsSubjectSplit.length; j++)
{
annotsSubjectSplit[j] = annotsSubjectSplit[j].trim();
dataLine += "\r\n"+ annotsSubjectSplit[j];
}
}
listAnnotsFile += "\r\n" + myOpenDoc.documentFileName;
} else {
console.println("The document " + myOpenDoc.documentFileName + " contains no annots.");
countEmptyAnnotsFile++;
listEmptyAnnotsFile += "\r\n" + myOpenDoc.documentFileName;
}
myOpenDoc.closeDoc();
totalNumberFiles++;
//delete global.startBatch;
//global.startBatch = true;
}
t.end();
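// Splitting string into an array and sort subjects in alphabetical order
// dataLine starts with "\r\n", so the first element is an empty string:
// drop it so that it is not counted as a subject
dataLineSplit = dataLine.split("\r\n");
if (dataLineSplit.length > 0 && dataLineSplit[0] == "") dataLineSplit.shift();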
dataLineSplit = dataLineSplit.sort(function(s,t) {
var a = s.toLocaleLowerCase();
var b = t.toLocaleLowerCase();
if (a < b) return -1;
if (a > b) return 1;
return 0;
});
// Obtain a count of each subject
function compressArray(original) {
var compressed = [];
// make a copy of the input array
var copy = original.slice(0);
// first loop goes over every element
for (var i = 0; i < original.length; i++) {
var myCount = 0;
// loop over every element in the copy and see if it's the same
for (var w = 0; w < copy.length; w++) {
if (original[i] == copy[w]) {
// increase amount of times duplicate is found
myCount++;
// sets item to undefined
delete copy[w];
}
}
if (myCount > 0) {
var a = new Object();
a.value = original[i];
a.count = myCount;
compressed.push(a);
}
}
return compressed;
};
var dataLineCompressed = compressArray(dataLineSplit);
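// No further sort is needed here: dataLineSplit was sorted above and
// compressArray keeps that order (a plain sort() on an array of objects
// would not compare their values)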
for (var i = 0; i < dataLineCompressed.length; i++) {
console.println(dataLineCompressed[i].value + " " +
"(" + dataLineCompressed[i].count + ")");
countSubjectOccurrences = countSubjectOccurrences + dataLineCompressed[i].count;
}
// CREATE THE REPORT
var oReport = new Report();
// Set up Report Title
oReport.style = "NoteTitle";
oReport.size = 1.6;
oReport.writeText("ASuLiB - Annotation Subject List Builder");
oReport.divide();
// Add some stats to the Report
oReport.style = "NoteTitle";
oReport.size = 1.2;
oReport.writeText("");
oReport.writeText("Statistics");
oReport.divide();
oReport.style = "DefaultNoteText";
oReport.size = 1;
oReport.writeText(
"Total number of PDF files processed : " + totalNumberFiles
+ "\r\nNumber of PDF files with annotation : " + (totalNumberFiles - countEmptyAnnotsFile)
+ "\r\nNumber of PDF files without annotation : " + countEmptyAnnotsFile
+ "\r\n"
+ "\r\nTotal number of annotations : " + totalNumberAnnots
+ "\r\nNumber of annotations with subject : " + (totalNumberAnnots - countEmptySubjects)
+ "\r\nNumber of annotations without subject : " + countEmptySubjects
+ "\r\n"
+ "\r\nNumber of unique subjects (without duplicates): " + dataLineCompressed.length
+ "\r\nNumber of subjects occurrences (with duplicates): " + countSubjectOccurrences
);
// Fill the report with subjects and counts
oReport.style = "NoteTitle";
oReport.size = 1.2;
oReport.writeText("");
oReport.writeText("Subjects List");
oReport.divide();
//oReport.writeText("");
oReport.style = "DefaultNoteText";
oReport.size = 1;
for (var i = 0; i < dataLineCompressed.length; i++)
{
oReport.writeText( dataLineCompressed[i].value
+ " "
+ "("
+ dataLineCompressed[i].count
+ ")"
);
}
oReport.style = "NoteTitle";
oReport.size = 1.2;
oReport.writeText("");
oReport.writeText("PDF files with annotation");
oReport.divide();
oReport.style = "DefaultNoteText";
oReport.size = 1;
oReport.writeText(listAnnotsFile);
oReport.style = "NoteTitle";
oReport.size = 1.2;
oReport.writeText("");
oReport.writeText("PDF files without annotation");
oReport.divide();
oReport.style = "DefaultNoteText";
oReport.size = 1;
oReport.writeText(listEmptyAnnotsFile);
// Open report into a PDF
oReport.open("ASuLiB - Subjects List");
} catch(e) {
console.println("Error on line " + e.lineNumber + ": " + e);
delete global.startBatch;
event.rc = false; // abort batch
}
}
// ADD BUTTONS TO THE GUI
// The buttons are added automatically to the Add-ons toolbar or ribbon
// First button: ASuLiB - List
app.addToolButton({
cName: "Annot Subj Lst Bld",
cExec: "aslbList()",
cTooltext: "Annotation Subjects List Builder",
cLabel: "ASuLiB - List",
cEnable: true,
nPos: 0
});
// END OF SCRIPT
2) Second code filename: _aslbFilenames.vbs
Set WshShell = CreateObject("WScript.Shell" )
WshShell.Run chr(34) & "C:\Users\YOUR_USER_NAME\AppData\Roaming\Microsoft\Windows\Start Menu\Programs\Startup\_aslbFilenames.bat" & Chr(34), 0
Set WshShell = Nothing
3) Third code filename: _aslbFilenames.bat
REM ASuLib - Annotation Subject List Builder
REM The batch file that generates the text files which contain the PDF filenames
REM Only modify the first line of each block of two lines.
REM On that line you specify the path where the PDF files are found
@echo off
REM First PDF folder
cd /D C:\THE\PATH\TO\A\PDF\FOLDER\TO\BE\PROCESSED
dir /B> _aslbFilenames.txt
REM Second PDF folder
REM Remove REM on both lines below if you want to add a path
REM cd /D C:\THE\PATH\TO\ANOTHER\PDF\FOLDER\TO\BE\PROCESSED
REM dir /B> _aslbFilenames.txt
REM Add as many folders as you want.
REM Just copy a block of two lines below and change the path