Word frequency

Forum for the PDF-XChange Editor - Free and Licensed Versions

GBS
User
Posts: 4
Joined: Sun Apr 30, 2017 8:22 pm

Word frequency

Post by GBS » Thu May 11, 2017 8:06 pm

Hello,

Every day I use this program I move closer to buying it; I feel like there is no escape :lol:

Is it possible to get a word frequency list, or at least a word count? Or does the program already have this and I just didn't find it?
Some journals require me to limit the words (including spaces) in the abstract, which is easy when I produce my own documents, but sometimes I get documents from others to review.
By word frequency I mean a list of all the words present in the PDF (which could include spaces, as in short phrases) together with their occurrences (the number of times they show up). It seems like a rather simple algorithm, but I couldn't find any PDF reader that does it.

Patrick-Tracker Supp
Site Admin
Posts: 1668
Joined: Thu Mar 27, 2014 6:14 pm
Location: Vancouver Island

Re: Word frequency

Post by Patrick-Tracker Supp » Fri May 12, 2017 12:13 am

Hello GBS,

Thank you for the post. I am afraid this is not something we offer; however, you can search for a specific word to find out how many times it occurs within the document. You can open the advanced search via [Shift]+[Ctrl]+[F] or under Edit > Search.

I hope this helps!
If posting files to this forum, you must archive the files to a ZIP, RAR or 7z file or they will not be uploaded.
Thank you.

Cheers,

Patrick Charest
Tracker Support North America

Joxon
User
Posts: 29
Joined: Sat Sep 12, 2015 4:54 am

Re: Word frequency

Post by Joxon » Fri May 12, 2017 2:31 am

For a simple word count you can use the JavaScript Console (Ctrl+J). Enter:

Code:

// Add up the word count of every page in the current document.
var cnt = 0;
for (var p = 0; p < this.numPages; p++) cnt += this.getPageNumWords(p);
app.alert("There are " + cnt + " words in this file.");

Then click Run.
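
Editor's sketch: the same page loop can be extended from a word count to the word frequency list requested in the opening post. This is a minimal sketch, not a tested PDF-XChange feature. It assumes the console supports the Acrobat-style calls getPageNumWords, getPageNthWord and console.println; the helper names tallyWords and sortByFrequency and the cut-off of 20 words are made up for illustration.

```javascript
// Tally how often each word occurs (case-insensitive). Pure JavaScript.
function tallyWords(words) {
  var counts = {};
  var has = Object.prototype.hasOwnProperty;
  for (var i = 0; i < words.length; i++) {
    var w = String(words[i]).toLowerCase();
    if (w === "") continue; // skip empty tokens
    counts[w] = has.call(counts, w) ? counts[w] + 1 : 1;
  }
  return counts;
}

// Turn the tally into [word, count] pairs, most frequent first
// (ties are ordered alphabetically).
function sortByFrequency(counts) {
  var pairs = [];
  for (var w in counts) {
    if (Object.prototype.hasOwnProperty.call(counts, w)) pairs.push([w, counts[w]]);
  }
  pairs.sort(function (a, b) {
    return b[1] - a[1] || (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0);
  });
  return pairs;
}

// Console part: runs only where the Acrobat-style objects exist (assumption:
// the PDF-XChange console provides them the way Acrobat does).
if (typeof app !== "undefined" && typeof this.numPages === "number") {
  var words = [];
  for (var p = 0; p < this.numPages; p++) {
    var n = this.getPageNumWords(p);
    // Third argument true: return each word stripped of punctuation and spaces.
    for (var i = 0; i < n; i++) words.push(this.getPageNthWord(p, i, true));
  }
  var top = sortByFrequency(tallyWords(words)).slice(0, 20); // 20 is arbitrary
  var lines = [];
  for (var k = 0; k < top.length; k++) lines.push(top[k][0] + " " + top[k][1]);
  console.println(lines.join("\n"));
}
```

Counting is case-insensitive here, so "The" and "the" are tallied together; drop the toLowerCase() call if that is not wanted.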

Bhikkhu Pesala
User
Posts: 1774
Joined: Tue May 29, 2007 9:29 am
Location: East London

Re: Word frequency

Post by Bhikkhu Pesala » Fri May 12, 2017 4:07 am

Joxon wrote: For a simple word count you can use the JavaScript Console: Ctrl+J. Enter:
That's worth knowing. Can it be made to work for the selected text?
Windows 10 64-bit • AMD A10-6800K, 8 Gbyte RAM
Review: http://www.softerviews.org/PDF-XChange.html

Will - Tracker Supp
Site Admin
Posts: 6905
Joined: Mon Oct 15, 2012 9:21 pm
Location: London, UK

Re: Word frequency

Post by Will - Tracker Supp » Fri May 12, 2017 9:00 am

Hi Bhikkhu,

From what I can see, there isn't any way to get the selected text via JS. One potential workaround is to use the Highlighter Tool to highlight a block of text and get the word count of that instead. I was able to find this article, which should provide a good starting point:
http://asserttrue.blogspot.co.uk/2010/0 ... -lack.html#
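
Editor's sketch of that highlight workaround, under several assumptions: that the console exposes the Acrobat JavaScript calls getAnnots, getPageNumWords and getPageNthWordQuads, that highlight annotations report their type as "Highlight", and that annotation quads and word quads share one coordinate space. The helper names quadBox, quadBoxes and isHighlighted are invented for this example. A word is counted when the centre of its first quad falls inside a highlighted area, which is only an approximation.

```javascript
// Bounding box [xMin, yMin, xMax, yMax] of one quad
// (8 numbers: four x,y corner pairs, in any order).
function quadBox(q) {
  var xs = [q[0], q[2], q[4], q[6]];
  var ys = [q[1], q[3], q[5], q[7]];
  return [Math.min.apply(null, xs), Math.min.apply(null, ys),
          Math.max.apply(null, xs), Math.max.apply(null, ys)];
}

// Accepts either [[8 numbers], ...] or one flat array of 8*n numbers.
function quadBoxes(quads) {
  var flat = Array.prototype.concat.apply([], quads);
  var boxes = [];
  for (var i = 0; i + 7 < flat.length; i += 8) boxes.push(quadBox(flat.slice(i, i + 8)));
  return boxes;
}

// True when the centre of wordBox lies inside any of the highlight boxes.
function isHighlighted(wordBox, highlightBoxes) {
  var cx = (wordBox[0] + wordBox[2]) / 2;
  var cy = (wordBox[1] + wordBox[3]) / 2;
  for (var i = 0; i < highlightBoxes.length; i++) {
    var b = highlightBoxes[i];
    if (cx >= b[0] && cx <= b[2] && cy >= b[1] && cy <= b[3]) return true;
  }
  return false;
}

// Console part: runs only where the Acrobat-style objects exist.
if (typeof app !== "undefined" && typeof this.numPages === "number") {
  var count = 0;
  for (var p = 0; p < this.numPages; p++) {
    var annots = this.getAnnots({ nPage: p }) || []; // may be null when a page has no annotations
    var marked = [];
    for (var a = 0; a < annots.length; a++) {
      if (annots[a].type === "Highlight") marked = marked.concat(quadBoxes(annots[a].quads));
    }
    if (marked.length === 0) continue;
    var n = this.getPageNumWords(p);
    for (var w = 0; w < n; w++) {
      var wordBoxes = quadBoxes(this.getPageNthWordQuads(p, w));
      if (wordBoxes.length > 0 && isHighlighted(wordBoxes[0], marked)) count++;
    }
  }
  app.alert("There are " + count + " highlighted words in this file.", 3);
}
```

Using bounding boxes keeps the check independent of the order in which a quad lists its corners, at the cost of being inexact for rotated text.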

Cheers,

Best regards

Will Travaglini
Tracker Support (Europe)
Tracker Software Products Ltd.
http://www.tracker-software.com

Re: Word frequency

Post by Will - Tracker Supp » Fri May 12, 2017 9:14 am

Bhikkhu - thinking about it, can you describe why you'd need that to work with selected text? If I know the exact use case, I might be able to adapt the JS to do what you need and provide it as both a toolbar button and a regular script.

I emphasize the word "might" because, as I believe you have seen in other topics, our official policy is that we don't provide users with JavaScripts; the onus is on them to write them. We make exceptions if the script is useful to a large number of users and is quick and simple to write. I'm also not hugely proficient in JS, so it may be beyond my ability to write.

Cheers,

Re: Word frequency

Post by Bhikkhu Pesala » Fri May 12, 2017 12:30 pm

I did not have any specific use case in mind, but I think it would be more useful than counting the words in the entire document.

Alfred on CommunityPlus suggested this improvement, which I will pass on: The Error Icon should be replaced with an information icon.

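Code:

var cnt = 0;
for (var p = 0; p < this.numPages; p++) cnt += this.getPageNumWords(p);
// The second argument picks the icon: 3 shows the status (information) icon instead of the default error icon.
app.alert("There are " + cnt + " words in this file.", 3);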

yogi108
User
Posts: 74
Joined: Thu Mar 09, 2017 9:13 am
Location: Austria, Theiß

Re: Word frequency

Post by yogi108 » Fri May 12, 2017 6:56 pm

Hi,

As JavaScript has its limitations, we can use AutoHotkey (https://autohotkey.com/) as an external tool (software license: GNU General Public License).
I just got this to run; perhaps someone knows a better regex, etc.:

1.) Install the AutoHotkey software
2.) Create a file (e.g. clipboard.ahk) with the following contents:

Code:

#NoEnv  ; Recommended for performance and compatibility with future AutoHotkey releases.
#Warn  ; Enable warnings to assist with detecting common errors.
SendMode Input  ; Recommended for new scripts due to its superior speed and reliability.
SetWorkingDir %A_ScriptDir%  ; Ensures a consistent starting directory.

!c:: ; Define Alt+C as the hotkey.
clipboard = ; Empty the clipboard.
ClipWait ; Wait (without a timeout) until text arrives on the clipboard.
if ErrorLevel ; Only set when ClipWait is given a timeout that expires.
{
    MsgBox, The attempt to copy text onto the clipboard failed.
    return
}
MsgBox, clipboard = %clipboard%

; Count runs of non-whitespace characters, so repeated spaces, tabs and
; line breaks between words do not distort the result.
max := 0
RegExReplace(clipboard, "\S+", "", max)
MsgBox, Words counted=%max%
return
3.) Open an application (e.g. PDF-XChange Editor)
4.) Press the hotkey Alt+C (hopefully not used on your side; it can be changed). This clears the clipboard and waits for selected text to be copied with Ctrl+C
5.) Select text
6.) Press Ctrl+C and see what happens

Just tested with Word and PDF-XChange; not bad :-)
Kind regards,

Edit: Here is the compiled executable for testing:
clipboard.zip (385.27 KiB)
Standalone for counting words
Last edited by yogi108 on Sat May 13, 2017 5:23 am, edited 1 time in total.
"You cannot know the meaning of your life until you are connected to the power that created you.”
Shri Mataji Nirmala Devi, founder of Sahaja Yoga

David.P
User
Posts: 961
Joined: Thu Feb 28, 2008 8:16 pm
Location: Germany

Re: Word frequency

Post by David.P » Fri May 12, 2017 9:07 pm

Because the "word frequency" feature request of @GBS is partially related, here's a link to a thread with a similar request, about auto-highlighting of important keywords for easier and faster reading:

Suggestion: Revolutionary Killer Feature for Power Reading
David.P
PDF-XChange Pro

Re: Word frequency

Post by GBS » Sat May 13, 2017 6:52 pm

:shock: So it seems like there is quite some interest in this ! :mrgreen:

I never thought about the "selected" part; I had the whole file in mind. I could get that simply by selecting all (Ctrl+A). But as far as I can see, that is still not a word frequency.

Wow David, that's a lot more complex, because how would the program decide what's "important"?
Anyway, I really like the bottom box with colored words and their count.

Tracker Supp-Stefan
Site Admin
Posts: 14196
Joined: Mon Jan 12, 2009 8:07 am
Location: London

Re: Word frequency

Post by Tracker Supp-Stefan » Tue May 16, 2017 10:11 am

Hello GBS,

As mentioned above, JS can give you a word count, but for frequency and more complicated text manipulation and analysis it is better to use a proper text editor. We offer a feature to export the PDF file to, e.g., a Word format, so you can then use all the tools offered there to analyze the text as needed.
Given the complexity of PDF files, and their varied uses - I can't promise you if or when we might add a similar feature in the Editor, so for now it's best to export the text and analyze it in another tool.

As for David's suggestion - it is indeed a lot more complicated, and I again can't make any promises as to if or when such functionality might be available in our products.

Regards,
Stefan

Re: Word frequency

Post by David.P » Tue May 16, 2017 10:19 am

Hi @all,
GBS wrote: Wow David, that's a lot more complex, because how would the program decide what's "important"?
It's not that complex, and I already had it implemented in the Firefox browser (see discussion here). For a start, you simply use the words that occur most often in any given article or text, and exclude common words like "and", "is", etc. by using a simple (static) list of stop words. The tool works very well this (simple) way.

I.e., you make a simple list of all words in a given article, subtract all stop words from the list, sort the remaining words by frequency, and then use, say, the 12 most frequent words for a normal Search operation in PDF-XChange Editor.
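
Editor's sketch of the steps just described (list the words, drop stop words, sort by frequency, keep the top few), written in plain JavaScript on a text string. It is a minimal illustration, not David.P's Firefox implementation: the function name topKeywords, the tiny STOP_WORDS list and the sample sentence are all made up, and the word pattern only recognises unaccented Latin letters.

```javascript
// Hypothetical, deliberately tiny stop-word list; a real one would be much longer.
var STOP_WORDS = ["a", "an", "and", "are", "in", "is", "it", "of", "the", "to"];

// Return the `limit` most frequent non-stop words as [word, count] pairs.
function topKeywords(text, stopWords, limit) {
  var has = Object.prototype.hasOwnProperty;
  var stop = {};
  for (var i = 0; i < stopWords.length; i++) stop[stopWords[i]] = true;

  // 1. List all words: lower-case the text and pull out runs of letters
  //    (apostrophes inside a word, as in "don't", are kept).
  var words = text.toLowerCase().match(/[a-z]+(?:'[a-z]+)*/g) || [];

  // 2. Subtract the stop words and count what remains.
  var counts = {};
  for (var j = 0; j < words.length; j++) {
    var w = words[j];
    if (has.call(stop, w)) continue;
    counts[w] = has.call(counts, w) ? counts[w] + 1 : 1;
  }

  // 3. Sort by frequency (ties alphabetically) and keep the top entries.
  var pairs = [];
  for (var key in counts) {
    if (has.call(counts, key)) pairs.push([key, counts[key]]);
  }
  pairs.sort(function (a, b) {
    return b[1] - a[1] || (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0);
  });
  return pairs.slice(0, limit);
}

// Usage with a made-up sample sentence:
var sample = "Emma and Harriet walked to Hartfield. Harriet is happy, " +
             "and Emma is pleased; Harriet smiled.";
var keywords = topKeywords(sample, STOP_WORDS, 2); // [["harriet", 3], ["emma", 2]]

// Join the words into one string that could be pasted into a search box.
var searchTerms = [];
for (var k = 0; k < keywords.length; k++) searchTerms.push(keywords[k][0]);
searchTerms = searchTerms.join(" "); // "harriet emma"
```

A static stop-word list is the simplest possible filter; the quality of the result depends mostly on how complete that list is for the language of the document.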

That's about all this feature would need to be implemented :)

Cheers
David

Re: Word frequency

Post by Will - Tracker Supp » Fri May 19, 2017 9:14 am

Hi David,

That does sound fairly complex and things are often more complex programmatically than they would at first appear. We'll obviously consider the feature, but it's likely a fairly low priority consideration at the moment.

Thanks,

Re: Word frequency

Post by David.P » Fri May 19, 2017 10:27 am

That's understood Will. Thank you. The discussed feature of course would be more like a freestyle/luxury feature for advanced text analysis.

If someone wants to try a "predecessor" of that feature manually, here's a great online text analysis tool:
Textalyser

This tool gave me the following keyword list (with number of occurrences) from my attached sample e-book "Emma" (saved as *.txt from PDF-XChange Editor):

harriet 491
knightley 385
woodhouse 306
fairfax 237
churchill 221
hartfield 152
highbury 122
pleasure 115
morning 107
feelings 98

If I use these terms:
harriet knightley woodhouse fairfax churchill hartfield highbury pleasure morning feelings
...for an Advanced Search in PDF-XChange Editor ("Find text with any of these words"), then I already get a nicely colored, highlighted overview of what this book is about and where those keywords are concentrated or distributed:
[Image: screenshot of the highlighted search results]

Anyway, this is of course a wide field. Strangely however, there still seems to be no software available whatsoever that can do what my original feature request describes (and partly delivers).

Cheers
David
:)
Attachments
Sample e-Book 'Emma'.pdf (994.01 KiB)

Re: Word frequency

Post by Will - Tracker Supp » Mon May 22, 2017 11:02 am

Thanks for the details, they've been noted and we'll put it in for consideration :)