I'm hoping someone might have a recommendation on how to tighten up the accuracy.
These are the steps I'm attempting with PDF-Tools:
- Split 100-400 page batches of ballot scans into individual pages and place them in a single folder.
- Apply OCR to the individual pages after they have been split.
- After OCR has been applied, I want to be able to search the contents of the PDF documents within Windows 10, with 100% accuracy, for a precinct number and ballot variation.
(Example search: "1615-A". 1615 is the precinct number in the top right corner of the ballot, and A is the variation of the ballot items for different voters within that one precinct.)
By doing this I want to find every ballot page image for that precinct so I can move it into its own Windows folder containing only that precinct's pages.
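For what it's worth, here is a rough Python sketch of the sorting step I have in mind. It assumes something else has already worked out a label such as "1615-A" for each single-page PDF (or None when nothing was found). The function name `sort_pages` and the `_needs_review` folder are names I made up for illustration, not part of PDF-Tools or Windows.

```python
import shutil
from pathlib import Path


def sort_pages(page_labels, dest_root, review_folder="_needs_review"):
    """Move each page into a folder named after its precinct label.

    page_labels maps a page's file path to its label (e.g. "1615-A"),
    or to None when no label was recognized. Unlabeled pages go into
    a review folder instead of being silently skipped.
    """
    dest_root = Path(dest_root)
    counts = {}
    for page_path, label in page_labels.items():
        folder = dest_root / (label or review_folder)
        folder.mkdir(parents=True, exist_ok=True)
        shutil.move(str(page_path), str(folder / Path(page_path).name))
        counts[folder.name] = counts.get(folder.name, 0) + 1
    return counts
```

Sending unidentified pages to their own folder means the handful that fail can be checked by hand rather than going missing.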
Some ballot variations are being interpreted with stray spaces. Instead of "2117-A" the OCR produces "2 117 -A", and "1 4 1 9 -A" instead of "1419-A".
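One idea I'm considering is to stop relying on an exact-text search and instead match the label with a pattern that tolerates stray spaces. Below is a minimal Python sketch. It assumes the page text has already been extracted from the OCR layer by some other tool, that precinct numbers are always four digits, that the variation is a single capital letter, and that the hyphen is read as a plain "-". The name `find_labels` is just for illustration.

```python
import re

# Four digits, each optionally followed by whitespace, then a hyphen,
# optional whitespace, and one capital letter (assumed label format).
LABEL_RE = re.compile(r"(?<!\d)((?:\d\s*){4})-\s*([A-Z])\b")


def find_labels(text):
    """Return normalized labels such as "2117-A" found in OCR text."""
    labels = []
    for match in LABEL_RE.finditer(text):
        digits = re.sub(r"\s+", "", match.group(1))  # drop stray spaces
        labels.append(f"{digits}-{match.group(2)}")
    return labels


print(find_labels("2 117 -A"))    # ['2117-A']
print(find_labels("1 4 1 9 -A"))  # ['1419-A']
```

This would not help if the OCR misreads a digit outright (for example a 1 as an l), only when the characters are right and the spacing is wrong.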
Some documents are searchable in PDF-XChange Editor but not in Adobe Acrobat Reader DC. I wonder whether whatever keeps Adobe from searching these documents is also the reason Windows 10 can't find their contents.
I'm currently testing with 564 pages across three large batches, and each time about 4 ballots don't get identified.
(Not a big deal? It is if I am trying to get the kinks worked out in preparation to handle roughly 200,000 ballots at approximately 4 pages each.)
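To put a number on why this matters, here is a back-of-the-envelope Python calculation. It assumes each miss is a single page and that the miss rate from my test stays the same at full scale, neither of which I can be sure of. The function name `expected_misses` is made up for illustration.

```python
def expected_misses(missed, tested_pages, total_pages):
    """Scale the miss rate seen in testing up to the full job."""
    return missed / tested_pages * total_pages


total_pages = 200_000 * 4  # about 200,000 ballots at about 4 pages each
estimate = expected_misses(4, 564, total_pages)
print(round(estimate))  # 5674
```

So even a miss rate under 1% could leave several thousand pages to track down by hand.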