Calculations on Auditing of paper ballots

Please see this for Ellen Theisen's latest on audits:
http://www.votersunite.org/info/auditingissues.asp

The top message "Corrected (Again) Cook County Audit Probabilities" is the end result of the series of emails below on audits, sent to Ellen from Ron Baiman (the ones under the top message are in chronological order and are between Ellen and Kathy Dopp; I posted the end result at the top in case people didn't want to read through the whole series):

Subject: Fw: Corrected (Again) Cook County Audit Probabilities

Most recent message:
From: "Ron Baiman"

Ellen,
Thanks much for this!

You're absolutely right - I was aware that the normal is not a good approx to the binomial when np=5, but I hadn't thought of the probabilities going down as you pick when you have only 32 picks!

I reformulated your equation below to take the "complement of finding 0 successes" (that is, 0 corrupted precincts) out of 32 picks in each case - that is, the probability of finding "at least one" corrupted precinct.
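A minimal sketch of the "probabilities going down as you pick" point, assuming Python 3.8+ and made-up figures (1600 precincts, 160 of them corrupted, 32 picks; none of these numbers come from the Cook County spreadsheet). The function names are hypothetical. It compares drawing without replacement, where each clean pick changes the odds for the next one, with the fixed-odds calculation, and then takes the complement to get "at least one":

```python
from math import prod  # Python 3.8+

def p_none_without_replacement(total, bad, picks):
    """Chance that all `picks` precincts, drawn without replacement, are clean."""
    clean = total - bad
    # After i clean picks, clean - i clean precincts remain among total - i.
    return prod((clean - i) / (total - i) for i in range(picks))

def p_none_with_replacement(total, bad, picks):
    """Same chance if the odds stayed fixed on every pick."""
    return (1 - bad / total) ** picks

# Hypothetical figures, for illustration only.
total, bad, picks = 1600, 160, 32

p_at_least_one = 1 - p_none_without_replacement(total, bad, picks)
print("at least one corrupted precinct found:", round(p_at_least_one, 4))
print("fixed-odds version:", round(1 - p_none_with_replacement(total, bad, picks), 4))
```

With fixed odds the chance of missing every corrupted precinct is slightly overstated, so the without-replacement figure for "at least one" comes out a little higher.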

Attached are the new files. (See bottom of this post for attachment)

______________________________________________________

From Kathy Dopp (US Count Votes):
June 19
Please review these transparent calculations in a spreadsheet done by Ron Baiman of USCV on the probability of catching precincts with miscounted votes given various assumptions about the number of precincts with miscounted votes.

http://uscountvotes.org/ucvAnalysis/US/paper-audits/

I promised that I’d get this out today because there have been a lot of election activists who have claimed (incorrectly IMO) that an audit of 3 to 5% of randomly selected precincts such as members of the OVC recommend, would not do any good, so here it is. Just download the spreadsheet in that directory.

Please take a look to see how surprisingly effective even a 2% audit can be. You can play with this spreadsheet.
In most cases all precincts would be susceptible to vote fraud if a voting machine vendor’s programs had been corrupted.

The probability of catching a precinct with vote miscounts that would trigger a county-wide recount, if the rate of vote miscounts were 10% of precincts, is 90% with just a 2% audit.
Best Regards,

Kathy Dopp
http://electionarchive.org
____________________________________________________
This is Ellen Theisen's response to the above:
June 19

What is your basis for using the NORMDIST formula for calculating the chance of finding an error? I strongly suggest that the Hypergeometric distribution formula is the appropriate one, since ballot programming can be different for each precinct (sometimes there is even more than one ballot style within a single precinct). The chance of selecting a precinct with bad ballot programming is similar to the chance of selecting a blue marble from a jar full of red and blue marbles.

In addition, as the number of precincts increases, the chances of finding at least one problem increase when you select a random sample of precincts. But most counties don’t have 2700 precincts. So what about a county with 200 precincts or fewer, which is more common? There, the chances of finding an error decrease significantly for any given audit percentage.
For example (I’m using the Excel HYPGEOMDIST formula), in a county with 160 precincts, 10% bad precincts, and a 2% audit, the chances of finding one of the bad precincts are about 35%.
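The 160-precinct example can be checked with a short sketch, assuming Python 3.8+ for math.comb. The function name is hypothetical. Note that 2% of 160 is 3.2 precincts, so the result depends on rounding: auditing 3 precincts gives about 27%, and rounding up to 4 gives about 35%, which matches the figure quoted above.

```python
from math import comb  # Python 3.8+

def chance_of_at_least_one(total, bad, audited):
    """Hypergeometric chance that a random audit includes at least one bad precinct."""
    # Ways to choose only clean precincts, divided by all ways to choose the sample.
    p_none = comb(total - bad, audited) / comb(total, audited)
    return 1 - p_none

total = 160  # precincts in the county
bad = 16     # 10% of 160

# 2% of 160 is 3.2, so show both roundings.
for audited in (3, 4):
    print(audited, "audited:", round(chance_of_at_least_one(total, bad, audited), 3))
```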

You might be interested in watching this demo of the effectiveness of doing various percentages of audits.
http://www.votersunite.org/info/RandomSample.mov

It's 4MB, but if you have broadband, it doesn't take long to load. And the demo is less than 5 minutes (I think).
Ellen Theisen
__________________________________________________
And Kathy's response to Ellen's comments above:

Ellen,

Would you have the time to modify Ron Baiman's spreadsheet, which is pretty easy to understand, to use the Hypergeometric distribution formula and then shoot it back to me, and I'll post it along with Ron's and ask Ron to check it to see what he thinks and get back to you. It sounds good to me.

http://uscountvotes.org/ucvAnalysis/US/paper-audits/

I think you have the right idea, so perhaps we can make better calcs available, although it is more complex than it seems from what you say.

Or, please tell me what the Excel formula is for the hypergeometric distribution.
____________________________________________
Another email from Kathy Dopp:

Ron Baiman told me that the distribution he used to calculate probability of catching miscounted precincts is correct for more than 5 precincts, and that Ellen Theisen’s suggested method should be used when there are fewer than five precincts. He won’t have time to look at it again for a while, but he may eventually shoot something back to us. His spreadsheet was done specifically for Cook County (IL?).
elroberson@juno.com wrote:
The hand count can be right after close of the poll with public observers and announcement of votes cast and posted on the door. Hand counts are generally accurate because the procedure for hand counts is to have a team of 4; one to read and another to watch what is being read and two to separately tally the votes.

I completely agree that the idea of hand counts is very nice and is superior to the vote counting systems in place today, but has problems:
1. We have not done hand counted paper ballots for years in America where 95%+ of our election day votes have been machine or computer counted
2. Election officials are not interested in “moving backwards” in technology. (their view - not mine)
3. Hand counting of elections is much easier to do in other countries where ballots are generally simpler and there are not so many issues, races, judges, school board races, ... and so forth on the ballot in each election.
4. Hand counting of paper ballots cannot detect ballot box stuffing or other commonly employed methods of rigging paper ballots like the newer OVC system would be able to.

(BTW the OVC is “not” a voting machine vendor; they are a consortium of computer scientists who’ve been designing better, less expensive, more trustworthy, open source voting & election systems since 2000 that can be built, maintained, or upgraded by any local computer company that counties are already doing business with, but they need the funding to build a prototype and obtain ITA certification so that any computer vendor in America can begin building & selling it.)

So, I am for pushing for electronic vote count systems that produce a hand-recountable ballot that is easy for any election official to recount, and for pushing strongly for routine audits of the paper ballots paid for by the taxpayer, not the candidates.

Op Scan paper ballot systems that are half the price of most DREs are the best today. USCV’s mathematical analysis of election results (as soon as it begins coming out) will allow us to make a very convincing case that all elections’ paper ballots should be independently hand-audited every time.
I’m sorry that USCV’s National Election Data Archive Project (NEDA) has been so slow to be developed, but many good things are in the works. I have been recently educated on how to get a nonprofit corp off the ground and get it funded and we are just beginning to take the right steps, although we need some help from someone who would like to help us produce a bi-weekly newsletter and a brochure.

We are currently looking for a PR firm to do some pro-bono work for USCV and persons to join a PR/marketing team to develop our materials so that we can begin serious fund-raising to hire the half dozen persons needed to get the project done on time to analyze the 2004 election results and have a system in place for quick analysis of the 2006 election prior to candidates conceding.

USCV is looking for as many as 5 new board members or advisory council members to the USCV board who have talents in the areas of PR and marketing, project management of software projects, legal, compensation, and fundraising... USCV would like to hire two programmers, perhaps graduate students or programmers from India? for the summer to implement the first phase of our project, the public election document repository so that anyone may obtain the original election results data for any county for the 2004 election after volunteers have uploaded it.

This will permit anyone to verify the election data and examine it, and create their own database or spreadsheet from original election docs.

Our project is a little stalled right now for reasons you may have heard about, but will pick up speed in July again.
Best,

Kathy
____________________________________________________
Ellen's response:
I don't have time to edit the spreadsheet right now. But I'll tell you what I know.
Also, the HYPGEOMDIST formula is pretty well documented in Excel help.

HYPGEOMDIST(A,B,C,D) gives the chances of finding exactly A target elements, where B is the number of elements in the sample you select, C is the number of target elements in the total population, and D is the number of elements in the total population.

To determine the chances of detecting at least one bad precinct, I used this formula
1-HYPGEOMDIST(0,B,C,D) where
B = the number of precincts audited
C = the number of bad precincts in the county
D = the total number of precincts in the county

Then, of course, to determine the chances of detecting at least two bad precincts:
1-HYPGEOMDIST(0,B,C,D)-HYPGEOMDIST(1,B,C,D) and so on.
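For anyone working outside Excel, here is a rough Python equivalent of the formulas above, assuming Python 3.8+. The function names hypgeom_pmf and chance_of_at_least are made up for this sketch; the argument order follows the A, B, C, D description given for HYPGEOMDIST.

```python
from math import comb  # Python 3.8+

def hypgeom_pmf(a, b, c, d):
    """Chance of exactly `a` bad precincts in a sample of `b`,
    with `c` bad precincts among `d` total (same order as HYPGEOMDIST)."""
    return comb(c, a) * comb(d - c, b - a) / comb(d, b)

def chance_of_at_least(k, b, c, d):
    """1 - P(0) - P(1) - ... - P(k-1); requires k <= b + 1."""
    return 1 - sum(hypgeom_pmf(a, b, c, d) for a in range(k))

# Example: 4 precincts audited, 16 bad, 160 total.
print("at least one:", round(chance_of_at_least(1, 4, 16, 160), 4))
print("at least two:", round(chance_of_at_least(2, 4, 16, 160), 4))
```

The "at least two" figure is always lower than the "at least one" figure, since each extra term subtracted is a non-negative probability.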

Sadly, Excel craps out when the numbers get too big. For example,
1-HYPGEOMDIST(0,160,800,8000) yields "#NUM!" Sigh.
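One possible workaround for the large case, sketched in Python 3.8+ rather than Excel: Python integers have no size limit, so the binomial coefficients can be computed exactly and the ratio kept as an exact fraction. The function name is hypothetical, and the argument order mirrors 1-HYPGEOMDIST(0,B,C,D).

```python
from fractions import Fraction
from math import comb  # Python 3.8+

def chance_of_at_least_one(audited, bad, total):
    """Exact value of 1 - HYPGEOMDIST(0, audited, bad, total)."""
    # Exact big-integer arithmetic; no overflow however large the counts get.
    p_none = Fraction(comb(total - bad, audited), comb(total, audited))
    return 1 - p_none

# The case that returned "#NUM!" in Excel.
p = chance_of_at_least_one(160, 800, 8000)
print(float(p))
```

With 160 of 8000 precincts audited and 10% of them bad, the chance of finding at least one is extremely close to, but still below, 1.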

One thing I haven't done, but I think would be valuable, would be to determine statistical information about the remaining precincts based on the number of bad ones you find in an audit. For example, if you have 2000 precincts, you audit 200 of them and find 3 bad precincts, is there any way to predict how many remain in the unaudited precincts, and what level of certainty would you have? Or would this have to be a range? I'm not sure how to approach it.
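One possible way to approach this, offered only as a sketch and not as a settled method: treat the county-wide number of bad precincts as unknown, and find the largest value that is still reasonably consistent with seeing so few bad precincts in the audit. This gives a one-sided upper bound (a range, not a single prediction). The function names, the 5% cutoff, and the choice of a one-sided bound are all assumptions made for illustration; Python 3.8+ is assumed.

```python
from math import comb  # Python 3.8+

def prob_at_most(found, audited, bad, total):
    """Chance of seeing `found` or fewer bad precincts in the audit
    if the county really had `bad` bad precincts overall."""
    ways = sum(comb(bad, a) * comb(total - bad, audited - a)
               for a in range(found + 1))
    return ways / comb(total, audited)

def upper_bound_on_bad(found, audited, total, alpha=0.05):
    """Largest county-wide count of bad precincts for which the audit
    result would still occur with probability at least `alpha`."""
    max_bad = total - (audited - found)  # the clean audited precincts are known
    bad = found
    # prob_at_most falls as `bad` rises, so step up until it drops below alpha.
    while bad < max_bad and prob_at_most(found, audited, bad + 1, total) >= alpha:
        bad += 1
    return bad

# The example above: 2000 precincts, 200 audited, 3 found bad.
bound = upper_bound_on_bad(3, 200, 2000)
print("upper bound on bad precincts county-wide:", bound)
print("upper bound on bad precincts left unaudited:", bound - 3)
```

Under these assumptions, the lower end of the range for the unaudited precincts is simply zero, since the audit may have caught every bad precinct by chance.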

Ellen Theisen
www.VotersUnite.Org

Attachment: Chance of Finding Corrupted Precincts.xls (29.5 KB)