# Voting matters - Issue 2, September 1994

## An STV Database

#### B A Wichmann

Since we know that no single algorithm for STV can have all the properties one might like, it appears that some statistical analysis may be needed to select an optimal algorithm. People do not vote at random and therefore any effective analysis must take into account voting patterns. For instance, if voters always voted strictly along party lines, proportional representation among such parties would be an important factor.

Collections of ballot papers from real elections would be useful for any practical analysis. There is a de facto standard for representing ballot papers in a computer, namely the form used by the Meek algorithm, so collecting such data is practical as well as useful. David Hill, Nicholas Tideman and I each had such collections, accumulated informally over several years. I have now put this material into a consistent framework so that it can be provided to anybody who would like it - merely post a floppy disc to me, and I will return the disc with the data.
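
The article does not spell out the ballot-paper representation. As a rough illustration only, the sketch below assumes a plain-text layout of the kind often used with Meek-style counting programs: a first line giving the number of candidates and seats, then one line per distinct ballot (a count, the candidate numbers in preference order, and a terminating 0), a lone 0 to end the ballots, and finally the quoted candidate names and election title. That layout, the names `BallotData` and `parse_ballot_data`, and the sample data are all assumptions made for this example, not details taken from the database itself.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class BallotData:
    candidates: int                       # number of candidates
    seats: int                            # number of seats to fill
    ballots: List[Tuple[int, List[int]]]  # (count, preferences in order)
    names: List[str]                      # candidate names, in numbering order
    title: str                            # election title


def parse_ballot_data(text: str) -> BallotData:
    """Parse the ASSUMED layout described in the lead-in (hypothetical)."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    candidates, seats = (int(x) for x in lines[0].split())

    ballots = []
    i = 1
    # Ballot lines run until a line consisting of a single 0.
    while i < len(lines) and lines[i] != "0":
        fields = [int(x) for x in lines[i].split()]
        if len(fields) < 2 or fields[-1] != 0:
            raise ValueError(f"ballot line not terminated by 0: {lines[i]!r}")
        count, prefs = fields[0], fields[1:-1]
        if any(p < 1 or p > candidates for p in prefs):
            raise ValueError(f"unknown candidate number in: {lines[i]!r}")
        ballots.append((count, prefs))
        i += 1
    if i == len(lines):
        raise ValueError("missing end-of-ballots marker")
    i += 1  # skip the end-of-ballots marker

    # Remaining lines: one quoted name per candidate, then the quoted title.
    rest = [ln.strip('"') for ln in lines[i:]]
    if len(rest) < candidates + 1:
        raise ValueError("missing candidate names or title")
    return BallotData(candidates, seats, ballots,
                      rest[:candidates], rest[candidates])


# Invented sample data, purely for illustration.
SAMPLE = """\
3 2
4 1 2 0
3 2 0
2 3 1 2 0
0
"Adams"
"Baker"
"Clark"
"Example election"
"""

data = parse_ballot_data(SAMPLE)
total_papers = sum(count for count, _ in data.ballots)
```

Grouping identical ballots under a count keeps such files compact, which matters when a data set has to travel on a floppy disc.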

The available data has been classified into four classes, as follows:

• Real: This is data from real elections, except that a statistical sample of the ballot papers is acceptable in place of the complete set. Sampling presents a means of providing 'real' data without providing the total information. This matters because there are potential dangers in the analysis of real data: an alternative algorithm could elect a different person, giving rise to concerns about the election itself rather than the principles involved. Another reason for accepting a subset of the votes is that this may be all that is feasible for a large election. Accordingly, this data is provided in a form which precludes the identification of the election involved. There are currently 46 data sets in this class.
• Mock: This is data from genuine elections, except that no position or office is at stake. Mock elections are often used to educate people in the principles of STV. There are currently 2 data sets in this class.
• Semi: Elections in this class are not genuine elections, but are clearly related to real elections. Examples in this class are 'ballot' papers derived from published STV elections (from Northern Ireland), elections from the Eurovision Song Contest and elections in which there was no fixed number of 'seats'. There are currently 21 data sets in this class.
• Test: Data in this class is not derived from any election but has been constructed to demonstrate the difference between some algorithms, to show a bug in a computer algorithm, or for some similar purpose. There are currently 129 data sets in this class.

I would very much welcome additional data, especially from real elections in which some 'party' aspect is involved. The data can be provided in a form in which the origin cannot be traced. I have analysed an Irish election to produce a single data set in the Semi class, but such reconstruction is very time consuming and requires a number of assumptions to produce anything like the actual ballot papers. Hence real data is much superior.