
# Voting matters - Issue 5, Paper 3

## Producing plausible party election data

#### B A Wichmann

The STV database lacks any data from public elections which involve political parties[1]. This is hardly surprising, given the legal constraints on public election data. However, from the point of view of election studies, this omission is very unfortunate. Statistical studies of real election data are important, since we know that desirable logical properties cannot be universally satisfied.

For public elections, the only information available is that of the result sheet. Unfortunately, this information is very much less than that contained in the ballot data itself. Only a few preferences expressed by votes are actually exercised in the counting process and therefore can be reconstructed from the result sheet. It is possible to produce minimal ballot papers which will give the same effect as the result sheet, but such ballot data is very unlike the (unknown) ballot data itself. In contrast, we are here attempting to produce ballot data which appears similar to the actual data, so that our constructed data can be used instead of the real data.

In this study, we are using the Irish election data for the years 1969 and 1973, since this is available in a convenient book format which is easy to process; see Knight and Baxter-Moore[3]. The first election in the book is that for Carlow-Kilkenny. For this, we have:

```
                 Information content
Result                 8 bits
Result Sheet         800 bits
Election Data    800,000 bits
```
It might therefore appear that we have a hopeless task since the result sheet contains a thousand times less information than that of the (missing) election data.

However, we established[2] that if we can provide a matrix giving the probabilities of A being followed by B (for all candidates A, B), then election data can be constructed which appears to have the statistical properties one would expect, at least as far as the election results are concerned with the usual STV algorithms. Hence if we can produce an estimate for the A-B probabilities, we can construct plausible data.
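
As an illustration of how a matrix of A-B probabilities can drive the construction of a ballot paper, the following is a minimal sketch and not the program used in this study. It assumes that the first preference is drawn from the first-preference totals, that each later preference is drawn from the row of the candidate just listed (restricted to candidates not yet on the paper), and that a draw of `NT` ends the paper. The function name `make_ballot` and all the numbers are hypothetical.

```python
import random

def make_ballot(first_pref, follow, rng):
    """Build one ballot as a list of candidates, most preferred first.

    first_pref: {candidate: weight} for the first preference.
    follow: {A: {B: weight}}, where B may be "NT" (paper ends here).
    """
    cands = list(first_pref)
    ballot = [rng.choices(cands, weights=[first_pref[c] for c in cands])[0]]
    while True:
        row = follow[ballot[-1]]
        # Only candidates not yet on the paper (or NT) remain eligible.
        options = [b for b in row if b not in ballot and row[b] > 0]
        if not options:
            break
        nxt = rng.choices(options, weights=[row[b] for b in options])[0]
        if nxt == "NT":
            break
        ballot.append(nxt)
    return ballot

# Invented weights for three candidates, purely for illustration.
first_pref = {"FF1": 40, "FG1": 35, "LA1": 25}
follow = {
    "FF1": {"FG1": 100, "LA1": 100, "NT": 800},
    "FG1": {"FF1": 150, "LA1": 350, "NT": 500},
    "LA1": {"FF1": 200, "FG1": 400, "NT": 400},
}
rng = random.Random(1)
ballots = [make_ballot(first_pref, follow, rng) for _ in range(5)]
```

Each call yields one paper; repeating the call with the same seeded generator gives a reproducible set.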

Taking the result sheets for all the Irish elections for 1969, we can study just the first transfers made. These transfers are not restricted in the potential choice that can be made by the elector, and therefore can provide a basis for the probabilities we wish to estimate. To compare one constituency with another, we label the candidates FF1, FF2, ... for Fianna Fail in order of their first preferences, and similarly for Fine Gael (FG1, etc), Labour (LA1, etc) and others (OT1, etc). (Fortunately, this is exactly the order listed in [3].) We only need to consider the three main parties, since they account for around 97% of the first preference votes. However, the 'other' candidates must be taken into account with transfers, and hence appear as a notional party.
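
The labelling scheme can be sketched as follows. This is an illustrative fragment only: the function name, the input layout and the candidate names are hypothetical, and ties in first preferences are simply left in input order.

```python
from collections import defaultdict

# Party names mapped to the label prefixes used in the text;
# anything else falls into the notional 'other' party, OT.
MAIN = {"Fianna Fail": "FF", "Fine Gael": "FG", "Labour": "LA"}

def label_candidates(candidates):
    """candidates: list of (name, party, first_preferences) tuples."""
    by_code = defaultdict(list)
    for name, party, votes in candidates:
        by_code[MAIN.get(party, "OT")].append((votes, name))
    labels = {}
    for code, members in by_code.items():
        # Highest first-preference count gets rank 1 within the party.
        members.sort(key=lambda m: -m[0])
        for rank, (_, name) in enumerate(members, start=1):
            labels[name] = f"{code}{rank}"
    return labels

# Invented constituency, purely for illustration.
labels = label_candidates([
    ("Adams", "Fianna Fail", 5200),
    ("Byrne", "Fianna Fail", 7100),
    ("Casey", "Fine Gael", 6400),
    ("Doyle", "Labour", 3900),
    ("Egan", "Independent", 800),
])
```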

Table 1 gives the first transfers for all[4] the 1969 Irish elections. The candidates are labelled as above and NT (for Non-Transferable). A blank in the relevant columns indicates no such candidate. Others are listed in the order given in Knight and Baxter-Moore[3].

Table 2 shows the transfer from Fianna Fail alone. The star against the Waterford entry represents a change from the original. In this case alone, the FF transfer was by elimination; but we wish to put under FF1 the candidate from which transfers were made, which implies permuting the columns as shown. Again, the star against the Carlow-Kilkenny entry represents a change from the original. Here, the candidate FF2 already had the quota, and therefore was not eligible for transfers (or rather any such transfer would have been ignored) and hence the transfer to FF3 is regarded as being for FF2, being the next available FF candidate.

Table 3

The columns can now be added up to give the average transfers. (The total transfers are 33,549, but we express this as votes transferred per thousand.) The result is shown in Table 3, where FF1 represents the first Fianna Fail candidate to whom transfers could be made. As expected, this indicates weak cross-party voting, and that the most popular candidate within a party is the one with the most first-preference votes.

Table 4.

Table 5.

Table 6

Tables 4, 5 and 6 give the corresponding transfers of 1,000 votes from Fine Gael, Labour and the other parties respectively.
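
The per-thousand figures amount to a simple normalisation of the column totals. A minimal sketch, with invented totals (the real figures are those of the tables) and a hypothetical function name:

```python
def per_thousand(column_totals):
    """Scale summed transfers so that they add up to 1,000 votes."""
    total = sum(column_totals.values())
    return {dest: 1000 * votes / total for dest, votes in column_totals.items()}

# Invented column totals, purely for illustration.
rates = per_thousand({"FF1": 1500, "FF2": 900, "FG1": 240, "NT": 360})
```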

Hence we now have estimates for our A-B probabilities, although these figures are very crude for the following reasons:

1. The tables show large variations between constituencies.
2. Comparing constituencies with different numbers of candidates for each party is dubious.
3. Grouping all other candidates into a notional party is clearly dubious also.

Nevertheless, we now have some estimates that are probably as good as we can get in the circumstances.

The next process is to use the above estimates for providing default transfer probabilities in those cases in which the result sheet does not provide this information.

For each of the Irish elections for 1969, we compute the transfer probabilities that can be found from the result sheet. For the other values, we use our estimates. This then allows for plausible ballot data to be computed by program.
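
One way to combine the two sources of information is sketched below. It is an assumption on my part that the combined row is rescaled to sum to one; the function name and the figures are hypothetical.

```python
def transfer_row(observed, default):
    """Transfer probabilities from one candidate to each destination.

    observed: per-thousand values read from the result sheet (may be partial).
    default: per-thousand estimates covering every destination.
    """
    # Use the result-sheet value where one exists, else the estimate.
    row = {dest: observed.get(dest, default[dest]) for dest in default}
    total = sum(row.values())
    # Rescale so that the row is a probability distribution (an assumption).
    return {dest: value / total for dest, value in row.items()}

# Invented figures, purely for illustration.
default = {"FF2": 600, "FG1": 100, "LA1": 100, "NT": 200}
observed = {"FF2": 700, "NT": 100}
row = transfer_row(observed, default)
```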

The computer program does need to reduce the ballot data to manageable proportions. For Carlow-Kilkenny in 1969, there were 46,073 ballot papers. If we constructed this number of ballot papers individually by program, we would have a 750K bytes data file - too big to process rapidly. We can reduce the data file to a more manageable size by having piles of identical papers, which all the computer algorithms can handle rapidly. The program uses piles of 500, 100, 50, 10, 5 and 1 paper(s), adjusted so that the correct number of total ballot papers is produced, and the first preference counts are the same as the result sheet. The data file is now reduced to about 11K bytes.
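
The idea of piles can be illustrated by splitting one candidate's first-preference count into the pile sizes quoted above. The greedy split shown here is only one possible scheme and is not necessarily the adjustment the program makes; the function name and the count of 6,238 are invented.

```python
PILE_SIZES = (500, 100, 50, 10, 5, 1)

def split_into_piles(count, sizes=PILE_SIZES):
    """Split a vote count into piles, largest sizes first."""
    piles = []
    for size in sizes:
        n, count = divmod(count, size)
        piles.extend([size] * n)
    return piles

# Each pile would then be given a single constructed preference order.
piles = split_into_piles(6238)
```

Because the smallest pile holds one paper, any count can be matched exactly, which is what allows the first-preference totals to agree with the result sheet.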

The program also attempts one further adjustment. The ballot papers match the first preferences and the total votes cast exactly, but the match to subsequent transfers is only similar in terms of the proportion of the occurrence of A-B's in the papers. To obtain a better, but not identical, fit, the program computes many examples using different seeds for the random number generator, and selects the best example. Determining the fit between a ballot paper set and the result sheet is not straightforward. To undertake the comparison properly would require a computer version of the Irish STV rules, which was not available. Instead, the ERS rules were used, which have a number of differences from the Irish version. The most obvious difference is rounding the votes to whole numbers (single ballot papers are transferred), rather than to hundredths; but this makes little difference in this case, with over 10,000 votes cast in each election.
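
The selection over seeds can be sketched as a simple best-of-many loop. The two callables, `generate` and `misfit`, are hypothetical stand-ins: the first would build a ballot set from a random number generator, and the second would count the set under the ERS rules and measure its distance from the result sheet. The toy versions below merely make the fragment runnable.

```python
import random

def best_ballot_set(generate, misfit, seeds):
    """Return (score, seed, ballots) for the seed giving the lowest misfit."""
    best = None
    for seed in seeds:
        ballots = generate(random.Random(seed))
        score = misfit(ballots)
        if best is None or score < best[0]:
            best = (score, seed, ballots)
    return best

# Toy stand-ins, purely for illustration.
def toy_generate(rng):
    return [rng.randint(0, 100) for _ in range(5)]

def toy_misfit(ballots):
    return abs(sum(ballots) - 250)

score, seed, ballots = best_ballot_set(toy_generate, toy_misfit, range(20))
```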

To summarise, the program takes as input:

1. The transfers between parties deduced from a set of elections.
2. The result sheet from a specific election from that set, giving the party affiliation of each candidate.
3. Seeds for the random number generator, and a number of trials from which to select the ballot set with the best fit.

From this, the program outputs a set of ballot papers giving a 'good' fit to the specified election. Note that by changing the seeds for the random number generator, slightly different sets of ballot papers will be produced.

This program was then used to construct plausible ballot sets for the 1969 and 1973 Irish elections. The elections in 1973 were regarded as distinct from 1969, so that the same process as illustrated above was used to construct another table of transfers per thousand votes between parties.

A summary of the results from analysing the election data appears below. The meanings of the entries in the table are as follows:

- Dn: On my home computer, I have nine different STV-like algorithms. Dn is the number of these algorithms giving a different result from the actual Irish election. A result of D0 is not printed.
- Cn: A Condorcet ranking is computed from the election data. From this, the lowest-ranked candidate who was elected is found. Cn is the number of un-elected candidates ranked at least as high as that candidate.
- Pn: From the Condorcet ranking, a Condorcet paradox is evident. Pn indicates the number of candidates involved in the paradox. A plus sign indicates that the paradox involves both elected and un-elected candidates. (Note that a Condorcet paradox involving the 'top' candidate is undoubtedly a problem when electing a single candidate, but not necessarily in other cases.)
- IEM: Of the nine STV algorithms that were used to analyse the data, two are of special interest: Meek and the ERS hand-counting rules. When these two results are compared with the Irish result, the odd one out of the three is noted (by a single letter). (Note that in the single case of Dublin SW for 1969, all three gave a different result, so there was no odd one out.)
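
The Condorcet entries rest on pairwise comparisons between candidates. The sketch below shows one way such comparisons, and the groups of candidates caught in a paradox, might be computed from piles of papers. It is not the program used here, and it makes two assumptions: a candidate listed on a paper is taken to be preferred to every unlisted candidate, and a paradox group is taken to be a set of candidates who each beat one another directly or through a chain of pairwise wins. All names and figures are hypothetical.

```python
from itertools import permutations

def pairwise(piles, candidates):
    """wins[a][b] = number of papers ranking a above b."""
    wins = {a: {b: 0 for b in candidates if b != a} for a in candidates}
    for count, ballot in piles:
        pos = {c: i for i, c in enumerate(ballot)}
        for a, b in permutations(candidates, 2):
            # Assumption: a listed candidate beats any unlisted one.
            if a in pos and (b not in pos or pos[a] < pos[b]):
                wins[a][b] += count
    return wins

def paradox_groups(wins):
    """Sets of two or more candidates who beat each other in a cycle."""
    cands = list(wins)
    reach = {a: {b for b in cands if b != a and wins[a][b] > wins[b][a]}
             for a in cands}
    # Transitive closure of the 'beats' relation (Warshall's method).
    for k in cands:
        for a in cands:
            if k in reach[a]:
                reach[a] |= reach[k]
    groups, seen = [], set()
    for a in cands:
        if a in seen:
            continue
        group = {a} | {b for b in reach[a] if a in reach[b]}
        seen |= group
        if len(group) > 1:
            groups.append(group)
    return groups

# Invented piles of (number of papers, preference order).
piles = [(4, ["A", "B", "C"]), (3, ["B", "C", "A"]), (2, ["C", "A", "B"])]
wins = pairwise(piles, ["A", "B", "C", "D"])
groups = paradox_groups(wins)
```

In this invented example A beats B, B beats C and C beats A, so the three form one paradox group, while D (listed on no paper) loses to all of them and is not involved.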

The method of construction implies that it would be unwise to assume that there was an actual Condorcet paradox for South West Cork, since this property is dependent in part upon the data which has been added by statistical means. However, it would be reasonable to suppose that the fraction of elections in Ireland having a Condorcet paradox is about one third, and about a quarter have a paradox involving elected and unelected candidates.

In many cases, the election result is clearly marginal between two candidates, and hence differences between the STV algorithms are not surprising.

Two elections stand out as being very different. For Dublin South West for 1969, all three main algorithms gave a different result. After the top candidate, the next six were in a Condorcet paradox. It seems clear that this seat is a potential example of non-monotonicity. I have been unable to determine if this is so, since I do not know of any computationally feasible way of determining the property. As an exercise for the readers, I have reproduced the result sheet, together with the fit my program produces, to allow others to determine if non-monotonicity occurs. I have been able to simplify the data by reducing the number of piles substantially, and also reduced the number of votes by a factor of ten, but this still does not provide an easy way of determining this vital property. David Hill has commented on this by noting that perhaps the property is not so important if it is impractical to determine its validity for a specific election.

The other unusual result is that for Longford-Westmeath for 1973. This is the only case in which there were two sets of candidates involved in Condorcet paradoxes in one election.

There is only a weak correlation between those elections having C ≠ 0 and those having D ≠ 0. There is some correlation between the C's and P's, which is hardly surprising due to the underlying dependence upon Condorcet. A Condorcet paradox involving both elected and unelected candidates is no guarantee that any of the STV algorithms will produce a different result, as can be seen from Dublin North Central for 1973.

All the computer data produced in this study is available from me on request.

### Acknowledgement

This work would not have been possible without the excellent work of J Knight and N Baxter-Moore in tabulating and presenting the results of the 1969 and 1973 Irish elections.

### References and Notes

1. B A Wichmann. An STV Database. Voting matters, Issue 2, p9.
2. B A Wichmann. A simple model of voter behaviour. Voting matters, Issue 4, pp3-5.
3. J Knight and N Baxter-Moore. Republic of Ireland: The General Elections of 1969 and 1973. The Arthur McDougall Fund. London. 1973.
4. Donegal-Leitrim is excluded since this has the Speaker of the Dail elected unopposed, so comparisons are difficult.

### Appendix

The table below is the Irish result sheet as given by Knight and Baxter-Moore, except that the results computed by the program from the plausible data are additionally shown in italics.

The actual event elected FF1, LA1, LA2 and FF2. The ERS rules with the plausible data elected FF1, LA1, LA2 and FG1, while the Meek algorithm with the plausible data elected FF1, LA1, LA2 and FG2.

There is a single Condorcet winner in LA1, but the set of candidates FF1, FF2, FG1, FG2, LA2 and OT1 are in a Condorcet paradox with the plausible data.

#### Dublin South West, 1969
