Building an Election Forecasting Model

In this post I want to describe the process of building an election forecasting model for the upcoming Australian election. I have covered some of the steps in earlier posts, but I will briefly go over them here for the sake of completeness.

The first step is to understand the Australian electoral system, most importantly its preferential voting system. This system produces viable minor parties that can hold the balance of power in individual district contests. Four minor parties presently have this potential. Election data make it possible to measure precisely the effect each of these minor parties has on individual contests. This work has been completed.

The second step is to understand the demographic and other drivers that explain the share of votes received by the various parties. To do this I have built statistical models, using district-level demographic data from the 2011 Census, for the two major party groups, the Greens, and the most important of the minor parties. These models use ordinary least squares. I used the spatial Durbin model to test for spatial autocorrelation across districts but found only weak evidence of it.
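As a rough illustration of the ordinary least squares step, here is a minimal sketch in plain Python. The two predictors, the five districts and all of the numbers are made up for the example, and the function names are hypothetical; the real models use Census variables that are not listed in this post.

```python
def ols_fit(X, y):
    """Fit y = b0 + b1*x1 + ... by ordinary least squares.

    Solves the normal equations (X'X) b = X'y by Gauss-Jordan elimination
    and returns the coefficients with the intercept first.
    """
    rows = [[1.0] + [float(v) for v in r] for r in X]  # prepend intercept column
    k = len(rows[0])
    # Build the augmented matrix [X'X | X'y].
    A = []
    for i in range(k):
        xtx = [sum(r[i] * r[j] for r in rows) for j in range(k)]
        xty = sum(r[i] * yi for r, yi in zip(rows, y))
        A.append(xtx + [xty])
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; X'X is singular")
        A[col], A[pivot] = A[pivot], A[col]
        pv = A[col][col]
        A[col] = [v / pv for v in A[col]]
        for r in range(k):
            if r != col:
                f = A[r][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]


def predict(coefs, x):
    """Predicted vote share for one district with predictor values x."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], x))


# Made-up data: two demographic percentages per district (hypothetical
# variables) and a party's first-preference share in each district.
demographics = [[20, 30], [25, 40], [30, 35], [35, 50], [40, 45]]
primary_vote = [14.0, 14.5, 18.0, 17.5, 21.0]

coefs = ols_fit(demographics, primary_vote)
estimate = predict(coefs, [28, 38])  # a district not in the fitting data
```

In practice a statistics package would also report standard errors and fit diagnostics, which this sketch leaves out.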

The third step is to account for changes in district boundaries since the last election. Two states, South Australia and Victoria, have had nearly all of their districts redistributed since the 2010 election. The changes were relatively minor in the 11 South Australian districts but were quite significant in many of Victoria’s 37 districts. I have used polling place results from the 2010 election and redistribution information from the Australian Electoral Commission to estimate the effect of these boundary changes on each district. This work is in progress.
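The idea behind the redistribution adjustment can be sketched as follows: reassign each polling place's 2010 votes to the district it falls in under the new boundaries, then recompute the shares. This is a simplified sketch under two assumptions: each polling place moves wholly into one new district, and votes not tied to a polling place are ignored. The booth names, district names and vote counts are all invented.

```python
from collections import defaultdict


def notional_shares(booth_votes, booth_to_new_district):
    """Aggregate polling-place votes into new districts.

    booth_votes: {booth: {party: votes}}
    booth_to_new_district: {booth: new district name}
    Returns {district: {party: percentage of votes}}.
    """
    totals = defaultdict(lambda: defaultdict(int))
    for booth, votes in booth_votes.items():
        district = booth_to_new_district[booth]
        for party, n in votes.items():
            totals[district][party] += n
    shares = {}
    for district, votes in totals.items():
        total = sum(votes.values())
        shares[district] = {p: 100.0 * n / total for p, n in votes.items()}
    return shares


# Invented example: three polling places, two new districts.
booth_votes = {
    "Booth A": {"Labor": 600, "Coalition": 400},
    "Booth B": {"Labor": 300, "Coalition": 700},
    "Booth C": {"Labor": 500, "Coalition": 500},
}
booth_to_new_district = {
    "Booth A": "District X",
    "Booth B": "District X",
    "Booth C": "District Y",
}

notional = notional_shares(booth_votes, booth_to_new_district)
```

A polling place that straddles a new boundary would need its votes split between districts, for example in proportion to the enrolled voters on each side.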

The fourth step is to use polling data to estimate the state of the election at the national and state levels at any given moment. I use a Bayesian approach to update and adjust existing polling information as new polls arrive. The Newspoll group provides regular polling at the national level and periodic state polling. Newspoll’s final polls in the 2007 and 2010 elections were quite accurate, so it seems reasonable to use this group’s polls in the analysis. The polling estimates are then fed into the models produced in the second step to make district-level estimates.
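One common way to do this kind of updating is a normal-normal conjugate update, in which the current estimate and the new poll are averaged with weights proportional to their precision. The sketch below assumes that form and assumes simple random sampling for the poll's variance; it is an illustration rather than the exact procedure used here, and the function name and numbers are hypothetical.

```python
def update_estimate(prior_mean, prior_var, poll_share, sample_size):
    """Combine a prior estimate of a vote share with one new poll.

    Shares are proportions (0.52 means 52%). The poll's sampling variance
    is approximated by p * (1 - p) / n, which assumes a simple random sample.
    Returns the posterior mean and variance.
    """
    poll_var = poll_share * (1.0 - poll_share) / sample_size
    post_var = 1.0 / (1.0 / prior_var + 1.0 / poll_var)
    post_mean = post_var * (prior_mean / prior_var + poll_share / poll_var)
    return post_mean, post_var


# Hypothetical numbers: a prior of 50% with a standard deviation of
# 2 points, updated with a poll of 1,000 respondents showing 52%.
prior_mean, prior_var = 0.50, 0.02 ** 2
post_mean, post_var = update_estimate(prior_mean, prior_var, 0.52, 1000)
```

The posterior mean lands between the prior and the poll, closer to whichever has the smaller variance, and the posterior variance is smaller than either. Feeding each posterior back in as the next prior handles a sequence of polls.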

The fifth step is to produce these district-level estimates, which are first-preference percentages for the major party groups and for the Greens and Family First. Since the full candidate slates will not be available until about 30 days before the election, it will be difficult to estimate percentages for the Christian Democrats and Liberal Democrats until we know whether and where they are running candidates. One or two new political parties may also emerge during the campaign; we can estimate their impact from the national and state-level polling data as well as from media reports on individual candidates and district races.

Finally, I use the information developed in step one to estimate the preference distributions for the Greens and other significant minor parties. This produces the final result for each district. I anticipate making a first estimate of the number of seats won by each party shortly.
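To show how the preference distribution turns first preferences into a district result, here is a minimal sketch. It makes a simplifying assumption: each minor party's votes flow directly to the two major party groups at a fixed rate, rather than following the full sequential count. The first-preference percentages and flow rates are invented for the example, and the function name is hypothetical.

```python
def two_party_preferred(first_prefs, flows, majors=("Labor", "Coalition")):
    """Allocate minor-party first preferences to the two major groups.

    first_prefs: {party: first-preference percentage}
    flows: {minor party: {major party: fraction of its preferences}}
    Returns {major party: two-party-preferred percentage}.
    """
    tpp = {m: float(first_prefs[m]) for m in majors}
    for party, share in first_prefs.items():
        if party in majors:
            continue
        for m in majors:
            tpp[m] += share * flows[party][m]
    return tpp


# Invented district: first preferences sum to 100.
first_prefs = {"Labor": 38.0, "Coalition": 45.0, "Greens": 11.0, "Family First": 6.0}
# Invented flow rates; each minor party's fractions sum to 1.
flows = {
    "Greens": {"Labor": 0.8, "Coalition": 0.2},
    "Family First": {"Labor": 0.4, "Coalition": 0.6},
}

result = two_party_preferred(first_prefs, flows)
winner = max(result, key=result.get)
```

In this invented district the Coalition finishes ahead after preferences. A contest in which a minor party or independent finishes in the top two would need the full elimination order instead of this shortcut.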