Saturday, November 16, 2019

Polls apart

According to the polls, Labour is in for a drubbing come election day. The latest polls give the Tories a commanding 16-point lead, which could translate into a massive majority for Johnson and his deceitful compatriots.

Déjà vu? On 15th April 2017 (five weeks before the General Election) the Sunday Mirror reported a ComRes poll showing the Tories with a massive 21-point lead over Labour. Labour were predicted to lose anywhere from 40 to 100 seats, even losing ground in strongholds such as Wales. It seemed a disaster was looming.

Based partly on this seemingly insurmountable popularity, Theresa May, who you may recall was Prime Minister at the time, called for an early General Election. She had good reason to be optimistic: the massed ranks of the British print and broadcast media had spent 12 months laying into Jeremy Corbyn.

It’s worth recalling how the media have treated Jeremy Corbyn, a principled politician who was first elected in 1983 and has been a thorn in the establishment’s side ever since. Initially he was the plucky outsider who needed the signatures of MPs who were never going to vote for him even to get on the leadership ballot. Once he won, he was just too nice to be a leader, despite having the support of over 60% of members (twice). Then he was a joke, a man who had an allotment and made his own jam. Then he was a terrorist sympathiser, a Czech spy, a dangerous Marxist, an anti-Semite, and just plain too boring to be PM. Unsurprisingly, his personal poll ratings plummeted.

Despite this, in June 2017 the Labour Party secured more votes than at any election since Blair’s 1997 landslide. Labour gained 30 seats and took 40% of the vote. The Tories lost 13 seats with 42.5% of the vote. Their majority was slashed, meaning they had to do a deal with the DUP in order to remain in government. The rest, as they say, is history.

In the two or three days before the 2017 election, most pollsters, with the exception of Survation, had the Tories on 44-46% and Labour trailing on 33-35%. So how did the pollsters get it so wrong?

The British Polling Council, to which all the polling companies subscribe, noted that whilst the polls are good at predicting the Tory vote, they are less good when it comes to predicting Labour’s. That was the case in 2017, but it was the exact opposite of 2015, when the BPC’s internal enquiry concluded:

“…the primary cause of the polling miss in 2015 was unrepresentative samples. The methods the pollsters used to collect samples of voters systematically over-represented Labour supporters and under-represented Conservative supporters. The statistical adjustment procedures applied to the raw data did not mitigate this basic problem to any notable degree.”

What happened in 2017 was that the under-representation of Tory support in 2015 was corrected, mainly using statistical adjustment rather than more rigorous sampling techniques, but this produced an under-reporting of Labour support and a slight over-counting of the Tory vote.

To understand the inherent bias of opinion polls, it is important to realise that polling companies would love to be able to say they predicted the result of a General Election six or seven days before the actual count. They are professionals, and although the heads of the organisations may well be Tories, many of the researchers on the ground doing the statistical analysis will not be. This is simply to say that they are not reporting Tory leads just because that is what they would like to be true; they believe they are telling the truth.

Most of the companies now conduct the majority of their research online. One or two still conduct telephone interviews, but few now employ an army of interviewers knocking on doors to talk to people. Online research brings a considerable financial saving, but it comes at a cost in quality.

To conduct online research it is necessary to compile a panel of people prepared to take part. Most companies now invite you to join their panels on their websites, and some even offer payment. What this means is that the panels are, by definition, self-selecting. You may think that, provided the panel is sufficiently large, this is not a problem. That certainly seems to be the attitude of the polling companies.

Generally speaking, research on the general population (for elections, that means adults aged 18 and over) is regarded as fairly robust with a sample above 1,000. In 2017 sample sizes ranged from 1,000 to 11,000. However, sampling theory tells us that for those 1,000 people to be representative of the study population (in this case voters in the UK) they should be randomly drawn from that population.
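
To illustrate why a genuinely random sample of around 1,000 earns that trust, here is a minimal simulation sketch in Python. The 40% “true” support figure is invented purely for illustration; nothing here comes from any real poll.

```python
import random

# Invented "true" level of support in the electorate, for illustration only.
TRUE_SHARE = 0.40
N = 1_000          # typical published sample size

estimates = []
for _ in range(1_000):                      # repeat the poll 1,000 times
    sample = [random.random() < TRUE_SHARE for _ in range(N)]
    estimates.append(sum(sample) / N)

# With genuinely random sampling, roughly 95% of the estimates land
# within about 3 points of the true 40%.
within = sum(abs(e - TRUE_SHARE) <= 0.03 for e in estimates)
print(f"{within / len(estimates):.0%} of simulated polls within 3 points of the true 40%")
```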

Companies are rather secretive about their panels, but YouGov has stated that it has around 1 million people signed up to its panel in the UK. That may sound like a lot of people, but it is important to remember that the UK electorate consists of 45,775,800 people (as of December 2018, the latest figures). So, whilst 1 million is a lot of people, it is only about 1/46th of those who could be included.

More importantly, it means that over 44 million electors in the UK will never be asked their opinion by YouGov. This may not matter. If the one million panel members had been randomly selected, then every elector would have had an equal chance of being selected, and, within certain known parameters, that sample of the voting public could reasonably be claimed to be representative.

Full disclosure: I am on the YouGov panel, which means I know how you get to be part of that million electors whose views are reported as if representative of all voters. I was not contacted by YouGov to join their panel after my name was drawn randomly from a metaphorical hat; rather, I clicked a link on their website inviting me to join. There was nothing random about it at all. I just fancied taking part.

To see where the only random element in online sampling comes in, consider that most polls for YouGov use about 2,000 respondents. That means every member of their panel has roughly a 1 in 500 chance of being selected for each survey. Were they randomly sampling from the 45 million-strong electorate, the chance of being selected would be closer to 1 in 23,000. That is quite a difference.
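
The arithmetic is easy to check. The figures below are the panel size, electorate size, and sample size quoted above:

```python
PANEL = 1_000_000        # YouGov's stated UK panel size
ELECTORATE = 45_775_800  # UK electorate as of December 2018
SAMPLE = 2_000           # typical poll size

print(f"Chance of selection from the panel:      1 in {PANEL // SAMPLE:,}")
print(f"Chance of selection from the electorate: 1 in {ELECTORATE // SAMPLE:,}")
# -> 1 in 500 versus 1 in 22,887
```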

A question that is rarely asked of opinion polls is: what is the probability of any particular voter being included in the sample used? To put that more simply, how likely are you to be asked? The answer is that unless you are on the company’s panel, zero. And that means that while the results may be representative of the panel, they are not representative of the UK electorate.

This of itself, however, does not invalidate their polls. The question becomes not whether 1 million people is enough, but how close those people are to the rest of the population. I have long suspected that people who sign up to be on panels are anything but representative. My suspicion is that they are likely to be older, better educated and more middle class than average. Although that does not fully explain why they are more likely to be Tory, it does explain why the Labour vote, which is strongest among younger people and working-class people, is likely to be seriously under-represented.
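
For a concrete picture of the kind of demographic weighting pollsters use to correct such a skew, here is a minimal sketch with entirely invented numbers (the age bands, panel mix, and support levels are illustrative assumptions, not real data): each group’s answers are re-weighted from its share of the panel to its share of the population.

```python
# Invented, purely illustrative figures: age mix of a panel vs. the electorate.
panel_share      = {"18-34": 0.20, "35-54": 0.35, "55+": 0.45}
population_share = {"18-34": 0.28, "35-54": 0.34, "55+": 0.38}

# Invented raw Labour support within each age group of the panel.
labour_support = {"18-34": 0.55, "35-54": 0.35, "55+": 0.25}

raw      = sum(panel_share[g] * labour_support[g] for g in panel_share)
weighted = sum(population_share[g] * labour_support[g] for g in panel_share)

print(f"Raw panel figure:         {raw:.1%}")       # dragged down by too few young members
print(f"Demographically weighted: {weighted:.1%}")  # closer to what a random sample would show
```

Weighting like this is legitimate as far as it goes; the trouble is how far the published figures can drift from the raw ones.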

It is this under-representation that leads to statistical manipulation of the raw data, to produce results which the researchers believe are more accurate based on past voting patterns and similar assumptions. Here’s an Opinium poll from October:

“Conservatives maintain their poll lead at 16% as they hit 42% in the polls.” 

Note the use of the word “polls”, rather than the singular poll which Opinium had carried out. The headline also gives no margin of error. Margin of error is a statistical measure, valid for random samples, of the range within which the true figure is likely to lie. Typically, for random samples of over 1,000, the margin of error is plus or minus 3%. In other words, the 42% should be reported as somewhere between 39% and 45%. It should also be made clear that this was a survey of Opinium panel members, and since that excludes the majority of voters it is far from representative. At best it is a rough guide.
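
The standard calculation, for anyone who wants to check it, uses the formula 1.96 × √(p(1−p)/n) for a 95% confidence level. A minimal sketch:

```python
from math import sqrt

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """95% margin of error for a proportion from a simple random sample."""
    return z * sqrt(p * (1 - p) / n)

# The headline 42% on a random sample of 1,000:
p, n = 0.42, 1_000
moe = margin_of_error(p, n)
print(f"42% +/- {moe:.1%} -> between {p - moe:.0%} and {p + moe:.0%}")
# -> 42% +/- 3.1% -> between 39% and 45%
```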

But that is not the only problem. If you look at the data on which this result is based, you will find that 512 people said they would vote Tory. That is 32%, not 42%, of the sample. The figure has been massaged upwards because the sample was clearly not representative. To be fair, Labour’s figure was massaged upwards too, by 4 points, to give 26%. The Tories still had a 10-point lead on the raw figures, but not the 16 points being reported.
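
Working backwards from the figures in the text (the implied sample size is my inference from those figures, not Opinium’s published number):

```python
tory_raw_count = 512
tory_raw_share = 0.32
sample_size = round(tory_raw_count / tory_raw_share)
print(f"Implied sample size: {sample_size}")   # 1,600 respondents

tory_weighted, labour_weighted = 0.42, 0.26
labour_raw = labour_weighted - 0.04            # Labour was weighted up by 4 points

print(f"Raw lead:      {(tory_raw_share - labour_raw) * 100:.0f} points")      # 10
print(f"Reported lead: {(tory_weighted - labour_weighted) * 100:.0f} points")  # 16
print(f"Implied Tory up-weighting: x{tory_weighted / tory_raw_share:.2f}")     # x1.31
```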

Polling companies are notoriously unhappy about revealing exactly how 32% becomes 42%, hiding behind the concept of weighting. In 1962 the historian of science Thomas S. Kuhn wrote a brilliant book called The Structure of Scientific Revolutions, about how scientists work within what he termed ‘paradigms’. It debunks the idea that science consists of brilliant minds having “eureka” moments and shows that the conduct of science is rather more mundane and incremental.

The point is that opinion pollsters, like scientists, work with a set of taken-for-granted assumptions that dictate the presentation of their work. Imagine you were a researcher at a large polling company working on a general election. Your data shows a large lead for one party. What would you do to be sure that the lead really existed?

First, you would compare it with your company’s previous polls – is it consistent with them? Second, you would look at recently published polls by rival companies – is it consistent with those? Third, you would use other data, particularly the previous general election, to see where the relative positions of the parties should be. If your data is consistent with all those checks, you would feel happy presenting it. But what if yours is the only data giving this lead? How happy would you, and perhaps more importantly your bosses, be to present a “rogue” poll?

The polling paradigm amounts to a herd mentality: if we are all saying more or less the same thing, based on very similar methods, then we must be right. If the data is consistent, then it must be right; if it isn’t, then more weighting is required to bring it back within the paradigm. In this way 32% can be confidently reported as 42%.

I think what this shows, and you could do the same calculation on almost any published opinion poll, is that the polling companies are not simply reflecting public opinion; they are actually creating it. Their self-selected samples have a built-in bias toward certain sections of the population, who are clearly over-represented, hence the statistical manipulation. But perhaps more importantly, the samples used are representative only of those who sign up to be panel members, rather than of the voting public per se.

There is another important issue to be considered in the reporting of general election polling. A general election is not a single event; it is 650 discrete events. Each constituency is affected by a range of national and local circumstances. Opinion polling tends to reflect only what people broadly think about the national issues. It does not tell us what voters in Liverpool Walton (Labour’s safest seat, with 80% of the vote) think. But I can confidently tell you that if 42% of Liverpool Walton’s voters are Tories, they are very well hidden and they do not vote at elections!

Any attempt to extrapolate a national opinion poll to the number of seats each party will get is pure nonsense. Even with a sample of 2,000, each constituency would be represented by an average of just three respondents. Surveys of fewer than 1,000 people in individual constituencies are close to worthless even if random sampling takes place. This is worth pointing out because the Lib Dems have recently been reporting constituency-level surveys of 400 or so voters which, surprise surprise, show them in the lead.
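
Both numbers are easy to verify; the 35% support figure in the second part is an invented example, used only to show how wide the uncertainty band is on a 400-person sample:

```python
from math import sqrt

def margin_of_error(p, n, z=1.96):
    return z * sqrt(p * (1 - p) / n)

print(f"Respondents per constituency: {2_000 / 650:.1f}")  # about 3

# Even a genuinely random 400-person constituency poll is very noisy:
p, n = 0.35, 400
moe = margin_of_error(p, n)
print(f"35% +/- {moe:.1%} -> between {p - moe:.0%} and {p + moe:.0%}")  # 30% to 40%
```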

Given all this, you might wonder why anybody takes any notice of opinion polls at all. Unfortunately, the results of opinion polls are massaged enough that they are never too far from reality, which lends them a kind of plausibility, particularly given the inherent anti-Labour bias in most sections of the media. That plausibility has less to do with their accuracy than with the polling companies using previous elections as their guide. The press tend to report polls as fact, particularly when, as often happens, they accord with their own editorial position.

So, should we just ignore polls altogether? I would say yes, but unfortunately that would not get rid of them. My own suggestion is to challenge polling companies and the media to explain how they have sampled and to give a proper breakdown of the weighting techniques they use.

When I see polls I tend to apply a calculation of my own to arrive at what I consider a fairer representation of the data. My own guess, and I stress that it is only a guess, is that on average Labour support is under-counted in polls by about 4 points and Tory support is over-counted by around 6 points. That is a result of the sampling bias I have discussed.

So, if they are reporting 42-26, the actual figures are likely to be closer to 36-30. To this I would add a margin of error of 3 points, meaning that the real figure could be as close as 33-33. If that’s right, there is still everything to play for; it is simply too tight to call.
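
My rule of thumb, expressed as code (the 6-point and 4-point corrections are my own guesses from above, not established adjustments):

```python
def adjust(tory_pct: float, labour_pct: float,
           tory_overcount: float = 6.0, labour_undercount: float = 4.0):
    """Apply my guessed corrections to a reported headline poll."""
    return tory_pct - tory_overcount, labour_pct + labour_undercount

tory, labour = adjust(42, 26)
print(f"Adjusted: Tories {tory:.0f}%, Labour {labour:.0f}%")                   # 36% vs 30%
print(f"With a further 3-point margin: {tory - 3:.0f}% vs {labour + 3:.0f}%")  # 33% vs 33%
```

As ComRes put it in their recent poll report: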

“…beware of pundits trying to forecast the unforecastable. Only someone who’s been on the eggnog would bet the mortgage on the outcome of this one.”
