Wednesday, January 09, 2008

News analysis of wrong New Hampshire polls terribly flawed


The cartoon is a play on words: there is a Pew poll, just like the Gallup Poll, the Harris Poll, and so forth. All the major news networks are reporting that the pollsters were wrong about the New Hampshire primary. The networks are wrong. First of all, polls taken on the day of the election were accurate. A poll taken on the day of the primary called it accurately. Here's another poll that was accurate.

Secondly, polls are based on a sample of individuals, chosen randomly, and the intent is to estimate (not predict, but estimate) the opinions of the entire polling population. We call this estimating the parameters of the population: in effect, the average opinion across all possible samples of the same size, which in the case of the first poll above is 353 people. In other words, if you randomly sampled 353 people, in as many random combinations of 353 as can exist, what would the average support for Hillary be? The estimate takes the form of a range of support, reflecting the error involved in sampling only 353 people instead of interviewing every registered voter. In the first survey above, the range of support for Hillary was 32 - 38% of the voters interviewed. The midpoint of that range is 35%, which is the number the press reported.

There is also a 5% chance that a range built this way misses the true level of support, purely because of the luck of the draw. That 5% is called the alpha, or significance level; its complement, 95%, is the confidence level. Another way of saying this is that the poll is 95% reliable, with only a 5% chance that randomness alone has pushed the result outside the range. One can calculate the interval for a chosen confidence level with a mathematical formula (I can't type it here); the formula involves the standard deviation of the sample data (which is an estimate of the population standard deviation) and the "n", the number of people interviewed in the sample.
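For readers who want to see the arithmetic, here is a minimal sketch of the usual textbook formula for a sampled proportion. It assumes a simple random sample and the normal approximation, with z = 1.96 for 95% confidence. The function name and those assumptions are mine, not the pollster's. With these assumptions the margin for 353 people works out to roughly five points either side of 35%, somewhat wider than the 32 - 38% range quoted above, so the pollster's own calculation presumably differed. Treat this as an illustration of the formula rather than a reconstruction of that poll.

```python
import math


def proportion_interval(p_hat, n, z=1.96):
    """Approximate confidence interval for a sampled proportion.

    p_hat: share of the sample supporting the candidate (0.35 for 35%)
    n:     number of people interviewed
    z:     critical value; 1.96 corresponds to 95% confidence
           (assumes a simple random sample and the normal approximation)
    """
    # Standard deviation of a yes/no answer is sqrt(p * (1 - p));
    # dividing by sqrt(n) gives the standard error of the sample share.
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    margin = z * standard_error
    return p_hat - margin, p_hat + margin, margin


# Figures from the first poll discussed above: 35% support, 353 people.
low, high, margin = proportion_interval(0.35, 353)
print(f"{low:.1%} to {high:.1%} (margin of {margin:.1%})")
```

Because n sits under a square root, halving the margin takes about four times as many interviews.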

In close elections, the intervals for two candidates may actually overlap. In that case, the election is a statistical dead heat. In the 2000 and 2004 presidential elections, the Monday before the election, George Gallup Jr. said the race was too close to call - because the candidates' intervals overlapped (Kerry and Bush in 2004, for example). Given the error within the sample, the interval of estimation, the race was too close to call.
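The overlap test described above is simple enough to sketch in a few lines. The function name and the numbers below are hypothetical, chosen only to illustrate the idea. They are not taken from any actual Gallup poll.

```python
def intervals_overlap(a, b):
    """Return True if two (low, high) ranges share any common ground."""
    return a[0] <= b[1] and b[0] <= a[1]


# Hypothetical ranges of support, in percentage points.
candidate_a = (46.0, 52.0)
candidate_b = (45.0, 51.0)

if intervals_overlap(candidate_a, candidate_b):
    print("Statistical dead heat: too close to call")
else:
    print("The ranges do not overlap: one candidate leads")
```

This is the rule of thumb as stated in the paragraph above; a pollster may apply a stricter test to the gap between the two candidates.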

So, based on the election-day poll of N=353 people, an estimated 35% (or more accurately, 32-38%) of the people would vote for Hillary. The actual vote for Hillary was 39%. Pretty darn close. Certainly no reason to say that the polling was "flawed."

Polls also depend upon time: if public opinion is changing, polls have to be timely. Even a poll taken the morning of the election is good only for that point in time: public opinion is a movie, a poll is a picture taken at one moment. Polls taken the Friday before a Tuesday election are accurate only for Friday. They don't predict the future - polling is not a time machine, and a Friday poll is not considered mathematically or statistically accurate as a prediction of the future.

The stories are about how the polls were wrong. One story is that Obama's candidacy may lead people to tell a pollster "yes, I am going to vote for him" because they don't want to admit that they wouldn't, which might make them appear racist or prejudiced. Maybe, but there is no hard evidence of that. It would make an interesting research study - maybe a simulated poll and election in a controlled experiment.

The news reporters are looking for a story, and how Hillary showed the pollsters they were wrong is a good story, as is how people who aren't going to vote for Obama say they are to avoid looking racist. None of that is apparent from the polls I have read.

In 2000, in the Florida presidential race, many people said the exit polls were wrong. No, I think they were right: voters left the polls thinking they had voted for Gore, but because of the butterfly ballot, some had actually voted for the Reform Party candidate, Pat Buchanan. Gore was listed second on the left-hand side of the ballot, but the voter had to punch the third hole, not the second. Confusing - the Democrats should have stopped that ballot before the election - they have that right, through the courts. That's why ballots are always published in advance of an election.

Polling can be very accurate, if done scientifically. The news reporters don't seem to understand polling principles or techniques.
