Primary election season is here, and news reports are filled with sound bites from candidates, their supporters, and pundits, all trying to get the edge by being the first with breaking news. It’s also polling season, as every news organization seems to have its own poll, each designed to project the winners. This provides a great opportunity to talk about some statistics concepts that often get buried in the high school curriculum: sampling, surveys, margin of error, and confidence intervals.
One nice resource I have used in my classes before is the site pollingreport.com. This site collects polls from many sources: news agencies, university organizations and polling companies. Students can search from a long menu of topics and examine the careful wording of survey questions, time-progression data and information on sample size and margin of error.
Having students select their own survey and interpret the results can lead to interesting class discussions. One problem with polls is that the results are often taken as absolute, rather than as estimates of a population value. An interval plot can help remedy this, and get students thinking about that pesky margin of error, which is often buried, italicized, or shown in a smaller font than the rest of a poll’s results. Here’s an example of an interval plot, using the results of a poll from pollingreport.com:
Quinnipiac University Poll. Feb. 14-20, 2012. N=1,124 Republican and Republican-leaning registered voters nationwide. Margin of error ± 2.9.
“If the Republican primary for president were being held today, and the candidates were Newt Gingrich, Mitt Romney, Rick Santorum, and Ron Paul, for whom would you vote?”
Some questions for discussion can then include:
- How can these results be used?
- What do you think would happen if we asked more people? Or if the election were held today?
- What would it mean if intervals overlapped each other?
- How likely is it that nationwide support for Rick Santorum is within the interval?
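For classes that want to explore the overlap question with a little code, here is a minimal Python sketch. It uses the ± 2.9 margin of error from the poll above, but the candidate names and percentages are hypothetical placeholders, not the actual poll results, and the helper names `interval` and `overlaps` are made up for this illustration.

```python
# Margin of error reported by the poll above, in percentage points.
MARGIN = 2.9

# Hypothetical shares for illustration only; NOT the actual poll results.
shares = {"Candidate A": 35.0, "Candidate B": 30.0, "Candidate C": 12.0}


def interval(share, margin=MARGIN):
    """Return the (low, high) interval around a reported percentage."""
    return (share - margin, share + margin)


def overlaps(a, b):
    """Return True if two (low, high) intervals share at least one value."""
    return a[0] <= b[1] and b[0] <= a[1]


intervals = {name: interval(share) for name, share in shares.items()}

for name, (low, high) in intervals.items():
    print(f"{name}: {low:.1f}% to {high:.1f}%")

# A and B are 5 points apart, less than twice the margin, so they overlap.
print(overlaps(intervals["Candidate A"], intervals["Candidate B"]))
# A and C are far apart, so their intervals do not overlap.
print(overlaps(intervals["Candidate A"], intervals["Candidate C"]))
```

Students can swap in the numbers from whichever poll they selected and decide which leads are clear and which are too close to call.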
While confidence intervals don’t need to be defined formally, the concept of these intervals indicating plausible values for the population parameter can be discussed. The New York Times, in particular, does an excellent job of providing an accessible explanation for margin of error, such as this excerpt from a telephone poll summary:
In theory, in 19 cases out of 20, overall results based on such samples will differ by no more than three percentage points in either direction from what would have been obtained by seeking to interview all American adults. For smaller subgroups, the margin of sampling error is larger. Shifts in results between polls over time also have a larger sampling error.
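The "19 cases out of 20" idea can be checked with a simulation. The sketch below is only an illustration under stated assumptions: the true level of support (40 percent), the sample size (1,000), and the number of repeated polls (2,000) are all invented for the example, and each respondent is treated as an independent random draw, which is simpler than how real polls are conducted.

```python
import random


def simulate_coverage(true_percent=40, n=1000, margin_points=3,
                      trials=2000, seed=1):
    """Fraction of simulated polls that land within the margin of the truth.

    All defaults are assumptions chosen for illustration.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        # Each respondent supports the candidate with probability true_percent/100.
        yes = sum(rng.random() < true_percent / 100 for _ in range(n))
        # Compare in whole numbers to avoid floating-point edge cases:
        # |yes/n - true_percent/100| <= margin_points/100
        if abs(100 * yes - true_percent * n) <= margin_points * n:
            hits += 1
    return hits / trials


coverage = simulate_coverage()
print(f"Polls within 3 points of the truth: {coverage:.1%}")
```

The printed share should come out close to 95 percent, which is the "19 out of 20" in the excerpt. Changing `n` shows students how the same three-point margin fares with smaller or larger samples.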
Next, we can take a look at formulas for margin of error. One convenient formula found in some textbooks links margin of error directly to the sample size:
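The formula itself is not reproduced in this text. A rule of thumb that appears in many introductory textbooks, and which is assumed here, is that the margin of error is roughly 1/√n, where n is the sample size. It may not be the exact formula originally shown. The Python sketch below applies that rule to the Quinnipiac poll above (N = 1,124) and, for comparison, also computes the standard 95% worst-case formula 1.96 × √(0.25/n), which is swapped in here for illustration and is not taken from the poll's documentation. The function names are hypothetical.

```python
import math


def rule_of_thumb_margin(n):
    """Assumed textbook shortcut: margin of error as a proportion, 1 / sqrt(n)."""
    return 1 / math.sqrt(n)


def worst_case_95_margin(n, z=1.96):
    """Standard 95% margin, z * sqrt(p(1-p)/n), using the worst case p = 0.5."""
    return z * math.sqrt(0.25 / n)


n = 1124  # sample size of the Quinnipiac poll quoted earlier

print(round(100 * rule_of_thumb_margin(n), 1))   # 3.0 percentage points
print(round(100 * worst_case_95_margin(n), 1))   # 2.9 percentage points
```

The shortcut lands within a tenth of a point of the ± 2.9 the poll reports, which is why it works well as a quick classroom check.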
By going back to pollingreport.com and pulling a sample of polls with different sample sizes, we can examine the accuracy of this short and snappy formula. The scatterplot below uses sample size as the independent variable, and reported margin of error as the dependent variable.
The formula seems to be a nice guide, though some polls clearly use more sophisticated formulas which generate more conservative margins of error.
Classes that wish to explore polling further can check out the New York Times polling blog, FiveThirtyEight, which provides more detailed analyses of polls and their historical accuracy.