Sarah has received some grief over her post on polls. It comes from some folks who think they know more than she does, and are, therefore, eager to berate her for her “stupidity.”
Unfortunately, these folks all want her to learn something about statistics, yet they obviously don’t understand the first rule of statistics: statistics are not fact. Sarah also provides a link to a Den Beste article which explains things quite well. I doubt any of them read it; it might contradict their opinion, which they take as fact.
When I got my master’s degree, I had to take a class called Research Methods. It was absolutely the most worthless class I have ever taken. It revolved around writing survey questionnaires and determining a method of sampling a population. I am an engineer, and to me statistics is a mathematical tool used to make sense of collected data. When we run a test, we take as much data as possible. Usually, this means sampling various sensors and transducers at anywhere from two to one hundred times a second. That rate is determined by the type of data. For ambient temperature, which might change a tenth of a degree every ten minutes, twice a second is ample. For more dynamic variables, we sample at the highest rate the equipment can handle.
What that means is that we may have hundreds of thousands of numbers, and these must be analyzed using some kind of statistical algorithm. When a piece of data is an outlier, or doesn’t fit nicely with the rest, we determine whether that value is an error or a realistic measure of what actually happened.
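The post doesn’t say which algorithm is used to flag outliers. As one hypothetical illustration, the Python sketch below uses a common robust rule, the modified z-score based on the median absolute deviation (MAD), with the frequently quoted cutoff of 3.5. The function name `flag_outliers`, the threshold, and the sample readings are assumptions for illustration only. The rule only flags candidates; deciding whether a flagged reading is a sensor error or a real event is still the engineering judgment described above.

```python
import statistics


def flag_outliers(values, threshold=3.5):
    """Return the indices of readings whose modified z-score exceeds `threshold`.

    Hypothetical helper: uses the median and the median absolute deviation (MAD),
    which a few wild readings cannot drag around the way they drag a mean.
    """
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return []  # no spread at all, so this rule cannot single anything out
    # 0.6745 rescales MAD so the score is comparable to an ordinary z-score for normal data
    return [i for i, v in enumerate(values) if abs(0.6745 * (v - med) / mad) > threshold]


# Hypothetical ambient-temperature readings with one suspicious spike
readings = [21.0, 21.1, 21.0, 20.9, 21.1, 35.0, 21.0]
print(flag_outliers(readings))  # [5]
```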
Polls use a different form of data collection and analysis. Pollsters TRY (if they are good, and really want a representative sample) to collect data from a wide variety of the population they are trying to measure. It is impossible to get a response from every person in the target population, so they use a variety of techniques to get a random sample. Getting that random sample is where polling primarily fails.
Most people are unwilling to give up the time to respond, and those who do may not answer honestly. There is no way for the pollster to know. This is what separates scientific data analysis from polling. Pollsters make no effort to identify the outliers. Realistically, they can’t: they simply don’t have enough data, and what they have may be completely invalid. Add to this the difficulty of writing a valid question, and the time needed to ask enough different questions to cross-check the answers in a way the respondent can’t “game,” and you have an unwieldy poll. There is no way to tell if all the data are wrong.
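One way to see why nonresponse is so damaging is a small model in which the chance that a person agrees to answer depends on the answer they would give. The Python sketch below is hypothetical: the 50/50 split, the 30% and 15% response rates, and the function names are assumptions chosen only for illustration, and it does not model dishonest answers at all. The pollster sees only the respondents, and nothing in those responses reveals that the sample is skewed.

```python
import random


def respondent_share(true_share, rate_yes, rate_no):
    """Expected fraction of 'yes' answers among the people who respond, when
    'yes' people respond at rate_yes and 'no' people at rate_no (hypothetical model)."""
    yes = true_share * rate_yes
    no = (1 - true_share) * rate_no
    return yes / (yes + no)


def simulate_poll(true_share, rate_yes, rate_no, contacted, seed=0):
    """Contact `contacted` random people; return (number who responded, share saying yes)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    answers = []
    for _ in range(contacted):
        says_yes = rng.random() < true_share
        willing = rng.random() < (rate_yes if says_yes else rate_no)
        if willing:
            answers.append(says_yes)
    if not answers:
        raise ValueError("nobody responded")
    return len(answers), sum(answers) / len(answers)


# Hypothetical: the population is split 50/50, but "yes" people are twice as willing to answer.
print(respondent_share(0.5, 0.30, 0.15))  # about 0.667, not 0.5
print(simulate_poll(0.5, 0.30, 0.15, contacted=10_000))
```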
Knowing this, pollsters use a factor to account for as much error as they believe they can. This is the percentage you hear described as the Margin of Error. It, too, is a statistical quantity, based on the number of people willing to respond to the questions, not the number contacted.
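The textbook formula for the margin of error of a proportion makes the point about respondents concrete: it depends on the number of people who actually answered, not the number contacted. Below is a minimal Python sketch, assuming the usual normal approximation at 95% confidence (z = 1.96) and the worst case p = 0.5; the 10,000 contacted and 1,000 respondents are hypothetical. Note that this figure covers only random sampling error. It says nothing about nonresponse bias, leading questions, or dishonest answers.

```python
import math


def margin_of_error(respondents, share=0.5, z=1.96):
    """Half-width of the normal-approximation confidence interval for a proportion.

    z=1.96 gives 95% confidence; share=0.5 is the worst case (largest margin).
    Only random sampling error is covered, not nonresponse or wording bias.
    """
    return z * math.sqrt(share * (1 - share) / respondents)


contacted, respondents = 10_000, 1_000  # hypothetical: one person in ten agreed to answer
print(f"Reported margin (uses respondents): {margin_of_error(respondents):.1%}")  # 3.1%
print(f"If everyone contacted had answered: {margin_of_error(contacted):.1%}")   # 1.0%
```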
A well-written poll which samples a precise population can be pretty accurate. Unfortunately, most polls are done quickly to get a sense of how people “feel” about a given topic. This quick approach precludes going in depth, and the tone of the spoken question, as well as the voice of the person asking, can cause two people who agree on something to give completely opposite answers. Polls written by someone with an agenda can produce results in the desired direction without much effort. Question sequence, which may be used to lead a respondent down the path of thought a pollster desires, or options which don’t provide for all possibilities (Have you quit beating your wife?), can skew the percentages in any desired direction. Realistically, any poll not looking for a simple Yes-No response on a simple issue (Who will you vote for?) has major accuracy problems unless it requires dozens of answers for analysis.
“Should we be in Iraq?” sounds like a simple question, but must be understood in its context within the poll. When in the list of questions was it asked? What questions were asked before? What answer did the pollster anticipate when placing the question in that sequence?
When you can answer those questions, you might have a vague idea of what the raw numbers in a poll really mean. You won’t get that information for most polls, and you need to ask why before accepting poll numbers as valid.
*****UPDATE*****
Sarah shut down comments for the post after taking abuse from know-nothings who claim to understand statistics. I was going to add this there, but will now simply post it here.
When using statistical analysis of data, it is absolutely fundamental to understand that interpolation and extrapolation are two very different sides of the coin.
Interpolation draws conclusions about a point of interest that falls within the bounds of the collected data. It carries inherent error, described by the statistical standard deviation, but all in all it can be used with a degree of certainty that depends on the amount of data collected.
Extrapolation, on the other hand, makes assumptions which may be invalid, because the point of interest is outside the bounds of the data collected. It assumes the data behave nicely, so that a trend can be extended. The amount of data collected must be considered, but a solid knowledge of the expected outcome must also play a role if you are to extrapolate from knowns to unknowns. Extrapolation can be used effectively when Newton’s Laws are involved, but it is not as reliable when discussing social questions.
Polling is extrapolation. Its conclusions cannot, with any reasonable certainty, be used to draw further conclusions about variables outside the sampled population without a solid understanding of that population. It is inherently inaccurate. If the pollster understands the entire population the sample is expected to represent, he may be able to draw some conclusions that are accurate to a degree he can live with.
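A toy example shows the difference. In the Python sketch below, which is purely hypothetical, the true relationship is y = x², but the analyst has only sampled x from 0 to 4 and fits a straight line by ordinary least squares. The function names and data are assumptions for illustration. Inside the sampled range the line is off by a modest amount; far outside it, the same line misses badly, because the assumption that the data “behave nicely” beyond the bounds is wrong.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(xs, slope, intercept, x):
    """Return the fitted value and whether x lies inside the sampled range."""
    inside = min(xs) <= x <= max(xs)
    return slope * x + intercept, ("interpolation" if inside else "extrapolation")


# Hypothetical data: the true relationship (unknown to the analyst) is y = x**2,
# but only x = 0..4 was sampled.
xs = [0, 1, 2, 3, 4]
ys = [x ** 2 for x in xs]
slope, intercept = fit_line(xs, ys)

for x in (2.5, 10):
    y_hat, kind = predict(xs, slope, intercept, x)
    print(f"x={x}: {kind}, predicted {y_hat:.2f}, true {x ** 2:.2f}")
# x=2.5: interpolation, predicted 8.00, true 6.25
# x=10: extrapolation, predicted 38.00, true 100.00
```

The fitted line is the same in both cases; only the location of the point of interest relative to the data changes.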
For those who doubt it, check how much difference there is between polls purporting to ask the same population identical questions.
Well said.
Comment by Sarah — June 13, 2004 @ 2:51 pm
I made it half way through the comments before I wanted to puke. Gotta love those “compassionate” liberals. They sure know how to use the four letter words and follow the leader to toss some insults. Truly sickening.
I’m so sorry this happened to her. Her blog is the first I read every morning, then yours and then on to CPT Patti. It just really stinks that they’ve decided to spend their Sunday doing everything they can to hurt someone’s feelings. Just confirms my belief that the left is the party of hate – and it sure isn’t hard to find examples of this everywhere these days, now even on one of my favorite blogs. Their rhetoric truly sickened me. I’m really sorry this has happened.
I hope she’ll keep on telling the truth as she always has.
Comment by Shannon — June 13, 2004 @ 10:29 pm
My claim to fame in college statistics at Trinity U. is that 1] somehow I eked out a “B” in a hard class and 2] I sat next to and studied with Alice Walton, Sam’s daughter.
Comment by Wallace-Midland, Texas — June 13, 2004 @ 10:43 pm
We called it “Sadistics” when I was in college.
Comment by Bunker — June 14, 2004 @ 5:43 am
I think that most of the comments missed the whole point of her original post. IMO, she wasn’t so much discussing the statistics of the poll as how it was reported in the paper. Her observations revolved around what WASN’T said in the report, indicating that the report was trying to create an impression of what the poll showed that might not have been there.
But everyone has been flaming about the statistics, not the use of the English language.
Comment by NightHawk — June 14, 2004 @ 7:30 am
And the issue has nothing to do with statistics. Polling is not statistics.
But tell that to a college kid who knows all there is to know.
Comment by Bunker — June 14, 2004 @ 8:10 am
Shannon — you’re more likely to see this comment over here than in the mess on my site. Thanks for the support…and for clearing up the “registering to vote” nonsense. I had so much on my plate that I didn’t even bother to address that one.
Comment by Sarah — June 14, 2004 @ 10:23 am