Class Opener – Day 62 – When Good Questions Go Bad

Today is our last day in the experimental design unit in AP Statistics, and students started class today with an actual problem (or at least part of one) from the 2013 AP exam. This is a question I read during the 2013 reading, meaning I saw about a thousand different responses to this question. It’s quite an experience to be able to share the good, bad and ugly of responses I saw.

For this question, which was number 5 of 6, I only gave out part (a) of the question as this is the portion of interest to us in our experimental design unit. Here’s the question, with some symbols changed for online convenience…

  • Psychologists interested in the relationship between meditation and health conducted a study with a random sample of 28 men who live in a large retirement community. Of the men in the sample, 11 reported that they participate in daily meditation and 17 reported that they do not participate in daily meditation.
    The researchers wanted to perform a hypothesis test [compare the proportion of men with high blood pressure among all the men in the retirement community who participate in daily meditation against the proportion of men with high blood pressure among all the men in the retirement community who do not participate in daily meditation.]
    (a) If the study were to provide significant evidence against the null hypothesis in favor of the alternative, would it be reasonable for the psychologists to conclude that daily meditation causes a reduction in blood pressure for men in the retirement community? Explain why or why not.

In addition to providing a response, I asked students to circle the most important words in the question – which words or phrases matter most when considering part (a) of this question?

There was a hidden agenda behind having students circle some words and phrases. The average score nationally on this question in 2013 was 0.57 points (out of 4). Most questions usually have an average around 1.2-1.6…with some creeping below 1 occasionally, and some venturing above 2. In my memory, this question was BY FAR the lowest-scoring question in recent AP Stats history. And while part (a) was the best opportunity to score points, many students still missed its intent.

After students completed the question, I asked each group to provide me a “top 3” list, and we compiled responses on the board. Here are some words which made our list:


It’s not a bad list. And, looking back, my instructions weren’t totally helpful, as there is one (and ONLY one) word which is important here – CAUSES!

In Statistics, there are big ideas, and then there are BIG IDEAS:

A well-designed experiment can allow us to infer cause-effect relationships. Observational studies cannot.
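For anyone who wants to see that big idea in action, here is a minimal simulation sketch. Everything in it is made up for illustration and none of it comes from the AP question: the lurking variable (`active`), the probabilities, and the function name `simulate` are all assumptions. In this pretend world meditation has NO effect on blood pressure at all, yet the self-selected (observational) groups still show a gap, while randomly assigned groups do not.

```python
import random

def simulate(n=40000, seed=62):
    """Return (observational gap, experimental gap) in the proportion with
    high blood pressure: meditators minus non-meditators.
    All numbers are hypothetical and chosen only for illustration."""
    rng = random.Random(seed)
    # Each entry maps meditates? -> [count with high BP, group size]
    obs = {True: [0, 0], False: [0, 0]}
    exp = {True: [0, 0], False: [0, 0]}
    for _ in range(n):
        # Hypothetical lurking variable: is this man physically active?
        active = rng.random() < 0.5
        # Blood pressure depends ONLY on activity, never on meditation.
        high_bp = rng.random() < (0.3 if active else 0.6)
        # Observational study: men choose for themselves, and active men
        # are (by assumption) more likely to choose meditation.
        chooses = rng.random() < (0.7 if active else 0.2)
        # Experiment: a coin flip decides who meditates.
        assigned = rng.random() < 0.5
        for table, group in ((obs, chooses), (exp, assigned)):
            table[group][0] += high_bp
            table[group][1] += 1

    def gap(table):
        return (table[True][0] / table[True][1]
                - table[False][0] / table[False][1])

    return gap(obs), gap(exp)

obs_gap, exp_gap = simulate()
print(f"Observational gap: {obs_gap:+.3f}")
print(f"Experimental gap:  {exp_gap:+.3f}")
```

Under these assumptions the observational gap comes out clearly negative (meditators look healthier) even though meditation does nothing, while the gap under random assignment hovers near zero. Random assignment is what balances the lurking variable across the two groups.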

In this problem, students who tended to write more probably dug themselves deeper into a hole. It’s not easy to tell kids to “write less” as details often matter, but in this question saying “this was an observational study, and not an experiment” was all that was really needed.  In reading this question in Kansas City, I found many students who appealed to the small sample size or to perceived confounding variables, and many who simply seemed to gloss over the word “cause” as important. Note: you can find more details about student errors in the Chief Reader’s summary.

Sometimes the simple questions which assess big ideas become the toughest, especially when there are lots of scary-sounding words surrounding the concept. Having students identify the meaningful words can help facilitate these discussions.


