This activity was inspired by the article “Innocent Until Proven Guilty”, by Catherine Case and Doug Whitaker (NCTM Mathematics Teacher, Volume 109, Issue 9, May 2016).
Around February each year, the AP Statistics message boards come alive with new and veteran AP Statistics teachers seeking ideas to help students understand the concept of statistical power. While power is a “minor league” topic in the AP Stats curriculum, a robust discussion of the concept can help tie together the logic of statistical inference: P-values, error and sampling variability. I’ve developed a few activities to try to bring power to life (see here and here). And while each was satisfying in its own way, none of them really met one of my overarching classroom goals: to have students identify and express a new idea with their groups before I provide clarification. This year’s activity worked nicely, as it allowed students to experience statistical power and generated meaningful conversation. Download the student version below, then read on to learn how it works.
In this activity students will investigate the “fairness” of 3 virtual coins through a Desmos graph, using 3 different sample sizes to compile evidence. For each sample, students use their graphing calculator to compute a P-value and then reach a statistical conclusion. For coin A, I led students through the steps for n=10 and encouraged them to work through the next two sample sizes using their group-mates as a support system.
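For readers who would like to check the calculator work in code, here is a minimal Python sketch of the P-value step. It assumes an exact two-sided binomial test of the null hypothesis that the coin is fair (p = 0.5), which may not be the same routine students ran on their calculators. The function names are made up for illustration.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k heads in n flips when P(heads) = p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def two_sided_p_value(heads, n, p0=0.5):
    """Exact two-sided P-value for H0: P(heads) = p0.

    Adds up the null probabilities of every outcome that is no more
    likely than the observed count (assumed definition of "as extreme").
    """
    observed = binomial_pmf(heads, n, p0)
    tolerance = 1 + 1e-12  # guards against floating-point ties
    return sum(
        binomial_pmf(k, n, p0)
        for k in range(n + 1)
        if binomial_pmf(k, n, p0) <= observed * tolerance
    )


# Example: 2 heads in 10 flips of a coin we suspect is unfair.
print(round(two_sided_p_value(2, 10), 4))
```

With only 10 flips, even a lopsided-looking result such as 2 heads produces a P-value above 0.05, which is the kind of thing students notice in the first column.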

As students completed all three columns for coin A, I asked them to make a final decision regarding the fairness of coin A: is there convincing evidence that coin A is unfair? Students discussed their findings with their groups, along with their thoughts about how convincing the evidence in each column was. Here is what the class-wide vote and conversation revealed:
- Of my 42 total students (2 classes), only 1 student concluded that coin A was unfair.
- All groups agreed that the larger sample size (n=100) was more useful in reaching a decision about the coin.
Spoiler alert: coin A is unfair! If you take a peek under the Desmos hood, you will find that coin A is “programmed” as 48% heads, 52% tails. I didn’t reveal the true proportion until the end, but we are off to a good start here: small differences between the null and “truth” are less likely to be detected.
Groups then tackled coin B with little assistance from me. Working through each column, then the follow-up conversation and decision, took about 5 minutes. This time about 60% of the students concluded that coin B was unfair.
Finally, coin C. Many students quickly concluded that coin C was unfair (it is!) but worked through each of the columns and sample sizes. In the end, there was class-wide agreement that coin C is an unfair coin.
At this point I revealed the truth about each coin:
- Coin A: 45% heads
- Coin B: 40% heads
- Coin C: 25% heads
So, what do our findings show us about hypothesis testing and decision-making as a whole? I was thrilled when one of my students who does not volunteer often raised his hand to offer the following: “If there is a big difference between the null and the truth, it’s easier to reject the null.”
Yes! That’s a big part of power. What else?
Larger sample sizes are more likely to detect a difference when one exists.
Yes! And now we have a nice framework for power. From here I shared a working definition of power and included thoughts on alpha, which are not part of this activity now but could be in a later version.
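Both student observations can be checked numerically. The sketch below is one way to do it in Python, not part of the classroom activity. It assumes an exact two-sided binomial test at alpha = 0.05, takes the heads proportions from the reveal list above, and uses the n = 10 and n = 100 sample sizes. The function names are hypothetical.

```python
from math import comb


def pmf(k, n, p):
    """Probability of exactly k heads in n flips when P(heads) = p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def p_value(heads, n, p0=0.5):
    """Exact two-sided P-value for H0: P(heads) = p0."""
    observed = pmf(heads, n, p0)
    return sum(
        pmf(k, n, p0)
        for k in range(n + 1)
        if pmf(k, n, p0) <= observed * (1 + 1e-12)
    )


def power(true_p, n, alpha=0.05, p0=0.5):
    """Chance of rejecting H0 when the coin really lands heads with true_p.

    alpha = 0.05 is an assumed significance level.
    """
    # Head counts that would lead to rejecting H0 at this alpha.
    rejection_region = [k for k in range(n + 1) if p_value(k, n, p0) <= alpha]
    # Probability of landing in that region under the true proportion.
    return sum(pmf(k, n, true_p) for k in rejection_region)


# Heads proportions from the reveal list; sample sizes from the activity.
for true_p in (0.45, 0.40, 0.25):
    for n in (10, 100):
        print(f"true p = {true_p:.2f}, n = {n:3d}: power = {power(true_p, n):.3f}")
```

Reading down the output, power rises as the true proportion moves farther from 0.5, and for each coin it is higher at n = 100 than at n = 10, which matches what the students put into words.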
EmPower your students to develop statistical ideas!