NFL Replays and the Chi-Squared Distribution

OK, I’ll admit the blog has been sports-heavy lately.  Now that the Super Bowl is over, hopefully I can diversify some.  But for now, one last football example…

This week, the sports blog Deadspin featured an article titled: “Does The Success Of An NFL Replay Challenge Depend On Which TV Network Is Broadcasting The Game?”  From the title, I was immediately hooked, since this is exactly the type of question we ask in AP Stats when discussing chi-squared distributions.  (Web note: while this particular article is fairly vanilla, linking to this site at school is not recommended, as Deadspin often contains not-safe-for-school content.)

The article nicely summarizes the two resolution types used in NFL broadcasts, and the overturn/confirmation rates for replay challenges in both groups.  For us stat folks, the only omission here is the disaggregated data.  I contacted the author a few days ago with a request for the data, and have yet to receive a response.  But by playing around with Excel some, and assuming the “p-value” later quoted in the article, we can narrow down the possibilities.  The graph below summarizes a data set which fits the conditions and conclusions set forth in the article.


By the time chi-squared distributions are covered in AP Stats, students have been exposed to all four of the course’s broad conceptual themes in detail.  We can explore each of them in this article:

  • Exploring data:  What are the explanatory and response variables?  What graphical display is appropriate for summarizing the data?
  • Sampling and Experimentation:  How was this data collected?  What sampling techniques were used?  What conclusions will we be able to reach using complete data from just one year?
  • Anticipating Patterns:  Could the difference between the replay  overturn rates have plausibly occurred by chance?  Can we conduct a simulation for both types of  replay systems?
  • Statistical Inference:  What hypothesis test is appropriate here?  Are conditions for a chi-squared test met?
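For the inference question, here is a minimal Python sketch of a chi-squared test of independence on a 2x2 table. The counts are HYPOTHETICAL placeholders (the article's disaggregated data were not available), the function name `chi_squared_2x2` is my own, and no continuity correction is applied. With 1 degree of freedom the p-value has a closed form, so only the standard library is needed.

```python
import math


def chi_squared_2x2(table):
    """Pearson chi-squared test of independence for a 2x2 table of counts.

    Returns (statistic, p_value). No continuity correction is applied, and
    every row and column total is assumed to be greater than zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if resolution and outcome were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    # For 1 degree of freedom, P(X >= x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# HYPOTHETICAL counts, made up for illustration only.
# Rows: 720p, 1080i.  Columns: overturned, upheld.
hypothetical = [[70, 80], [40, 70]]
stat, p = chi_squared_2x2(hypothetical)
print(f"chi-squared = {stat:.3f}, p-value = {p:.3f}")
```

Before trusting the p-value, students should still check the conditions, for example that every expected count is at least 5.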

The author’s conclusions present an opportunity to have a  class discussion on communicating results clearly.  First, consider this statement about the  chi-squared test:

“A chi-square analysis of the results suggested those  differences had an 87 percent chance of being related to the video format, and a 13 percent chance of being random. Science prefers results that clear a 95 percent cutoff.”

Having students dissect those sentences and work in groups to rewrite them would be a worthwhile exercise.  Do these results allow us to conclude that broadcast resolution is a factor in replay challenge success?  Has the author communicated the concept of p-value correctly?  What would we need to do differently in order to “prove” a cause-effect relationship here?
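One way to make the meaning of a p-value concrete is a shuffling simulation: assume resolution has no effect, deal the observed outcomes back out to the two groups at random, and see how often the gap in overturn rates is at least as large as the one observed. The sketch below does that in Python. The counts are HYPOTHETICAL, and the function name `simulated_p_value` and its parameters are my own choices.

```python
import random


def simulated_p_value(table, trials=5_000, seed=0):
    """Estimate a two-sided p-value for a 2x2 table by shuffling outcomes.

    table is [[a_overturned, a_upheld], [b_overturned, b_upheld]].
    """
    (a_over, a_up), (b_over, b_up) = table
    n_a, n_b = a_over + a_up, b_over + b_up
    observed_gap = abs(a_over / n_a - b_over / n_b)

    # 1 = overturned, 0 = upheld, pooled across both groups.
    outcomes = [1] * (a_over + b_over) + [0] * (a_up + b_up)
    rng = random.Random(seed)

    at_least_as_extreme = 0
    for _ in range(trials):
        rng.shuffle(outcomes)
        gap = abs(sum(outcomes[:n_a]) / n_a - sum(outcomes[n_a:]) / n_b)
        # Small tolerance guards against floating-point ties.
        if gap >= observed_gap - 1e-12:
            at_least_as_extreme += 1
    return at_least_as_extreme / trials


# HYPOTHETICAL counts, made up for illustration only.
# Rows: 720p, 1080i.  Columns: overturned, upheld.
print(simulated_p_value([[70, 80], [40, 70]]))
```

The number printed is the share of shuffles that produced a gap at least as large as the observed one when chance alone is at work. It is not the probability that the difference is "random," which is a useful contrast with the quoted sentence.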

One final thought.  While I can’t be sure if my raw data is correct, the data seem to suggest that broadcasts in 720p (Fox and ESPN) have more challenges overall than those in 1080i (CBS and NBC), and it seems to be quite a difference.  Can anyone provide plausible reasons for this?  I am struggling with it.


Thoughts on Coin Flipping

It’s Super Bowl weekend, otherwise known here as the weekend I lose 5 bucks to my friend Mattbo. Matt and I have a standing wager every year on the Super Bowl coin flip, and I seem to have an uncanny, almost scary, ability to lose money on the flip. I also lose money to Matt on Thanksgiving annually when my public school alma mater is routinely thrashed by Matt’s Catholic school, but that’s a story for another time.

Coin flipping seems vanilla enough. Its 50-50 probabilities make it seemingly uninteresting to study. But beneath the surface are lots of puzzling nuggets worth sharing with your students.

The NFC has won the opening coin toss in the Super Bowl for 14 consecutive years.  Go back and read that again slowly for maximum wow factor.  This is the sort of fascinating result which seems borderline impossible to many and brings on rumors of fixes and trends, but just how impressed should we be by this historical result?  Try simulating 45 coin flips (to represent the 45 Super Bowls) using a trusty graphing calculator.  What “runs” do we see?  Does having 14 in a row seem so implausible after simulation?  A number of sites (mostly gambling sites) have examined this piece of Super Bowl history, and some attach a probability of 1/16384 (that is, 1/2^14), or about .00006, to this event.  But what exactly is this the probability OF?  In this case, it is the probability, starting with a given toss, that the NFC will win the next 14 in a row.  But it is also the probability that the AFC will win the next 14.  Or that the next 14 will be heads.  Or tails.  The blog The Book of Odds provides more information about the coin toss, specifically how it relates to winning the big game:

The odds a team that wins the coin toss will win the Super Bowl are 1 in 2.15 (47%).

The odds a team that loses the coin toss will win the Super Bowl are 1 in 1.87 (53%).

The odds a team that calls the coin toss will win the Super Bowl are 1 in 1.79 (56%).
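Back to the 14-in-a-row streak: the 1/16384 figure answers a narrow question (a named side winning 14 straight from a named starting toss). A different question is how likely it is that 45 fair tosses contain a streak of 14 or more by either conference, starting anywhere. The Python sketch below estimates that by simulation and also computes it exactly by counting sequences. The function names and the "N"/"A" labels are my own, and the tosses are assumed fair and independent.

```python
import random


def longest_run(flips):
    """Length of the longest streak of identical consecutive results."""
    if not flips:
        return 0
    best = current = 1
    for previous, flip in zip(flips, flips[1:]):
        current = current + 1 if flip == previous else 1
        best = max(best, current)
    return best


def simulate_streak_chance(n_flips=45, streak=14, trials=20_000, seed=0):
    """Share of simulated seasons-of-tosses with a streak >= `streak`."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        flips = [rng.choice("NA") for _ in range(n_flips)]  # NFC or AFC wins
        if longest_run(flips) >= streak:
            hits += 1
    return hits / trials


def exact_streak_chance(n_flips=45, streak=14):
    """Exact chance of a streak >= `streak` (either side) in fair flips."""
    if streak <= 1:
        return 1.0 if n_flips >= 1 else 0.0
    if streak > n_flips:
        return 0.0
    # ways[r] = number of sequences so far that end in a run of exactly r
    # identical flips and contain no run of length `streak` or more.
    ways = [0] * streak
    ways[1] = 2
    for _ in range(n_flips - 1):
        new = [0] * streak
        new[1] = sum(ways)              # next flip switches sides
        for r in range(1, streak - 1):
            new[r + 1] = ways[r]        # next flip extends the current run
        ways = new
    return 1 - sum(ways) / 2 ** n_flips


print("simulated:", simulate_streak_chance())
print("exact:    ", exact_streak_chance())
print("1/16384:  ", 1 / 16384)
```

Both answers come out larger than 1/16384, because there are many places a streak could start and two sides that could own it.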

UPDATE:  In 2007, NPR ran a short piece during “All Things Considered” about the coin-flipping run, which was then in year 10.  Finally found it here.  It’s a quick 4 minutes and great to share with classes.

UPDATE #2:  The AFC just broke its dry spell.  Thanks to NBC for the nice stat line:


Exploring runs in coin tossing through simulation allows us to make sense of unusual phenomena.  On the TI-84, the randInt command allows for quick simulations (for example, the command randInt(1,2,100) will produce a “random” string of 100 1’s and 2’s).  Deborah Nolan, a professor and author from UC Berkeley, has developed an activity which challenges students to act randomly.  A class is split in half, each half is given a blackboard for recording coin flipping results, and the professor leaves the room.  One group is charged with flipping a coin 100 times and recording the results accurately.  The second group is given the task of fabricating a list of 100 coin flip results.  After both are finished, the professor returns and is able to quickly identify the falsified list.  Too few long runs give the fabricators away.
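If a calculator is not handy, a few lines of Python can stand in for randInt(1,2,100) and tally the runs for you. This is only a sketch: the helper names `run_lengths` and `summarize` are my own, the seed is arbitrary, and the strictly alternating list is an exaggerated, invented example of a "too tidy" fabrication rather than anything a real class produced.

```python
import random
from itertools import groupby


def run_lengths(flips):
    """Lengths of each streak of identical consecutive results, in order."""
    return [len(list(group)) for _, group in groupby(flips)]


def summarize(flips):
    """Number of flips, number of runs, and the longest run."""
    runs = run_lengths(flips)
    return {"flips": len(flips), "runs": len(runs), "longest": max(runs, default=0)}


rng = random.Random(2012)  # arbitrary seed so the output is repeatable
simulated = [rng.randint(1, 2) for _ in range(100)]  # like randInt(1,2,100)
too_tidy = [1, 2] * 50  # invented list that switches on every flip

print("simulated:", summarize(simulated))
print("too tidy: ", summarize(too_tidy))
```

For 100 fair flips the expected number of runs is 50.5, and a longest run of 6 or 7 is typical. Student-invented lists tend to switch sides too often, which shows up as many short runs and a short longest run.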

Does the manner in which a coin is tossed make the outcome more or less predictable?  Engineers at Harvard built a mechanical flipper to examine the relationship between a coin’s initial and final positions.  The assertion that much of the randomness in coin flipping  is the result of “sloppy humans” is tasty; we humans have trouble being random when needed.  Along the same lines of innovations in coin tossing, the 2009 and 2010 Liberty Bowl football games used something called the eCoin Toss to make the toss more accessible to the crowd.


Finally, if you are into old-school, bare-knuckles coin flipping, you can mention these scientists, who each took coin flipping to the extreme:

Comte de Buffon (4,040 tosses, 2,048 heads)

Karl Pearson (24,000 tosses, 12,012 heads)

John Kerrich (10,000 tosses, 5,067 heads)
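A natural follow-up is to ask how far each of these results strays from an even split. The Python sketch below computes the proportion of heads and a z-score, assuming a fair coin and independent tosses. The function name `heads_summary` is my own. I have plugged in the Buffon and Pearson counts as examples, and the same call works for any other tosses/heads pair.

```python
import math


def heads_summary(tosses, heads):
    """Proportion of heads and z-score against a fair coin (p = 0.5)."""
    proportion = heads / tosses
    # Under a fair coin, heads has mean n/2 and standard deviation sqrt(n/4).
    z = (heads - tosses / 2) / math.sqrt(tosses / 4)
    return proportion, z


for name, tosses, heads in [("Buffon", 4040, 2048), ("Pearson", 24000, 12012)]:
    proportion, z = heads_summary(tosses, heads)
    print(f"{name}: proportion of heads = {proportion:.4f}, z = {z:.2f}")
```

A z-score within a couple of standard deviations of zero is what we would expect from a fair coin, which makes for a nice conversation about the law of large numbers.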