Category Archives: Statistics

Class Opener – Day 37 – Random Sampling

We’re into the home stretch of our stats unit, and I am looking to reflect on our study of the normal distribution while also looking ahead to what’s to come – sampling and margin of error.  The Rossman-Chance Applets provide some meaningful, interactive discussion starters in statistics.  Today, as students entered, the Reese’s Pieces sampling applet was drawing random samples of size 25 and displaying the results in a dotplot.  Little by little, the dotplot took shape as more samples were drawn…until, eventually, an old friend made an appearance….


Hey, that’s the Normal Distribution!  Why yes, yes it is…..isn’t it great that random sampling reveals such a powerful statistical concept?

But that’s not all.  With election day approaching next week, we can start to build connections between random sampling, the normal distribution, and political polling.  The parallels are strong, and we’ll talk about them within the next week.

  • With sampling candy, there is a pre-existing proportion of “orange” candy.  In a voting population, there is an existing proportion of people who will vote for a certain candidate.
  • With candy, we can draw a random sample of candies. For a poll, we contact a random sample of potential voters.
  • When we sample candy, sometimes we might get 50% orange, or 45% orange, or 60% orange – variability is part of the game. In political polling, we try to estimate the proportion who will vote for a candidate, and we hope to get close to the target. Margin of error gives us an idea of how close we are.
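The applet’s behavior can be mimicked with a short simulation. Below is a minimal Python sketch, assuming a population proportion of orange candies of 0.45 (an illustrative value, not necessarily the applet’s default). It draws many samples of 25 candies, records each sample’s proportion of orange, and compares the spread of those proportions to the theoretical standard deviation √(p(1 − p)/n). Roughly 95% of the sample proportions land within two standard deviations of p, which is the idea a poll’s margin of error builds on.

```python
import math
import random
import statistics

def sample_proportions(p, n, num_samples, seed=0):
    """Draw num_samples random samples of n candies, where each candy is orange
    with probability p, and return the proportion of orange candies in each sample."""
    rng = random.Random(seed)
    proportions = []
    for _ in range(num_samples):
        orange = sum(1 for _ in range(n) if rng.random() < p)
        proportions.append(orange / n)
    return proportions

P_ORANGE = 0.45   # assumed population proportion of orange candies
N = 25            # sample size, matching the applet setting described above
props = sample_proportions(P_ORANGE, N, 10_000)

# Theory: sample proportions center on p with standard deviation sqrt(p(1 - p) / n).
theory_sd = math.sqrt(P_ORANGE * (1 - P_ORANGE) / N)
within_2sd = sum(abs(x - P_ORANGE) <= 2 * theory_sd for x in props) / len(props)

print(f"mean of sample proportions: {statistics.mean(props):.3f} (theory {P_ORANGE})")
print(f"SD of sample proportions:   {statistics.stdev(props):.3f} (theory {theory_sd:.3f})")
print(f"share within 2 SDs of p:    {within_2sd:.3f}")
```

Plotting `props` as a dotplot or histogram gives the same bell shape students saw in the applet.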

There are many other interesting discussions to be had surrounding the applets – try some with your classes. They aren’t just for AP kids!


Class Opener – Day 36 – Introverts vs Extroverts

We’re about halfway through a basic stats unit in my 9th grade class, with a quiz tomorrow on standard deviation and the normal distribution.  I need one last class example to have students compare and contrast data sets by looking at their centers and variability.  A morning brainstorm turned into a fun exploration of my students’ personalities.  Three groupings were shown on my back whiteboard:

  • EXTROVERTS
  • MIDDLE
  • INTROVERTS

After a brief discussion of what it means to be introverted or extroverted, and doing my best to steer discussion away from any negative connotations, I asked students to self-identify and move to a corner of the room based on where they see themselves.  To tidy things up, I told them to arrange themselves so that we had exactly 8 introverts and 8 extroverts, with everyone else in the middle.  Some adjusting then took place, as we agreed on who belonged in which group.

Now for the data collection aspect.  I had each student approach the back board and write their signature in the appropriate column.  This is where the fun began – as my introverts calmly waited for their peers to write their names and move away, the extroverts fought over markers and board space.  As students sat down after contributing their signature, some noticed immediately what was happening.


After all names were written, and we had a good laugh over the clear differences in the categories, we needed some data.  Each student approached the board and measured the height of a name at its tallest point, recording to the nearest tenth of a centimeter.  Tonight’s homework is then to compute the standard deviation “by hand” for one of the groups, and comment on differences.  My old friend the Nspire App is helpful here to show the clear difference between the introverts and the extroverts.

Using authentic data in class matters, as kids more readily discuss what they see and are generally more eager to dig deeper into a problem.  This was a fun way to culminate the first half of our stats unit.
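For checking tonight’s homework, here is a minimal Python sketch of the by-hand procedure, one step per line. The name heights below are hypothetical stand-ins, not the measurements from class, and the sketch assumes the sample standard deviation (dividing by n − 1); a class that divides by n would get slightly smaller values.

```python
import math
import statistics

def sd_by_hand(values):
    """Sample standard deviation, computed step by step as students would on paper."""
    n = len(values)
    mean = sum(values) / n                    # step 1: find the mean
    deviations = [x - mean for x in values]   # step 2: how far each value is from the mean
    squared = [d ** 2 for d in deviations]    # step 3: square each deviation
    variance = sum(squared) / (n - 1)         # step 4: add them up, divide by n - 1
    return math.sqrt(variance)                # step 5: take the square root

# Hypothetical name heights in cm (invented for illustration, not the class data)
introverts = [2.1, 2.4, 1.9, 2.2, 2.0, 2.3, 1.8, 2.1]
extroverts = [6.5, 9.0, 4.8, 11.2, 7.3, 5.5, 8.8, 10.1]

for label, group in [("introverts", introverts), ("extroverts", extroverts)]:
    print(f"{label}: mean {statistics.mean(group):.2f} cm, SD {sd_by_hand(group):.2f} cm")
```

The standard library’s `statistics.stdev` uses the same n − 1 formula, so it makes a handy answer key.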

Class Opener – Day 35 – Tall and Short

We’re thinking about standard deviation in my 9th grade class, and the idea of variation and “unusual” data points. I think the picture which greeted students today says just about all that needs to be said about standard deviation, doesn’t it?

Later in class, I asked students to plot their heights on a number line I had drawn, with a low of 60 inches and a high of 74.  From here, I asked students to estimate what our class standard deviation might be.  Some interesting responses were generated:

  • 10 – probably because 60 and 70 appeared on the line.
  • 5 – because that would seem to cover the number line

When I reminded them that standard deviation can be thought of as “typical distance from the mean”, the responses evolved and eventually we settled on a value between 2 and 3 inches, where travelling 2 standard deviations in each direction would cover everyone in the class.  Next, when I told them that the World’s Tallest Man had a height over 8 standard deviations from the mean, meaningful gasps were shared, and we could move on to notes involving the normal distribution.
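The “typical distance from the mean” idea leads directly to z-scores. This small Python sketch uses a hypothetical class mean of 67 inches and standard deviation of 2.5 inches (assumed values, not the class’s actual numbers), converts a height to a z-score, and uses the standard normal tail probability P(Z > z) = ½·erfc(z/√2) to show how common 2 standard deviations is and how rare 8 is.

```python
import math

def z_score(x, mean, sd):
    """How many standard deviations x lies from the mean."""
    return (x - mean) / sd

def upper_tail(z):
    """P(Z > z) for a standard normal variable Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# Hypothetical class summary (assumed, not the actual class data)
CLASS_MEAN, CLASS_SD = 67, 2.5

print(z_score(72, CLASS_MEAN, CLASS_SD))  # 2.0: a 72-inch student is 2 SDs above the mean
print(f"{1 - 2 * upper_tail(2):.4f}")     # 0.9545: share of a normal population within 2 SDs
print(f"{upper_tail(8):.1e}")             # 6.2e-16: chance of landing more than 8 SDs above
```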

Short post today as I am about to start 23 parent conferences over 6 hours….wish me luck!

Class Opener – Day 33 – Mind Control with 9th Graders

A map of Argentina….a class of impressionable 9th graders…and a devious teacher….what could go wrong? I’m currently reviewing past statistical ideas with my 9th graders, with an eye towards standard deviation, the normal curve and sampling.  To generate some data, I asked each of my classes 2 questions about the country of Argentina:

MORNING CLASS:

  1. Do you believe the population of Argentina is MORE or LESS than 10 million?
  2. Estimate the population of Argentina.

AFTERNOON CLASS:

  1. Do you believe the population of Argentina is MORE or LESS than 50 million?
  2. Estimate the population of Argentina.

Both classes gave me strange looks.  But with instructions to answer as best they could, the students played along and provided data.  Did you notice the subtle difference between the two question sets?  The two classes provided strikingly different estimates, due to the anchoring from the first question.


The inspiration for this activity comes from the book A Mathematician Reads the Newspaper by John Allen Paulos, which contains many other quick nuggets to use in your classroom.  And now we have a rich conversation regarding the wording of poll questions to enjoy in the next few days!

Class Opener – Day 30 – Stats Entrance Tickets

Usually my openers here on the blog are those I share with my freshman classes, but today’s post features my AP Statistics class.  They are preparing for their test on normal distributions, and it’s no time to be spooked!  Today’s class started with the famous stats cartoon shown here, and an entrance ticket – one part of a past AP problem dealing with normal distributions.

Schools in a certain state receive funding based on the number of students who attend the school. To determine the number of students who attend a school, one school day is selected at random and the number of students in attendance that day is counted and used for funding purposes. The daily number of absences at High School A in the state is approximately normally distributed with mean of 120 students and standard deviation of 10.5 students.
(a) If more than 140 students are absent on the day the attendance count is taken for funding purposes, the school will lose some of its state funding in the subsequent year. Approximately what is the probability that High School A will lose some state funding?

The full exam (and all free-response questions) is available in the AP Statistics area of the College Board website; the College Board owns the copyright on all AP problems.

Despite the length of text in this problem, part (a) here is a simple normal distribution probability, one which any AP student should be able to tackle easily.
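Part (a) reduces to one normal probability: standardize the boundary value 140 and find the upper tail. The Python sketch below uses the standard library’s `statistics.NormalDist` (Python 3.8+) as a calculator stand-in, not as what students would write on the exam. It names the distribution with its mean and standard deviation and states the boundary value explicitly, the same kind of communication a written solution needs.

```python
from statistics import NormalDist

# Daily absences at High School A, as given in the problem: normal, mean 120, SD 10.5
absences = NormalDist(mu=120, sigma=10.5)
BOUNDARY = 140   # funding is lost if MORE than 140 students are absent

z = (BOUNDARY - absences.mean) / absences.stdev
p_lose_funding = 1 - absences.cdf(BOUNDARY)

print(f"z = {z:.2f}")                        # z = 1.90
print(f"P(X > 140) = {p_lose_funding:.4f}")  # P(X > 140) = 0.0284
```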

I gave students 4 minutes to provide a solution on the printed sheet, but did not ask them to identify themselves on the paper. After collecting the sheets, I mixed them up and prepared to share them under my document camera.  This particular problem is one I graded last summer at the AP Stats reading in Kansas City, and if you know what you are looking for, it is a quick grade (by my super-unofficial count, I probably graded this question about 1500 times).  All papers received a score of E (essentially correct), P (partially correct) or I (incorrect) based on the College Board rubric.  Even though this would qualify as an “easy-ish” problem in AP Stats, it’s still the student’s responsibility to justify and communicate.  For this problem, there are 3 features we AP readers looked for:

  • A correct answer
  • Indication of a normal distribution used, along with mean and standard deviation identification
  • Indication of a boundary value of 140

The last 2 bullets could be met in a number of ways – by a diagram or by symbols. It’s a good lesson to students that even basic stats problems require justification.

Assessing Normality in AP Stats

If you make a Wordle of all of the year-long conversation in an AP Statistics class, the word “normal” will certainly be one of the font-size winners.  Think of all of the places the word “normal” enters the conversation:

  • We find the probability of events given a normal distribution.
  • We combine random variables, which may have normal distributions.
  • We discuss a normal approximation for a binomial setting.
  • The Central Limit Theorem allows us to assume a sampling distribution of sample means will be approximately normal if the sample size is sufficiently large.
  • The sampling distribution of sample proportions will be approximately normal if the expected number of successes and failures is “large”
  • We assess samples for signs of normality in their parent populations.

It’s this last bullet which is often the trickiest for students, yet the most critical when it comes to the structure of hypothesis testing. Exactly what are the “signs” of normality? How can I tell if they have been met? And what is the “it” that is approximately normal anyway?  These are questions which come up early in Stats as we begin to look at the distribution of samples.

Here’s a diagram which makes an appearance often in my class, and provides the framework for my lesson on assessing normality:

[Diagram: population and sample]

Much of what we do in statistics deals with taking a representative sample from a large population, making a conjecture about the population, then using mathematical evidence to reach a conclusion.  In my class, this is our first experience with making decisions about a population based on sample evidence, and I need the language and ideas to be tight from the start.  To start, I hand out a sheet with 8 different boxplots on it, and ask students to assess them. Specifically:

Based on the sample, do you feel there is evidence that the population from which it came could be approximately normal?

Groups then discuss each of the 8 graphs, and a quick show of hands is used to vote “yes” (pro-population-normality) or “no” for each of the graphs.  Up to now, students have had exposure to center, shape and spread ideas, the relationship between mean and median in a symmetric distribution, and the 68-95 rule. Conversation often centers on perceived skewness and outliers, and observations surrounding the centering of the median in the “box” part of the boxplot.

Now it’s time for the big reveal…..not only do all 8 of the boxplots come from populations which are approximately normal, they all are samples from the SAME population. It’s a mean trick, no doubt, but I now show students the Fathom document used to create the samples, and have the file cycle through 200 different samples. This is often eye-opening to students, as they begin to see the wide variation in samples from the same population, and hopefully it causes them to cast a bigger net when looking to “assume” normality in populations.
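The Fathom demonstration can be approximated in a few lines of Python. This sketch assumes a normal population with mean 100 and standard deviation 15 and samples of size 20 (illustrative values; the original Fathom settings are not given), draws eight samples from that one population, and prints each five-number summary. Quartiles come from `statistics.quantiles`, whose default method can differ slightly from a TI-84 or Fathom, so treat the exact values loosely; the point is how much the summaries wander.

```python
import random
import statistics

def five_number_summary(data):
    """Minimum, Q1, median, Q3, maximum (quartiles via statistics.quantiles)."""
    q1, median, q3 = statistics.quantiles(data, n=4)
    return min(data), q1, median, q3, max(data)

def draw_samples(num_samples, size, mu, sigma, seed=1):
    """Draw several independent random samples from ONE normal population."""
    rng = random.Random(seed)
    return [[rng.gauss(mu, sigma) for _ in range(size)] for _ in range(num_samples)]

# Assumed settings: population N(100, 15), eight samples of size 20
samples = draw_samples(num_samples=8, size=20, mu=100, sigma=15)
summaries = [five_number_summary(s) for s in samples]

for i, summary in enumerate(summaries, start=1):
    print(f"sample {i}: " + "  ".join(f"{v:6.1f}" for v in summary))
```

Changing `seed` and rerunning plays the role of cycling through new samples in Fathom.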

In the second half of this activity, I share 6 data sets with the class, which I have pulled from various sources. The data is sent to students from my class TI-84 or Nspire software. The task at hand is to assess each data set, and conjecture whether the parent population can be assumed to have an approximately normal distribution. This Excel file contains the data sets, which you can format for your use.

In this activity, the goal is to determine if a given sample comes from a population that is approximately normal.  By now, students have a decent grasp for what to look for:

  1. Mean “close” to the median
  2. Symmetry, perhaps a few outliers
  3. Rough adherence to the 68-95 rule (this is tough to actually check, but if it is checkable, we should give it a good attempt)

For now, I leave number 4 on the list blank; it will be discussed later. In addition to making a decision pro/con normality, I ask groups to conjecture about the source of each data set. The column titles provide some context clues to the sources of the sets:

  • PRICE – price of 117 homes sold in Albuquerque, NM in 1993
  • TEMP – high temperatures in Las Vegas in July and August 2007
  • MYST – the mystery list. 100 random integers from 50-100 (from randInt on a TI-84)
  • WT – weights of adult males ages 22-30, from a clinical study
  • AGE – age of CEOs from a Forbes list of Top Companies
  • BRAIN – IQ scores for 40 research subjects
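Checks 1 through 3 are easy to automate. The rough Python sketch below compares the mean to the median and reports the share of values within 1 and 2 standard deviations of the mean. It applies these checks to a MYST-like list (100 calls to `random.randint(50, 100)`, Python’s analogue of the TI-84’s randInt) and to a hypothetical normal sample. The uniform list typically shows noticeably fewer than 68% of its values within 1 SD even though its mean and median agree, a reminder that no single check settles the question.

```python
import random
import statistics

def normality_checks(data):
    """Report mean vs. median and the share of data within 1 and 2 SDs of the mean."""
    mean = statistics.mean(data)
    median = statistics.median(data)
    sd = statistics.stdev(data)
    within1 = sum(abs(x - mean) <= sd for x in data) / len(data)
    within2 = sum(abs(x - mean) <= 2 * sd for x in data) / len(data)
    return {"mean": mean, "median": median, "within_1sd": within1, "within_2sd": within2}

rng = random.Random(2)
myst_like = [rng.randint(50, 100) for _ in range(100)]   # like MYST: uniform integers 50-100
normal_like = [rng.gauss(75, 8) for _ in range(100)]     # hypothetical normal sample

for name, data in [("MYST-like", myst_like), ("normal-like", normal_like)]:
    r = normality_checks(data)
    print(f"{name}: mean {r['mean']:.1f}, median {r['median']:.1f}, "
          f"within 1 SD {r['within_1sd']:.0%}, within 2 SD {r['within_2sd']:.0%}")
```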

As groups share their findings on the board, some important themes emerge:

  • Context matters! If we consider the source of a data set, this may provide important information about its population distribution. Often, measurements from things in nature (heights, weights, lengths, IQs) have an approximately normal distribution. Data involving salaries and prices, meanwhile, are often skewed.
  • Multiple representations are helpful. In the list above, the BRAIN data set (IQ scores) has a nice, symmetric distribution if you look at its boxplot. But a dotplot reveals an important feature not evident in the boxplot – the data consists of 2 distinct groupings, with a large gap in the center.
  • It’s not the sample which we are trying to prove normal, it’s the underlying population. Later, during hypothesis testing, it is common to find students who claim “the sample is normal” based on a boxplot (or who simply claim, “it’s normal”). We need to help students move away from meaningless statements like this, and towards a communicated linkage between the sample and its parent population.
  • As the lesson progresses, the class begins to see that assessing normality is tricky business. We’ll be making a lot of assumptions about the behavior of populations in stats class through the year.  Later, the robustness of procedures will provide a safety net if a population isn’t quite normal.
  • And maybe the most important idea: it’s not so important that we clearly identify and justify populations which are normal; it’s more important that we identify populations which are clearly NOT normal.
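A boxplot summary can look perfectly symmetric while hiding a gap. This Python sketch uses invented scores arranged in two clusters (hypothetical, not the actual BRAIN data): the mean and median are both 100 and the five-number summary is symmetric, yet a crude sideways text dotplot shows the empty stretch in the middle. The helper names `bin_counts` and `text_dotplot` are illustrative.

```python
import statistics
from collections import Counter

def five_number_summary(data):
    """Minimum, Q1, median, Q3, maximum (quartiles via statistics.quantiles)."""
    q1, median, q3 = statistics.quantiles(data, n=4)
    return min(data), q1, median, q3, max(data)

def bin_counts(data, width=5):
    """Count values in bins [start, start + width)."""
    return Counter((x // width) * width for x in data)

def text_dotplot(data, width=5):
    """Print a rough sideways dotplot: one row of dots per bin."""
    counts = bin_counts(data, width)
    for start in range(min(counts), max(counts) + width, width):
        print(f"{start:>4}-{start + width - 1:<4}| " + "." * counts[start])

# Hypothetical scores: two clusters with an empty stretch in the middle
scores = [80, 83, 85, 86, 88, 89, 90, 92, 93, 95,
          105, 107, 108, 110, 111, 112, 114, 115, 117, 120]

print("five-number summary:", five_number_summary(scores))
print("mean:", statistics.mean(scores), "median:", statistics.median(scores))
text_dotplot(scores)
```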

WHAT ABOUT NORMAL PROBABILITY PLOTS?

After all 6 data sets have been evaluated and discussed, I explain the idea and structure of a normal probability plot, which becomes #4 in our list of “what to look for”.  The Nspire does a nice job making them, with the z-score axis clearly labeled.


I have found that the more years I teach AP Stats, the less I stress this graph. It’s easily forgotten under the avalanche of information in the course, and the procedures described above are sufficient for the job. Unless you spend time developing the structure of this graph – why transforming percentiles to z-scores in a normal distribution yields a linear function – it becomes another disconnected idea to memorize. I show it – but then we cast it aside.
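The structure behind the plot (percentiles turned into z-scores) can be spelled out in a few lines of Python. This sketch uses (i − 0.5)/n as the percentile of the i-th smallest value, one common convention that may not match what the Nspire uses, and summarizes how straight the plot is with a correlation coefficient. A normal sample should give a correlation very close to 1; a right-skewed sample, like home prices, bends the plot and lowers it.

```python
import random
from statistics import NormalDist

def normal_scores(data):
    """Pair each sorted value with the z-score of its percentile, (i - 0.5) / n."""
    xs = sorted(data)
    n = len(xs)
    z = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return z, xs

def correlation(a, b):
    """Pearson correlation; near 1 when the plot is close to a straight line."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

rng = random.Random(3)
normal_sample = [rng.gauss(100, 15) for _ in range(100)]
skewed_sample = [rng.expovariate(1 / 100) for _ in range(100)]  # right-skewed, like prices

for name, data in [("normal", normal_sample), ("skewed", skewed_sample)]:
    z, xs = normal_scores(data)
    print(f"{name}: correlation of plot = {correlation(z, xs):.3f}")
```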

AP Statistics “Best Practices” 2014

Last week, I arrived home after 8 days in Kansas City, where I participated in the AP Statistics Exam reading. It’s hard work, filled with long days of grading papers. But all the readers seem to take some sadistic delight in this work, and the professional connections made through the week are outstanding.

One of the highlights of the week is Best Practices Night, organized by my friend Adam Shrager. This year, 20 or so different folks presented 5-minute looks into their classrooms.  Below are summaries of some of my personal favorites. You can check out all of the presentations on Jason Molesky’s StatsMonkey site.

GUMMI BEARS – KEVIN DiVIZIA

You’ll find that AP Stats teachers enjoy candy….too much so at times, my doctor tells me. Last year, Kevin shared his data collection activity with stomp rockets.  This year, Kevin upped the ante with an activity where students launch Gummy Bears, Gummy Worms and other candies using catapults.  Which type of candy flies farthest? What can we say about the consistency of the launches? I’m looking to incorporate this into my 9th grade class as an introduction to variability and estimation.


Kevin’s presentation on the StatsMonkey site is a Keynote file. I have converted it here to PowerPoint for us non-Keynote users.

MORBID MATH – BRIANNA KURTZ

Stats teachers have many data collection activities in their arsenal, but this idea from Brianna wins the prize for most off-beat concept. In this activity, students are asked to estimate life expectancy in a population. To collect data, the class uses something readily available every day: the obituaries. This presentation was one of the clear highlights of the evening, with many in attendance wondering what a class taught by the hysterically entertaining Brianna would be like!  Visit StatsMonkey for her activity worksheet, and use the dead as data!

Z-PUZZLES – CHRISTINE WOZNIAK

Jigsaw puzzles make for great reviews in just about any math class.  Here, Christine shares puzzles she uses to review the Normal Distribution. Cut out the pieces, find the probabilities and solve the puzzle!  Template included.

SAMPLING USING BEADS – PAUL RODRIGUEZ

Paul is part of the AP Stats Test Development Committee, and always has great ideas for the Stats Classroom. At the reading, Paul shared his sampling activity, using Air Gun ammo of different colors (and slightly different sizes) to draw small samples from a large population. Using a paddle made from pegboard, random samples can be drawn, leading to a first discussion on inference. Paul promises to share the plans for building your own sampling paddle, so check back on StatsMonkey often!

UPDATE: Paul’s presentation has been uploaded to the StatsMonkey Site, along with plans for making your own sampling paddles.

STARBURSTS AND R-SQUARED – DOUG TYSON

I appreciate presentations where speakers attempt to untangle a tricky concept in math class. Having students move beyond a “canned” understanding of the coefficient of determination and towards a real understanding of predictive improvement based on an explanatory variable is a worthwhile lesson. In his activity, Doug Tyson challenges students to grab as many Starburst candies (see…I told you Stats folks like candy) as possible in their hand, then examines the predictive value of using hand size to estimate the number of grabbed candies.  How much are our predictions improved by thinking about hand size, as opposed to thinking about the mean?
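That closing question is essentially the definition of r². The Python sketch below uses invented hand-span and candy counts (hypothetical, not data from Doug’s activity). It computes the squared prediction error when every hand is predicted with the mean, then again using the least-squares line on hand size, and reports r² = 1 − SSE(line)/SSE(mean), the fraction of that error the explanatory variable removes.

```python
def least_squares(x, y):
    """Slope and intercept of the least-squares regression line."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx

def r_squared(x, y):
    """Share of the squared prediction error removed by using the line instead of the mean."""
    my = sum(y) / len(y)
    slope, intercept = least_squares(x, y)
    sse_mean = sum((b - my) ** 2 for b in y)  # predict every hand with the mean
    sse_line = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))  # use hand size
    return 1 - sse_line / sse_mean

# Hypothetical data: hand span (cm) and Starbursts grabbed (invented for illustration)
hand_span = [17.5, 19.0, 20.5, 18.0, 22.0, 21.0, 16.5, 23.0]
candies = [14, 18, 21, 15, 26, 22, 12, 27]

print(f"r^2 = {r_squared(hand_span, candies):.3f}")
```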

There’s so much more sharing goodness on the StatsMonkey site, including:

  • A review of Geddit, for formative assessment
  • A QR code scavenger hunt
  • Hershey Kisses and Confidence Intervals, which I used in my class this year

Soon, I will post more resources shared by Chris Franklin, who gave a brief history of stats education during her Professional Night presentation.