Today’s opener was inspired by a movie correlations activity I have used in AP Statistics, and Cathy Yenca’s awesome activity which brings this idea down to the Algebra level.

For my freshman class, I wanted students to “discover” the role of the correlation coefficient r – how it acts as a measure of the strength of the linear relationship between two quantitative variables. To begin, 10 potential vacation / off-day activities were listed on the board:

Ski

Go to Beach

Amusement Park

Baseball Game

Broadway Show

Camping

Washington DC Tour

Shopping Day

Big Concert

Cruise

Students were each asked to rank these activities from 1 to 10 (10 being most desirable), using each number only once. The class then moved into partnerships, with my suggestion that they work with someone they did not know so well in class, and compared results. With an odd number of students, I worked with a student to share interests. Results for each activity were plotted as ordered pairs, with each partner contributing their number score. Students plotted their points on graph paper, while my student partner and I used Desmos – and quickly discovered that we have little in common.

From there, students learned how to use graphing calculators to analyze the data – making the scatterplot and finding the best-fit line. The partnerships also wrote this mysterious new statistic – r – on the bottom of the graph and shared their graph on the board. Through a gallery walk, the class examined the graphs and tried to conjecture the meaning of r.

This worked better than planned, as the class quickly made some key observations:

Pairs with stronger relationships have “higher” r values.

There are no r-values greater than 1.

r can be negative if people answer opposite each other.
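For anyone who wants to see where those three observations come from, here is a minimal Python sketch of the calculation the calculator performs. The rankings are hypothetical (not my students' data), and the helper name pearson_r is my own.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical rankings (1 to 10, each used once) for the ten activities
partner_a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
same_taste = list(partner_a)                   # partner agrees on everything
opposite_taste = partner_a[::-1]               # partner answers in reverse
mixed_taste = [2, 1, 4, 3, 6, 5, 8, 7, 10, 9]  # close, but not identical

r_same = pearson_r(partner_a, same_taste)
r_opposite = pearson_r(partner_a, opposite_taste)
r_mixed = pearson_r(partner_a, mixed_taste)

print(f"same: {r_same:.3f}, opposite: {r_opposite:.3f}, mixed: {r_mixed:.3f}")
```

Identical rankings land at the top of the scale, reversed rankings at the bottom, and everything else falls somewhere in between, which matches what the class noticed on the gallery walk.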

Definitely will add this activity to my arsenal every year!

If you are interested in the activity for AP Stats, you can check out the Google Form we use, then some instructions for processing the data in this video:

We’re into the home stretch of our stats unit and I am looking to reflect upon our study of the normal distribution, yet look ahead at what’s to come – sampling and margin of error. The Rossman-Chance Applets provide some meaningful, interactive discussion starters in statistics. Today, as students entered, the Reese’s Pieces sampling applet was drawing random samples of size 25 – displaying the results in a dotplot. Little by little, the dotplot took shape as more samples were drawn…until, eventually, an old friend made an appearance….

Hey, that’s the Normal Distribution! Why yes, yes it is…..isn’t it great that random sampling reveals such a powerful statistical concept?

But that’s not all. With election day approaching next week, we can start to build connections between random sampling, the normal distribution, and political polling. The parallels are strong, and we’ll talk about them within the next week.

With sampling candy, there is a pre-existing proportion of “orange” candy. In a voting population, there is an existing proportion of people who will vote for a certain candidate.

With candy, we can draw a random sample of candies. For a poll, we contact a random sample of potential voters.

When we sample candy, sometimes we might get 50% orange, or 45% orange, or 60% orange – variability is part of the game. In political polling, we try to estimate the proportion who will vote for a candidate, and we hope to get close to the target. Margin of error gives us an idea of how close we are.
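If you can't get to the applet, the candy-to-polling parallel above can be roughed out in a few lines of Python. This is only a sketch under my own assumptions: a true proportion of 0.5 orange, 2000 repeated samples, and the common "two standard deviations" shortcut for a rough 95% margin of error. Only the sample size of 25 comes from class.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

P_ORANGE = 0.5      # assumed true proportion of orange candies
SAMPLE_SIZE = 25    # sample size used in class
NUM_SAMPLES = 2000  # assumed number of repeated samples

def sample_proportion(p, n):
    """Proportion of 'orange' in one random sample of n candies."""
    orange = sum(1 for _ in range(n) if random.random() < p)
    return orange / n

proportions = [sample_proportion(P_ORANGE, SAMPLE_SIZE) for _ in range(NUM_SAMPLES)]

center = statistics.mean(proportions)    # should sit near P_ORANGE
spread = statistics.pstdev(proportions)  # observed sample-to-sample variability

# Theoretical standard deviation of a sample proportion: sqrt(p(1-p)/n)
theory_spread = math.sqrt(P_ORANGE * (1 - P_ORANGE) / SAMPLE_SIZE)
margin_of_error = 2 * theory_spread      # rough 95% margin (an approximation)

# Fraction of samples that landed within the margin of the true proportion
within = sum(abs(ph - P_ORANGE) <= margin_of_error for ph in proportions) / NUM_SAMPLES

print(f"center {center:.3f}, spread {spread:.3f}, theory {theory_spread:.3f}")
print(f"share of samples within the margin of error: {within:.3f}")
```

Swap "orange candy" for "voters who favor a candidate" and the same code describes a poll: most samples land within the margin of error of the truth, but not all of them.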

There are many other interesting discussions to be had surrounding the applets – try some with your classes. They aren’t just for AP kids!

We’re about half-way through a basic stats unit in my 9th grade class, with a quiz tomorrow on standard deviation and the normal distribution. I need one last class example to have students compare and contrast data sets by looking at their centers and variability. A morning brainstorm turned into a fun exploration of my students’ personalities. Three groupings were shown on my back whiteboard:

EXTROVERTS

MIDDLE

INTROVERTS

After a brief discussion of what it means to be introverted or extroverted, and doing my best to steer discussion away from any negative connotations, I asked students to self-identify and move to a corner of the room based on where they see themselves. To clean up things some, I told them to arrange themselves so that we had exactly 8 introverts and 8 extroverts, with everyone else in the middle. Some adjusting then took place, as we agreed on who belonged in which group.

Now for the data collection aspect. I had each student approach the back board and write their signature in the appropriate column. This is where the fun began – as my introverts calmly waited for their peers to write their names and move away, the extroverts fought over markers and board space. As students sat down after contributing their signature, some noticed immediately what was happening:

After all names were written, and we had a good laugh over the clear differences in the categories, we needed some data. Each student approached the board and measured the height of a name at its tallest point, recording to the nearest tenth of a centimeter. Tonight’s homework is then to compute the standard deviation “by hand” for one of the groups, and comment on differences. My old friend the Nspire App is helpful here to show the clear difference between the introverts and the extroverts:
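The “by hand” steps translate almost line for line into Python. The signature heights below are made up for illustration (they are not my class data), and I am assuming the sample version of the formula that divides by n - 1; if your class divides by n, change that one line.

```python
import math
import statistics

def sd_by_hand(values):
    """Sample standard deviation, one step at a time."""
    n = len(values)
    mean = sum(values) / n                   # step 1: find the mean
    deviations = [v - mean for v in values]  # step 2: distance of each value from the mean
    squared = [d ** 2 for d in deviations]   # step 3: square each deviation
    variance = sum(squared) / (n - 1)        # step 4: "average" the squares (n - 1 version)
    return math.sqrt(variance)               # step 5: square root

# Hypothetical signature heights in cm, 8 students per group
introverts = [1.2, 1.5, 1.1, 1.8, 1.4, 1.3, 1.6, 1.5]
extroverts = [3.5, 6.2, 2.8, 9.1, 4.4, 7.0, 5.3, 3.9]

for name, group in (("introverts", introverts), ("extroverts", extroverts)):
    print(f"{name}: mean {statistics.mean(group):.2f} cm, sd {sd_by_hand(group):.2f} cm")
```

Comparing both the means and the standard deviations is the point of the homework: the groups can differ in center, in variability, or in both.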

Using authentic data in class matters, as kids more readily discuss what they see and are generally more eager to dig deeper into a problem. This was a fun way to culminate the first half of our stats unit.

We’re thinking about standard deviation in my 9th grade class, and the idea of variation and “unusual” data points. I think the picture which greeted students today says just about all which needs to be said on standard deviation, doesn’t it?

Later in class, I asked students to plot their heights on a number line I had drawn, with a low of 60 inches and a high of 74. From here, I asked students to estimate what our class standard deviation might be. Some interesting responses were generated:

10 – probably because 60 and 70 appeared on the line.

5 – because that would seem to cover the number line.

When I reminded them that standard deviation can be thought of as “typical distance from the mean”, the responses evolved and eventually we settled on between 2 and 3, where travelling 2 standard deviations in each direction would cover everyone in the class. Next, when I told them that the World’s Tallest Man had a height over 8 standard deviations from the mean, meaningful gasps were shared, and we could move on to notes involving the normal distribution.
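To put “number of standard deviations from the mean” into numbers, here is a tiny sketch. The class mean of 67 inches and standard deviation of 2.5 inches are assumed values chosen to fit the number line described above, not measured results.

```python
def z_score(value, mean, sd):
    """How many standard deviations a value sits from the mean."""
    return (value - mean) / sd

CLASS_MEAN = 67.0  # assumed center of the 60 to 74 inch number line
CLASS_SD = 2.5     # assumed; the class settled on "between 2 and 3"

# Two standard deviations in each direction
low = CLASS_MEAN - 2 * CLASS_SD   # 62.0 inches
high = CLASS_MEAN + 2 * CLASS_SD  # 72.0 inches

# A height 8 standard deviations above this class's mean
eight_sd_height = CLASS_MEAN + 8 * CLASS_SD  # 87.0 inches

print(f"2 SDs each way: {low} to {high} inches")
print(f"8 SDs above the mean: {eight_sd_height} inches")
print(f"z-score of a 72-inch student: {z_score(72, CLASS_MEAN, CLASS_SD)}")
```

With these assumed values, someone 8 standard deviations above the class mean would stand over 7 feet tall, which helps explain the gasps.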

Short post today as I am about to start 23 parent conferences over 6 hours….wish me luck!

A map of Argentina….a class of impressionable 9th graders…and a devious teacher….what could go wrong? I’m currently reviewing past statistical ideas with my 9th graders, with an eye towards standard deviation, the normal curve and sampling. To generate some data, I asked each of my classes 2 questions about the country of Argentina:

MORNING CLASS:

Do you believe the population of Argentina is MORE or LESS than 10 million?

Estimate the population of Argentina.

AFTERNOON CLASS:

Do you believe the population of Argentina is MORE or LESS than 50 million?

Estimate the population of Argentina.

Both classes gave me strange looks. But with instructions to answer as best they could, the students played along and provided data. Did you note the subtle differences between the two question sets? The two classes provided strikingly different estimates, due to the anchoring from the first question.

The inspiration for this activity comes from the book A Mathematician Reads the Newspaper by John Allen Paulos, which contains many other quick nuggets to use in your classroom. And now we have a rich conversation regarding the wording of poll questions to enjoy in the next few days!

Usually my openers here on the blog are those I share with my freshman classes, but today’s post features my AP Statistics class. They are preparing for their test on normal distributions, and it’s no time to be spooked! Today’s class started with the famous stats cartoon shown here, and an entrance ticket – one part of a past AP problem dealing with normal distributions.

Schools in a certain state receive funding based on the number of students who attend the school. To determine the number of students who attend a school, one school day is selected at random and the number of students in attendance that day is counted and used for funding purposes. The daily number of absences at High School A in the state is approximately normally distributed with mean of 120 students and standard deviation of 10.5 students. (a) If more than 140 students are absent on the day the attendance count is taken for funding purposes, the school will lose some of its state funding in the subsequent year. Approximately what is the probability that High School A will lose some state funding?

The full exam (and all free-response questions) is available in the AP Statistics area of the College Board website; the College Board owns the copyright on all AP problems.

Despite the length of text in this problem, part a here is a simple normal distribution probability, one which any AP student should be able to tackle easily.

I gave students 4 minutes to provide a solution on the printed sheet, but did not ask them to identify themselves on the paper. After collecting the sheets, I mixed them up and prepared to share them under my document camera. This particular problem is one I graded last summer at the AP Stats reading in Kansas City, and if you know what you are looking for, it is a quick grade (by my super-unofficial count, I probably graded this question about 1500 times). All papers received a score of E (essentially correct), P (partially correct) or I (incorrect) based on the College Board rubric. Even though this would qualify as an “easy-ish” problem in AP Stats, it’s still the student’s responsibility to justify and communicate. For this problem, there are 3 features we AP readers looked for:

A correct answer

Indication of a normal distribution used, along with identification of the mean and standard deviation

Indication of a boundary value of 140

The last 2 bullets could be met in a number of ways – by diagram, by symbols. It’s a good lesson for students that even basic stats problems require justification.
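For checking student answers, here is one way to compute part (a) in Python with the standard library's NormalDist. This is a sketch for verifying the arithmetic, not the official scoring guidelines; the variable names are my own.

```python
from statistics import NormalDist

# Daily absences at High School A: approximately normal, mean 120, sd 10.5
absences = NormalDist(mu=120, sigma=10.5)
boundary = 140  # funding is lost if MORE than 140 students are absent

# Standardize the boundary value
z = (boundary - absences.mean) / absences.stdev

# P(absences > 140) is the area to the right of the boundary
p_lose_funding = 1 - absences.cdf(boundary)

print(f"z = {z:.2f}, P(absences > 140) = {p_lose_funding:.4f}")
```

The probability comes out to about 0.028. Notice that the code names the distribution, its parameters, and the boundary value, which are the same three things a reader needs to see on paper.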

If you make a Wordle of all of the year-long conversation in an AP Statistics class, the word “normal” will certainly be one of the font-size winners. Think of all of the places the word “normal” enters the conversation –

We find the probability of events given a normal distribution.

We combine random variables, which may have normal distributions.

We discuss a normal approximation for a binomial setting.

The Central Limit Theorem allows us to assume a sampling distribution of sample means will be approximately normal if the sample size is sufficiently large.

The sampling distribution of sample proportions will be approximately normal if the expected number of successes and failures is “large”.

We assess samples for signs of normality in their parent populations.

It’s this last bullet which is often the trickiest for students, yet the most critical when it comes to the structure of hypothesis testing. Exactly what are “signs” of normality? How can I tell if they have been met? And what is “it” that is approximately normal anyway? These are questions which come up early in Stats as we begin to look at the distribution of samples.

Here’s a diagram which makes an appearance often in my class, and provides the framework for my lesson on assessing normality:

Much of what we do in statistics deals with taking a representative sample from a large population, making a conjecture about the population, then using mathematical evidence to reach a conclusion. In my class, this is our first experience with making decisions about a population based on sample evidence, and I need the language and ideas to be tight from the start. To start, I hand out a sheet with 8 different boxplots on it, and ask students to assess them. Specifically:

Based on the sample, do you feel there is evidence that the population from which it came could be approximately normal?

Groups then discuss each of the 8 graphs, and a quick show of hands is used to vote “yes” (pro-population-normality) or “no” for each of the graphs. Up to now, students have had exposure to center, shape and spread ideas, the relationship between mean and median in a symmetric distribution, and the 68-95 rule. Conversation often centers on perceived skewness and outliers, and observations surrounding the centering of the median in the “box” part of the boxplot.

Now it’s time for the big reveal…..not only do all 8 of the boxplots come from populations which are approximately normal, they all are samples from the SAME population. It’s a mean trick, no doubt, but I now show students the Fathom document used to create the samples, and have the file cycle through 200 different samples. This is often eye-opening to students, as they begin to see the wide variation in samples from the same population, and hopefully causes them to cast a bigger net when looking to “assume” normality in populations. The video below explains the procedure:
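If you don't have Fathom, the same idea can be roughed out in Python. This is not the Fathom document, just a sketch under my own assumptions: a normal population with mean 100 and standard deviation 15, samples of size 20, and an arbitrary cutoff for calling a median “off-center” in its box. Only the count of 200 samples comes from the lesson.

```python
import random
import statistics

random.seed(7)  # fixed seed so the sketch is repeatable

POP_MEAN, POP_SD = 100, 15  # assumed normal population
SAMPLE_SIZE = 20            # assumed sample size
NUM_SAMPLES = 200           # matches the 200 samples cycled in class

def median_position(sample):
    """Where the median sits inside the box: 0 = at Q1, 0.5 = centered, 1 = at Q3."""
    q1, med, q3 = statistics.quantiles(sample, n=4)
    return (med - q1) / (q3 - q1)

positions = []  # median position for each sample's boxplot
gaps = []       # mean minus median for each sample

for _ in range(NUM_SAMPLES):
    sample = [random.gauss(POP_MEAN, POP_SD) for _ in range(SAMPLE_SIZE)]
    positions.append(median_position(sample))
    gaps.append(statistics.mean(sample) - statistics.median(sample))

# Arbitrary cutoff: call the median "off-center" outside the middle 30% of the box
off_center = sum(1 for p in positions if p < 0.35 or p > 0.65)

print(f"median noticeably off-center in {off_center} of {NUM_SAMPLES} boxplots")
print(f"mean minus median ranges from {min(gaps):.1f} to {max(gaps):.1f}")
```

Every one of these samples comes from the same normal population, yet many of the boxplots would look lopsided, which is exactly the trap the 8 boxplots set for the class.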

In the second half of this activity, I share 6 data sets with the class, which I have pulled from various sources. The data is linked from my class TI84 or Nspire software and sent to students. The task at hand is to assess each data set, and conjecture if the parent population can be assumed to have an approximately normal distribution. This Excel file contains the data sets, which you can format for your use.

In this activity, the goal is to determine if a given sample comes from a population that is approximately normal. By now, students have a decent grasp for what to look for:

Mean “close” to the median

Symmetry, perhaps a few outliers

Rough adherence to the 68-95 rule (this is tough to actually check, but if it is checkable, we should give it a good attempt)
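The three checks so far can be bundled into a small Python helper. This is only a sketch: the function name, the returned labels, and both example data sets are hypothetical, and the output is meant to inform a judgment call rather than deliver a verdict.

```python
import statistics

def normality_checklist(sample):
    """Summaries that speak to the checklist: center, symmetry, and the 68-95 rule."""
    n = len(sample)
    mean = statistics.mean(sample)
    median = statistics.median(sample)
    sd = statistics.stdev(sample)
    # Share of the data within 1 and 2 standard deviations of the mean
    within_1 = sum(abs(x - mean) <= sd for x in sample) / n
    within_2 = sum(abs(x - mean) <= 2 * sd for x in sample) / n
    return {
        "mean": mean,
        "median": median,
        "mean_minus_median_in_sd": (mean - median) / sd,  # near 0 suggests symmetry
        "within_1_sd": within_1,                          # compare to roughly 0.68
        "within_2_sd": within_2,                          # compare to roughly 0.95
    }

# Hypothetical examples: a symmetric set and a right-skewed "prices" set
symmetric = [1, 2, 3, 4, 5]
skewed_prices = [90, 95, 100, 105, 110, 120, 135, 160, 220, 400]

for label, data in (("symmetric", symmetric), ("skewed", skewed_prices)):
    print(label, normality_checklist(data))
```

With small samples the 68-95 percentages bounce around quite a bit, which is why that check is listed as a rough one.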

For now, I leave number 4 on the list blank. It will be discussed later. In addition to making a decision pro/con normality, I ask groups to conjecture about the source of each data set. The titles of the columns do provide some context clues to the sources of the sets:

PRICE – price of 117 homes sold in Albuquerque, NM in 1993

TEMP – high temperatures in Las Vegas in July, August 2007

MYST – the mystery list. 100 random integers from 50-100 (from RandInt on a TI-84)

WT – weights of adult males ages 22-30, from a clinical study

AGE – age of CEOs from a Forbes list of Top Companies

BRAIN – IQ scores for 40 research subjects

As groups share their findings on the board, some important themes emerge:

Context matters! If we consider the source of a data set, this may provide important information about its population distribution. Often, measurements from things in nature (heights, weights, lengths, IQs) have an approximately normal distribution. Data involving salaries and prices, meanwhile, are often skewed.

Multiple representations are helpful. Above, the data set “BRAIN” (IQ scores) has a nice, symmetric distribution if you look at its boxplot. But a dotplot reveals an important feature not evident in the boxplot – the data consists of 2 distinct groupings, with a large gap in the center.

It’s not the sample which we are trying to prove normal, it’s the underlying population. Later, during hypothesis testing, it is common to find students who claim “the sample is normal” based on a boxplot (or those who simply claim, “it’s normal”). We need to help students move away from meaningless statements like this, and towards a communicated linkage between the sample and its parent population.
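The boxplot-versus-dotplot theme is easy to demonstrate with a tiny made-up data set. The scores below are hypothetical (they are not the class data); they were chosen so that the five-number summary looks perfectly symmetric while the values split into two clusters.

```python
import statistics

# Hypothetical scores: two tight clusters with a gap between them
scores = [80, 82, 84, 86, 88, 112, 114, 116, 118, 120]

# What the boxplot "sees": quartiles and median
q1, med, q3 = statistics.quantiles(scores, n=4)
lower_half_width = med - q1  # left part of the box
upper_half_width = q3 - med  # right part of the box

# What the dotplot "sees": spacing between neighboring values
ordered = sorted(scores)
gaps = [b - a for a, b in zip(ordered, ordered[1:])]
largest_gap = max(gaps)
gap_start = ordered[gaps.index(largest_gap)]

print(f"box halves: {lower_half_width} and {upper_half_width}")
print(f"largest gap: {largest_gap} points, starting at {gap_start}")
```

The two halves of the box come out equal, so the boxplot shows no warning signs, while the gap in the middle of the data is obvious the moment the individual values are plotted.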

As the lesson progresses, the class begins to see that assessing normality is tricky business. We’ll be making a lot of assumptions about the behavior of populations in stats class through the year. Later, the robustness of procedures will provide a safety net if a population isn’t quite normal.

And maybe the most important idea: it’s not so important that we clearly identify and justify populations which are normal; it’s more important that we identify populations which are clearly NOT normal.

WHAT ABOUT NORMAL PROBABILITY PLOTS?

After all 6 data sets have been evaluated and discussed, I explain the idea and structure of a normal probability plot, which becomes #4 in our list of “what to look for”. The Nspire does a nice job making them, with the z-score axis clearly labeled.

I have found that the more years I teach AP Stats, the less I stress this graph. It’s easily forgotten under the avalanche of information in the course, and the procedures described above are sufficient for the job. Unless you spend time developing the structure of this graph – why transforming percentiles to z-scores in a normal distribution yields a linear function – it becomes another disconnected idea to memorize. I show it – but then we cast it aside.
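For those who do want to develop the structure, here is a minimal Python sketch of how the plot is built and why normal data comes out linear. The percentile convention (i - 0.5)/n is one common choice and may not match what a particular calculator uses; the sample size of 40 and both data sets are hypothetical.

```python
import math
from statistics import NormalDist, mean

def normal_plot_points(sample):
    """Pair each sorted value with the z-score of its percentile."""
    ordered = sorted(sample)
    n = len(ordered)
    std_normal = NormalDist()
    # (i - 0.5) / n is one common percentile convention; others exist
    z_scores = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(z_scores, ordered))

def pearson_r(points):
    """Correlation of the plotted points: values near 1 mean nearly linear."""
    xs, ys = zip(*points)
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

N = 40  # hypothetical sample size
quantile_z = [NormalDist().inv_cdf((i - 0.5) / N) for i in range(1, N + 1)]

# Idealized normal data: value = mean + sd * z, so value vs. z is a straight line
ideal_normal = [100 + 15 * z for z in quantile_z]
# Idealized right-skewed data: the same z-scores pushed through exp()
ideal_skewed = [math.exp(z) for z in quantile_z]

r_normal = pearson_r(normal_plot_points(ideal_normal))
r_skewed = pearson_r(normal_plot_points(ideal_skewed))

print(f"normal data: r = {r_normal:.4f}; skewed data: r = {r_skewed:.4f}")
```

For normal data each value is just the mean plus the standard deviation times its z-score, so plotting value against z gives a line with slope equal to the standard deviation. Skewed data bends away from that line, and the curve in the plot is what to look for.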