A map of Argentina….a class of impressionable 9th graders…and a devious teacher….what could go wrong? I’m currently reviewing past statistical ideas with my 9th graders, with an eye towards standard deviation, the normal curve and sampling. To generate some data, I asked each of my classes 2 questions about the country of Argentina:

MORNING CLASS:

Do you believe the population of Argentina is MORE or LESS than 10 million?

Estimate the population of Argentina.

AFTERNOON CLASS:

Do you believe the population of Argentina is MORE or LESS than 50 million?

Estimate the population of Argentina.

Both classes gave me strange looks. But with instructions to answer as best they could, the students played along and provided data. Did you note the subtle difference between the two question sets? The two classes provided strikingly different estimates, due to the anchoring from the first question.

The inspiration for this activity comes from the book A Mathematician Reads the Newspaper by John Allen Paulos, which contains many other quick nuggets to use in your classroom. And now we have a rich conversation regarding the wording of poll questions to enjoy in the next few days!

Today’s opener was chosen for totally selfish reasons, which I will explain below. We’re starting our statistics unit today, so we’ll be doing a lot of review of means, medians, quartiles, and graphical displays – all with an eye towards interpretation. But only after I have students respond to a prompt for me:

What is a least common denominator (LCD)? Provide directions for finding an LCD to someone who may not know how to find one.

This Friday, I will be out of school for the Association of Math Teachers of New Jersey conference, where I am looking forward to participating in an Ignite session, hosted by my friends from the Drexel Math Forum. In these talks, a speaker has 5 minutes and 20 slides to share their idea – mine is on the importance of language skills in math classrooms.

So, today’s opener was entirely selfish, as I was looking for examples to share during the Ignite. The LCD problem is one I have given before during rational expressions units. Try it with your classes and watch the misconceptions fly! Responses to this prompt can often be pigeonholed:

THE “EXPLANATION BY EXAMPLE” CROWD

THE “NOT QUITE COMPLETE” CROWD

THE “MINIMALISTS”

To be fair, I gave this prompt out of context, as fractions aren’t on our radar now. But it’s fascinating to see what built-in ideas students come to the high school with regarding a task they have now done for many years.

It’s a big day for my freshmen, as today is their second unit test – this one on sequences, series and the binomial theorem. For the last few days, we have been knee-deep in the world of arithmetic and geometric sequences, so much so that you’d think they are the only types of numerical patterns. But in today’s opener I want to expose them to something deeper, without freaking them out before the test -

My fear is that if I dive too deep into these problems now, I’ll worry students who are studying for the test. So instead I provide about 60 seconds to discuss what we see, and will let the problem marinate. If students finish the test early, they’ll have something to think about and tackle. Hopefully we will have a chance to come back to these problems next week, but sometimes I feel bad when I start a new problem and don’t find the time to re-visit it. For now, I think I’ll leave the problems on the board, and see if anyone volunteers information.

If you are playing along at home, here is information about the two problems above. The first is the harmonic series, which diverges (approaches infinity), while the second (the sum of the reciprocals of the triangular numbers) surprisingly approaches 2.
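If you’d like to watch the two behaviors side by side, here is a minimal Python sketch. It assumes the two board problems are 1 + 1/2 + 1/3 + 1/4 + ... and 1 + 1/3 + 1/6 + 1/10 + ...; the function names are my own, chosen only for illustration.

```python
from fractions import Fraction


def harmonic_partial_sum(n):
    """Sum of 1/k for k = 1..n. Keeps growing, roughly like ln(n)."""
    return sum(1.0 / k for k in range(1, n + 1))


def triangular_reciprocal_partial_sum(n):
    """Exact sum of 1/T_k for k = 1..n, where T_k = k(k+1)/2."""
    return sum(Fraction(2, k * (k + 1)) for k in range(1, n + 1))


for n in (10, 100, 1000, 10000):
    h = harmonic_partial_sum(n)
    t = float(triangular_reciprocal_partial_sum(n))
    print(f"n = {n:>5}: harmonic = {h:.4f}, triangular reciprocals = {t:.6f}")
```

The second column creeps towards 2 because each term 2/(k(k+1)) can be rewritten as 2/k - 2/(k+1), so the partial sums telescope to 2n/(n+1). The first column never settles, though it grows painfully slowly.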

Upon reflection, there were some natural places to flip instruction in this chapter. In the video below, students took notes on the sum of an arithmetic series. As a number of students in my last period class leave for sports, this was an effective way to keep everyone on the same page. Enjoy!

Usually my openers here on the blog are those I share with my freshman classes, but today’s post features my AP Statistics class. They are preparing for their test on normal distributions, and it’s no time to be spooked! Today’s class started with the famous stats cartoon shown here, and an entrance ticket – one part of a past AP problem dealing with normal distributions.

Schools in a certain state receive funding based on the number of students who attend the school. To determine the number of students who attend a school, one school day is selected at random and the number of students in attendance that day is counted and used for funding purposes. The daily number of absences at High School A in the state is approximately normally distributed with mean of 120 students and standard deviation of 10.5 students. (a) If more than 140 students are absent on the day the attendance count is taken for funding purposes, the school will lose some of its state funding in the subsequent year. Approximately what is the probability that High School A will lose some state funding?

The full exam (and all free-response questions) is available in the AP Statistics area of the College Board website. The College Board owns the copyright on all AP problems.

Despite the length of text in this problem, part (a) here is a simple normal distribution probability, one which any AP student should be able to tackle easily.
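For anyone who wants to check the arithmetic, here is a short sketch using Python’s standard library in place of a calculator’s normalcdf command. The variable names are my own; the mean, standard deviation and boundary come straight from the problem.

```python
from statistics import NormalDist

# Daily absences at High School A: approximately Normal(mean 120, sd 10.5)
absences = NormalDist(mu=120, sigma=10.5)

# Standardize the boundary value of 140 absences
z = (140 - 120) / 10.5

# The school loses funding if MORE than 140 students are absent
p_lose_funding = 1 - absences.cdf(140)

print(f"z = {z:.2f}, P(X > 140) = {p_lose_funding:.4f}")
```

The z-score is about 1.90 and the probability comes out to roughly 0.028, so the school has a little less than a 3% chance of losing funding.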

I gave students 4 minutes to provide a solution on the printed sheet, but did not ask them to identify themselves on the paper. After collecting the sheets, I mixed them up and prepared to share them under my document camera. This particular problem is one I graded last summer at the AP Stats reading in Kansas City, and if you know what you are looking for, it is a quick grade (by my super-unofficial count, I probably graded this question about 1500 times). All papers received a score of E (essentially correct), P (partially correct) or I (incorrect) based on the College Board rubric. Even though this would qualify as an “easy-ish” problem in AP Stats, it’s still the student’s responsibility to justify and communicate. For this problem, there are 3 features we AP readers looked for:

A correct answer

Indication of a normal distribution used, along with identification of the mean and standard deviation

Indication of a boundary value of 140

The last 2 bullets could be met in a number of ways – by diagram, by symbols. It’s a good lesson for students that even basic stats problems require justification.

If you make a Wordle of all of the year-long conversation in an AP Statistics class, the word “normal” will certainly be one of the font-size winners. Think of all of the places the word “normal” enters the conversation:

We find the probability of events given a normal distribution.

We combine random variables, which may have normal distributions.

We discuss a normal approximation for a binomial setting.

The Central Limit Theorem allows us to assume a sampling distribution of sample means will be approximately normal if the sample size is sufficiently large.

The sampling distribution of sample proportions will be approximately normal if the expected numbers of successes and failures are both “large.”

We assess samples for signs of normality in their parent populations.

It’s this last bullet which is often the trickiest for students, yet the most critical when it comes to the structure of hypothesis testing. Exactly what are “signs” of normality? How can I tell if they have been met? And what is “it” that is approximately normal anyway? These are questions which come up early in Stats as we begin to look at the distribution of samples.

Here’s a diagram which makes an appearance often in my class, and provides the framework for my lesson on assessing normality:

Much of what we do in statistics deals with taking a representative sample from a large population, making a conjecture about the population, then using mathematical evidence to reach a conclusion. In my class, this is our first experience with making decisions about a population based on sample evidence, and I need the language and ideas to be tight from the start. To start, I hand out a sheet with 8 different boxplots on it, and ask students to assess them. Specifically:

Based on the sample, do you feel there is evidence that the population from which it came could be approximately normal?

Groups then discuss each of the 8 graphs, and a quick show of hands is used to vote “yes” (pro-population-normality) or “no” for each of the graphs. Up to now, students have had exposure to center, shape and spread ideas, the relationship between mean and median in a symmetric distribution, and the 68-95 rule. Conversation often centers on perceived skewness and outliers, and observations surrounding the centering of the median in the “box” part of the boxplot.

Now it’s time for the big reveal…..not only do all 8 of the boxplots come from populations which are approximately normal, they are all samples from the SAME population. It’s a mean trick, no doubt, but I now show students the Fathom document used to create the samples, and have the file cycle through 200 different samples. This is often eye-opening to students, as they begin to see the wide variation in samples from the same population, and hopefully causes them to cast a bigger net when looking to “assume” normality in populations. The video below explains the procedure:
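If you don’t have Fathom handy, the same idea can be roughed out in a few lines of Python. This is only a sketch of the sampling idea, not a copy of my Fathom file: the population mean of 100, standard deviation of 15, sample size of 20 and the seed are all assumptions made up for the illustration.

```python
import random
import statistics


def sample_summaries(num_samples=8, sample_size=20, mu=100, sigma=15, seed=1):
    """Draw several samples from ONE normal population and return the
    five-number summary (plus the mean) of each sample."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    summaries = []
    for _ in range(num_samples):
        sample = [rng.gauss(mu, sigma) for _ in range(sample_size)]
        q1, median, q3 = statistics.quantiles(sample, n=4)
        summaries.append({
            "min": min(sample),
            "q1": q1,
            "median": median,
            "q3": q3,
            "max": max(sample),
            "mean": statistics.mean(sample),
        })
    return summaries


for summary in sample_summaries():
    print("  ".join(f"{name}={value:6.1f}" for name, value in summary.items()))
```

Each printed row is the skeleton of one boxplot. Compare where the median sits between q1 and q3 from row to row, and how far the min and max stray, and remember that every row came from the same perfectly normal population.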

In the second half of this activity, I share 6 data sets with the class, which I have pulled from various sources. The data is linked from my class TI84 or Nspire software and sent to students. The task at hand is to assess each data set, and conjecture if the parent population can be assumed to have an approximately normal distribution. This Excel file contains the data sets, which you can format for your use.

In this activity, the goal is to determine if a given sample comes from a population that is approximately normal. By now, students have a decent grasp for what to look for:

Mean “close” to the median

Symmetry, perhaps a few outliers

Rough adherence to the 68-95 rule (this is tough to actually check, but if it is checkable, we should give it a good attempt)
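When the data sit in a list rather than on a calculator, the first and third checks can be sketched in Python as below. The function name and the returned labels are my own, and the demo list is made up (evenly spread integers, not one of the class data sets).

```python
import statistics


def normality_checks(data):
    """Compare mean with median and count how much of the sample falls
    within 1 and 2 standard deviations of the mean."""
    mean = statistics.mean(data)
    median = statistics.median(data)
    sd = statistics.stdev(data)  # sample standard deviation
    n = len(data)
    within_1 = sum(abs(x - mean) <= sd for x in data) / n
    within_2 = sum(abs(x - mean) <= 2 * sd for x in data) / n
    return {
        "mean": mean,
        "median": median,
        "mean_minus_median_in_sd": (mean - median) / sd,
        "within_1_sd": within_1,  # a normal population suggests about 0.68
        "within_2_sd": within_2,  # a normal population suggests about 0.95
    }


# Made-up demo: the integers 50 through 100, one of each
flat_data = list(range(50, 101))
for label, value in normality_checks(flat_data).items():
    print(f"{label}: {value:.3f}")
```

In the demo the mean and median agree perfectly and the list is symmetric, yet only about 57% of the values land within one standard deviation and 100% land within two. Passing one check is not enough, which is why we look at all of them together (and at a graph).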

For now, I leave number 4 on the list blank. It will be discussed later. In addition to making a decision pro/con normality, I ask groups to conjecture about the source of each data set. The titles of the columns do provide some context clues to the sources of the sets:

PRICE – price of 117 homes sold in Albuquerque, NM in 1993

TEMP – high temperatures in Las Vegas in July, August 2007

MYST – the mystery list. 100 random integers from 50-100 (from RandInt on a TI-84)

WT – weights of adult males ages 22-30, from a clinical study

AGE – age of CEOs from a Forbes list of Top Companies

BRAIN – IQ scores for 40 research subjects

As groups share their findings on the board, some important themes emerge:

Context matters! If we consider the source of a data set, this may provide important information about its population distribution. Often, measurements from things in nature (heights, weights, lengths, IQs) have an approximately normal distribution. Data involving salaries and prices, meanwhile, are often skewed.

Multiple representations are helpful. Above, the data set “IQ” has a nice, symmetric distribution if you look at its boxplot. But a dotplot reveals an important feature not evident in the boxplot – the data consists of 2 distinct groupings, with a large gap in the center.

It’s not the sample which we are trying to prove normal, it’s the underlying population. Later, during hypothesis testing, it is common to find students who claim “the sample is normal” based on a boxplot (or those who simply claim, “it’s normal”). We need to help students move away from meaningless statements like this, and towards a communicated linkage between the sample and its parent population.

As the lesson progresses, the class begins to see that assessing normality is tricky business. We’ll be making a lot of assumptions about the behavior of populations in stats class through the year. Later, the robustness of procedures will provide a safety net if a population isn’t quite normal.

And maybe the most important idea: it’s not so important that we clearly identify and justify populations which are normal; it’s more important that we identify populations which are clearly NOT normal.

WHAT ABOUT NORMAL PROBABILITY PLOTS?

After all 6 data sets have been evaluated and discussed, I explain the idea and structure of a normal probability plot, which becomes #4 in our list of “what to look for”. The Nspire does a nice job making them, with the z-score axis clearly labeled.

I have found that the more years I teach AP Stats, the less I stress this graph. It’s easily forgotten under the avalanche of information in the course, and the procedures described above are sufficient for the job. Unless you spend time developing the structure of this graph – why transforming percentiles to z-scores in a normal distribution yields a linear function – it becomes another disconnected idea to memorize. I show it – but then we cast it aside.
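For those who do want to peek under the hood, here is a minimal sketch of how the points on such a plot can be built. The function name is my own, and the percentile rule (i - 0.5)/n is just one common convention; calculators and software may use a slightly different one.

```python
from statistics import NormalDist


def normal_probability_points(data):
    """Pair each sorted data value with the z-score a standard normal
    distribution would place at that value's percentile."""
    ordered = sorted(data)
    n = len(ordered)
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    points = []
    for i, value in enumerate(ordered, start=1):
        percentile = (i - 0.5) / n  # assumed plotting-position convention
        points.append((value, standard_normal.inv_cdf(percentile)))
    return points


# Tiny made-up list, just to show the shape of the output
for value, z in normal_probability_points([12, 15, 9, 14, 11]):
    print(f"value = {value:>3}, expected z = {z:+.3f}")
```

If the population is normal with mean m and standard deviation s, a value at a given percentile should sit near m + s·z, where z is the matching standard normal z-score. That is a linear relationship between value and z, so the plotted points should hug a straight line.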

Aren’t infinite geometric series cool? If you just shouted “yes”, then you are potentially as geeky as I am. A “proof without words” from MathFail kicked off today’s discussion:

I wasn’t quite sure what sort of observations I would receive from my class. But just enough ideas were generated to get us going:

There are an infinite number of triangles down the right side.

All those triangles on the right add up to the half-triangle on the left.

Both are great starts for what I hope my students will learn today. A video I made in my driveway continued the ideas of geometric series and their infinite terms.

A few students wanted to argue that the sequence in the video was arithmetic, but some meaningful debate yielded agreement that geometric made more sense. Groups then worked through a similar problem involving a Superball being dropped, leading to terms and total distance traveled.

Many groups employed a “brute force” method to find their answers. Using the Desmos calculator (many students chose to use the iPhone app), we found value in developing the equation and using tables and summation symbols to find solutions. This was my first time using Desmos with this particular lesson, and it was an awesome addition, which added value to the need for writing a clear function to define your situation.
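The same “write a clear function” idea carries over to code. The sketch below is not the problem from my handout: the drop height of 10 and the rebound ratio of 0.5 are made-up numbers, and the function names are hypothetical.

```python
def bounce_heights(drop_height, ratio, bounces):
    """Peak heights after each of the first `bounces` bounces.
    These form a geometric sequence with common ratio `ratio`."""
    return [drop_height * ratio ** k for k in range(1, bounces + 1)]


def distance_after(drop_height, ratio, bounces):
    """Total distance traveled: the first drop, plus an up-and-down trip
    for each of the first `bounces` bounces."""
    return drop_height + 2 * sum(bounce_heights(drop_height, ratio, bounces))


def limiting_distance(drop_height, ratio):
    """Sum of the infinite series, valid when 0 <= ratio < 1."""
    return drop_height + 2 * drop_height * ratio / (1 - ratio)


print(bounce_heights(10, 0.5, 3))   # peak heights of the first 3 bounces
print(distance_after(10, 0.5, 3))   # brute-force total after 3 bounces
print(limiting_distance(10, 0.5))   # what the totals approach
```

The brute-force totals get closer and closer to the limiting value, which is the infinite geometric series idea in action: the ball bounces forever on paper, but the total distance is finite.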

Sometimes I am genuinely surprised when a fun game or idea I was exposed to in elementary school is unknown by my older students. The Towers of Hanoi problem fits the description. A number of students immediately gravitated towards the game as they entered. And after a quick explanation of the rules, they got down to business:

An online, interactive version of the game was also running on my projector, which generated some class discussion of strategy. Students were able to solve many of the early challenges, and we began to look for the most efficient methods, sharing our findings on the board.

1 disc, 1 move

2 discs, 3 moves

3 discs, 7 moves

4 discs, 15 moves

From here, some students conjectured that 5 disks must take 31 moves….but nothing is quite that easy, and I asked them to prove or show it. During classwork time, I heard discussion of possible formulas. Since we have been working with both explicit and recursive formulas, this was an effective way to discuss the differences between them, and why we might prefer one over the other.

The explicit formula sure is nice, but the recursive provides a roadmap for solving the problem for any number of disks.
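Here is a short sketch of both viewpoints in Python. I’m assuming the standard formulas for this puzzle (each count is double the previous count plus 1, and the explicit count is 2^n - 1); the function names and peg labels A, B, C are my own.

```python
def hanoi_moves(n, source="A", spare="B", target="C"):
    """List the (from_peg, to_peg) moves that shift n disks from source to target."""
    if n == 0:
        return []
    return (
        hanoi_moves(n - 1, source, target, spare)    # park the top n-1 disks on the spare peg
        + [(source, target)]                         # move the largest disk
        + hanoi_moves(n - 1, spare, source, target)  # restack the n-1 disks on top of it
    )


def moves_recursive(n):
    """Recursive count: solve n-1 disks, move one disk, solve n-1 disks again."""
    return 0 if n == 0 else 2 * moves_recursive(n - 1) + 1


def moves_explicit(n):
    """Explicit count: 2^n - 1."""
    return 2 ** n - 1


for n in range(1, 6):
    print(n, len(hanoi_moves(n)), moves_recursive(n), moves_explicit(n))
```

The explicit function answers “how many?” in one step, while hanoi_moves mirrors the recursive thinking and actually hands you the list of moves to make.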