Engaging students: Approximating data by a straight line

In my capstone class for future secondary math teachers, I ask my students to come up with ideas for engaging their students with different topics in the secondary mathematics curriculum. In other words, the point of the assignment is not to devise a full-blown lesson plan on a topic. Instead, I ask my students to think about three different ways of getting their students interested in the topic in the first place.

I plan to share some of the best of these ideas on this blog (after asking my students’ permission, of course).

This student submission again comes from my former student Caroline Wick. Her topic, from Algebra: approximating data to a straight line.

B1: Curriculum

How can this topic be used in your students’ future courses in mathematics or science?

Though approximating data by a straight line is a subject that is brought up in Algebra 2, it is something that students will need to use in a number of subjects down the line. The most obvious subject is statistics: finding an approximate trend line is extremely important for a statistician so that they can predict future, unobserved data. Another example that might not be as readily noticeable is anthropology, the study of humans in various parts of life. In this case, according to Brian Hopkins, anthropology can be used by stores to figure out what types of products they should stock on their shelves during different times of the year. They do this by collecting the data, then approximating the trend lines to predict how a product will sell during the same season of the next year. For example, orange juice and tissues are known to sell more often during the winter season, so stores know that they want to stock up on both during the colder months each year.

A1: Applications

What interesting (i.e., uncontrived) word problems using this topic can your students do now?
Using the data given below:
(a) Plot the points on a graph.
(b) Using a ruler, draw your best approximation of a trend line that fits the points.
(c) Write an equation (y = mx + b) that best fits the trend line.
(d) Approximate the next four values on the line using the equation you created.

Population growth of squirrels in TX from 1950-1980 (in millions)*

Year (x):  1950  1955  1960  1965  1970  1975  1980
Pop. (y):  12.0  12.7  13.1  13.0  13.6  13.7  14.0

From here the student would create his/her graph with the plotted points, then draw a line that best fits them, with roughly equal numbers of points above and below the line. They would then use the data and the line to find an equation that fits the scatter plot. Finally, they would use that equation to approximate the squirrel population for 1985, 1990, 1995, and 2000.
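For the teacher's benefit, here is a short least-squares check of the hand-drawn answers. The code is my own Python sketch, using the fabricated data from the table above:

```python
# Fit y = mx + b by least squares and extrapolate, as in parts (c) and (d).
years = [1950, 1955, 1960, 1965, 1970, 1975, 1980]
pops = [12.0, 12.7, 13.1, 13.0, 13.6, 13.7, 14.0]

n = len(years)
xbar, ybar = sum(years) / n, sum(pops) / n

# slope m = S_xy / S_xx, intercept b = ybar - m * xbar
m = sum((x - xbar) * (y - ybar) for x, y in zip(years, pops)) \
    / sum((x - xbar) ** 2 for x in years)
b = ybar - m * xbar
print(f"trend line: y = {m:.4f}x + {b:.2f}")

for year in (1985, 1990, 1995, 2000):
    print(year, round(m * year + b, 2))   # extrapolated populations (millions)
```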

This could be either an assignment or it could turn into a project for students with different sets of data. Students could even collect their own data to formulate the graph and equation.

*not real data, fabricated for this problem specifically.


C1: Culture
How has this topic appeared in pop culture (movies, TV, current music, video games, etc.)?

The approximation of data through trend lines has been used in pop culture since the birth of popular culture in the mid-twentieth century. More relevantly, it is used to map cultural trends themselves. When a new movie is coming out, statisticians use data from people who watched or reviewed the movie before its release to project how it will be received by the public. A movie that did well before its release will likely have a positive trend line that continues upward at a somewhat steady rate, and it will sell more tickets at the box office than a less-liked movie with a shallower slope. Statisticians use the same trend approximation with TV shows, to decide whether to run another season, and in music, to track whether a song will hit the top of the charts. The more people listen to a song, the more likely it is to be heard by other people, so the trend continues upward until it slowly dies off.

Take, for instance, Taylor Swift’s “Look What You Made Me Do,” released August 25, 2017. From its release and early popularity, statisticians were able to track the data and predict that the song would be number 1 on the Billboard Hot 100 just a few weeks after its release.

References:

B1: https://www.cio.com/article/2372429/enterprise-architecture/the-anthropology-of-data.html
C1: http://www.billboard.com/articles/news/7949029/taylor-swift-look-what-you-made-me-do-timeline-reputation

 

Paranormal distribution

Source: https://www.tumblr.com/search/paranormal%20distribution

Finding the Regression Line without Calculus

Last month, my latest professional article, Deriving the Regression Line with Algebra, was published in the April 2017 issue of Mathematics Teacher (Vol. 110, Issue 8, pages 594-598). Although linear regression is commonly taught in high school algebra, the usual derivation of the regression line requires multidimensional calculus. Accordingly, algebra students are typically taught the keystrokes for finding the line of best fit on a graphing calculator with little conceptual understanding of how the line can be found.

In my article, I present an alternative way that talented Algebra II students (or, in principle, Algebra I students) can derive the line of best fit for themselves using only techniques that they already know (in particular, without calculus).

For copyright reasons, I’m not allowed to provide the full text of my article here, though subscribers to Mathematics Teacher should be able to read the article by clicking the above link. (I imagine that my article can also be obtained via inter-library loan from a local library.) That said, I am allowed to share a macro-enabled Microsoft Excel spreadsheet that I wrote that allows students to experimentally discover the line of best fit:

http://www.math.unt.edu/~johnq/ExploringTheLineofBestFit.xlsm

I created this spreadsheet so that students can explore (which is, after all, the first E of the 5-E model) the properties of the line of best fit. In this spreadsheet, students can enter a data set with up to 10 points and then experiment with different slopes and y-intercepts. As they experiment, the spreadsheet keeps track of the current sum of the squares of the residuals as well as the best guess attempted so far. After some experimentation, the spreadsheet can also reveal the correct answer, so that students can see how close their guesses came.
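For readers without Excel, here is a rough Python analogue of the activity. It is my own sketch, not the spreadsheet itself, and the data points and guesses below are made up for illustration:

```python
# Mimic the spreadsheet: try different slopes and y-intercepts, and track
# the sum of the squares of the residuals for the best guess so far.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]                     # up to 10 points
ys = [2.3, 3.8, 6.1, 8.2, 9.7, 12.4, 13.9, 16.2, 17.8, 20.1]

def sse(m, b):
    """Sum of the squares of the residuals for the line y = m x + b."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))

best_guess, best_sse = None, float("inf")
for m, b in [(1.8, 1.0), (2.0, 0.5), (2.0, 0.1)]:        # a student's guesses
    current = sse(m, b)
    if current < best_sse:
        best_guess, best_sse = (m, b), current
    print(f"m = {m}, b = {b}: SSE = {current:.3f} (best so far: {best_sse:.3f})")
```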

My Favorite One-Liners: Part 95

In this series, I’m compiling some of the quips and one-liners that I’ll use with my students to hopefully make my lessons more memorable for them.

Today’s quip is one that I’ll use in a statistics class when we find an extraordinarily small P-value. For example:

There is a social theory that states that people tend to postpone their deaths until after some meaningful event… birthdays, anniversaries, the World Series.

In 1978, social scientists investigated obituaries that appeared in a Salt Lake City newspaper. Among the 747 obituaries examined, 60 of the deaths occurred in the three-month period preceding their birth month. However, if the day of death is independent of birthday, we would expect that 25% of these deaths would occur in this three-month period.

Does this study provide statistically significant evidence to support this theory? Use \alpha=0.01.

It turns out, using a one-tailed hypothesis test for proportions, that the test statistic is z = -10.71 and the P-value is about 4.5 \times 10^{-27}. After the computations, I'll then discuss what the numbers mean.
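For the record, those numbers are easy to reproduce with a short standard-library Python computation. The counts are taken from the problem above; erfc is used because the tail is far too small for a naive normal CDF:

```python
from math import sqrt, erfc

n, x, p0 = 747, 60, 0.25                 # obituaries, "early" deaths, null proportion
phat = x / n
z = (phat - p0) / sqrt(p0 * (1 - p0) / n)
p_value = 0.5 * erfc(-z / sqrt(2))       # left-tail P(Z <= z), accurate this far out
print(f"z = {z:.2f}, P-value = {p_value:.1e}")   # the values quoted above
```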

I’ll begin by asking, “Is the null hypothesis [that the proportion of deaths really is 25%] possible?” The correct answer is, “Yes, it’s possible.” Even extraordinarily small P-values do not prove that the null hypothesis is impossible. To emphasize the point, I’ll say:

After all, I found a woman who agreed to marry me. So extremely unlikely events are still possible.

Once the laughter dies down, I’ll ask the second question, “Is the null hypothesis plausible?” Of course, the answer is no, and so we reject the null hypothesis in favor of the alternative.

 

My Favorite One-Liners: Part 71

In this series, I’m compiling some of the quips and one-liners that I’ll use with my students to hopefully make my lessons more memorable for them.

Some of the algorithms that I teach are pretty lengthy. For example, consider the calculation of a 100(1-\alpha)\% confidence interval for a proportion:

\displaystyle \frac{\hat{p} + \displaystyle \frac{z_{\alpha/2}^2}{2n}}{\displaystyle 1 + \frac{z_{\alpha/2}^2}{n} } - z_{\alpha/2} \frac{\sqrt{\displaystyle \frac{ \hat{p} \hat{q}}{n} + \displaystyle \frac{z_{\alpha/2}^2}{4n^2}}}{\displaystyle 1 + \frac{z_{\alpha/2}^2}{n} } < p < \displaystyle \frac{\hat{p} + \displaystyle \frac{z_{\alpha/2}^2}{2n}}{\displaystyle 1 + \frac{z_{\alpha/2}^2}{n} } + z_{\alpha/2} \frac{\sqrt{\displaystyle \frac{ \hat{p} \hat{q}}{n} + \displaystyle \frac{z_{\alpha/2}^2}{4n^2}}}{\displaystyle 1 + \frac{z_{\alpha/2}^2}{n} }.

Wow.
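In code, the formula is far less fearsome. Here is a minimal Python sketch; the function name is my own choice, and this interval is the one often called the Wilson score interval:

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(successes, n, alpha=0.05):
    """Confidence interval for a proportion, via the formula above."""
    z = NormalDist().inv_cdf(1 - alpha / 2)            # z_{alpha/2}
    phat = successes / n
    qhat = 1 - phat
    center = (phat + z**2 / (2 * n)) / (1 + z**2 / n)
    margin = z * sqrt(phat * qhat / n + z**2 / (4 * n**2)) / (1 + z**2 / n)
    return center - margin, center + margin

print(proportion_ci(60, 747))   # e.g., a 95% interval
```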

Proficiency with this formula definitely requires practice, and so I’ll typically give a couple of practice problems so that my students can practice using this formula while in class. After the last example, when I think that my students have the hang of this very long calculation, I’ll give my one-liner to hopefully boost their confidence (no pun intended):

By now, you probably think that this calculation is dull, uninteresting, repetitive, and boring. If so, then I’ve done my job right.

My Favorite One-Liners: Part 65

In this series, I’m compiling some of the quips and one-liners that I’ll use with my students to hopefully make my lessons more memorable for them.

I’ll use today’s one-liner just before I begin some enormous, complicated, and tedious calculation that’s going to take more than a few minutes to complete. To give a specific example of such a calculation: consider the derivation of the Agresti confidence interval for proportions. According to the central limit theorem, if n is large enough, then

Z = \displaystyle \frac{ \hat{p} - p}{ \displaystyle \sqrt{ \frac{p(1-p) }{n} } }

is approximately normally distributed, where p is the true population proportion and \hat{p} is the sample proportion from a sample of size n. By unwrapping this equation and solving for p, we obtain the formula for the confidence interval for a proportion:

z \displaystyle \sqrt{\frac{p(1-p)}{n} } = \hat{p} - p

\displaystyle \frac{z^2 p(1-p)}{n} = \left( \hat{p} - p \right)^2

z^2p - z^2 p^2 = n \hat{p}^2 - 2 n \hat{p} p + n p^2

0 = p^2 (z^2 + n) - p (2n \hat{p} + z^2) + n \hat{p}^2

We now use the quadratic formula to solve for p:

p = \displaystyle \frac{2n \hat{p} + z^2 \pm \sqrt{ \left(2n\hat{p} + z^2 \right)^2 - 4n\hat{p}^2 (z^2+n)}}{2(z^2+n)}

p = \displaystyle \frac{2n \hat{p} + z^2 \pm \sqrt{4n^2 \hat{p}^2 + 4n \hat{p} z^2 + z^4 - 4n\hat{p}^2 z^2 - 4n^2 \hat{p}^2}}{2(z^2 + n)}

p = \displaystyle \frac{2n \hat{p} + z^2 \pm \sqrt{4n (\hat{p}-\hat{p}^2) z^2 + z^4}}{2(z^2 + n)}

p = \displaystyle \frac{2n \hat{p} + z^2 \pm \sqrt{4n \hat{p}(1-\hat{p}) z^2 + z^4}}{2(z^2 + n)}

p = \displaystyle \frac{2n \hat{p} + z^2 \pm \sqrt{4n \hat{p} \hat{q} z^2 + z^4}}{2(z^2 + n)}

p = \displaystyle \frac{2n \hat{p} + z^2 \pm z \sqrt{4n \hat{p} \hat{q} + z^2}}{2(z^2 + n)}

p = \displaystyle \frac{2n \hat{p} + z^2 \pm z \sqrt{4n^2 \displaystyle \frac{ \hat{p} \hat{q}}{n} + \displaystyle 4n^2 \frac{z^2}{4n^2}}}{2(z^2 + n)}

p = \displaystyle \frac{2n \hat{p} + z^2 \pm 2nz \sqrt{\displaystyle \frac{ \hat{p} \hat{q}}{n} + \displaystyle \frac{z^2}{4n^2}}}{2(z^2 + n)}

p = \displaystyle \frac{2n \hat{p} + 2n \displaystyle \frac{z^2}{2n} \pm 2nz \sqrt{\displaystyle \frac{ \hat{p} \hat{q}}{n} +\displaystyle \frac{z^2}{4n^2}}}{2n \displaystyle \left(1 + \frac{z^2}{n} \right)}

p = \displaystyle \frac{\hat{p} + \displaystyle \frac{z^2}{2n} \pm z \sqrt{\displaystyle \frac{ \hat{p} \hat{q}}{n} + \displaystyle \frac{z^2}{4n^2}}}{\displaystyle 1 + \frac{z^2}{n} }

From this we finally obtain the 100(1-\alpha)\% confidence interval

\displaystyle \frac{\hat{p} + \displaystyle \frac{z_{\alpha/2}^2}{2n}}{\displaystyle 1 + \frac{z_{\alpha/2}^2}{n} } - z_{\alpha/2} \frac{\sqrt{\displaystyle \frac{ \hat{p} \hat{q}}{n} + \displaystyle \frac{z_{\alpha/2}^2}{4n^2}}}{\displaystyle 1 + \frac{z_{\alpha/2}^2}{n} } < p < \displaystyle \frac{\hat{p} + \displaystyle \frac{z_{\alpha/2}^2}{2n}}{\displaystyle 1 + \frac{z_{\alpha/2}^2}{n} } + z_{\alpha/2} \frac{\sqrt{\displaystyle \frac{ \hat{p} \hat{q}}{n} + \displaystyle \frac{z_{\alpha/2}^2}{4n^2}}}{\displaystyle 1 + \frac{z_{\alpha/2}^2}{n} }.

Whew.
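As a sanity check on all of that algebra, here is a tiny Python sketch (with made-up numbers) verifying that the two roots produced by the quadratic formula really do satisfy the squared equation that started the derivation:

```python
from math import sqrt

n, phat, z = 100, 0.30, 1.96      # sample size, sample proportion, z_{alpha/2}

# coefficients of 0 = p^2 (z^2 + n) - p (2 n phat + z^2) + n phat^2
a = z**2 + n
b = -(2 * n * phat + z**2)
c = n * phat**2

for sign in (-1, 1):
    p = (-b + sign * sqrt(b**2 - 4 * a * c)) / (2 * a)
    # both sides of z^2 p (1 - p) / n = (phat - p)^2 should agree
    print(f"p = {p:.6f}: {z**2 * p * (1 - p) / n:.10f} vs {(phat - p)**2:.10f}")
```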

So, before I start such an incredibly long calculation, I’ll warn my students that this is going to take some time and we need to prepare… and I’ll start doing jumping jacks, shadow boxing, and other “exercise” in preparation for doing all of this writing.

My Favorite One-Liners: Part 52

In this series, I’m compiling some of the quips and one-liners that I’ll use with my students to hopefully make my lessons more memorable for them. Today’s story is a continuation of yesterday’s post.

When I teach regression, I typically use this example to illustrate the regression effect:

Suppose that the heights of fathers and their adult sons both have mean 69 inches and standard deviation 3 inches. Suppose also that the correlation between the heights of the fathers and sons is 0.5. Predict the height of a son whose father is 63 inches tall. Repeat if the father is 78 inches tall.

Using the formula for the regression line

y = \overline{y} + r \displaystyle \frac{s_y}{s_x} (x - \overline{x}),

we obtain the equation

y = 69 + 0.5(x-69) = 0.5x + 34.5,

so that the predicted height of the son is 66 inches if the father is 63 inches tall. However, the prediction would be 73.5 inches if the father is 78 inches tall. As expected, tall fathers tend to have tall sons, and short fathers tend to have short sons. Then, I'll tell my class:

However, to the psychological comfort of us short people, tall fathers tend to have sons who are not quite as tall, and short fathers tend to have sons who are not quite as short.

This was first observed by Francis Galton (see the Wikipedia article for more details), a particularly brilliant but aristocratic (read: snobbish) mathematician who had high hopes of breeding a race of super-tall people through the proper use of genetics, only to discover that the laws of statistics naturally prevented this from occurring. Defeated, he called this phenomenon “regression toward the mean,” and so we're stuck with calling the fitting of data to a straight line “regression” to this day.
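A quick numeric check of the two predictions in the example above (my own sketch):

```python
xbar = ybar = 69     # mean heights (inches) of fathers and sons
sx = sy = 3          # standard deviations (inches)
r = 0.5              # correlation between fathers' and sons' heights

def predict_son(x):
    """Regression-line prediction y = ybar + r (sy/sx) (x - xbar)."""
    return ybar + r * (sy / sx) * (x - xbar)

for father in (63, 78):
    print(father, predict_son(father))   # 66.0 and 73.5, as computed above
```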

My Favorite One-Liners: Part 51

In this series, I’m compiling some of the quips and one-liners that I’ll use with my students to hopefully make my lessons more memorable for them.

When I teach regression, I typically use this example to illustrate the regression effect:

Suppose that the heights of fathers and their adult sons both have mean 69 inches and standard deviation 3 inches. Suppose also that the correlation between the heights of the fathers and sons is 0.5. Predict the height of a son whose father is 63 inches tall. Repeat if the father is 78 inches tall.

Using the formula for the regression line

y = \overline{y} + r \displaystyle \frac{s_y}{s_x} (x - \overline{x}),

we obtain the equation

y = 69 + 0.5(x-69) = 0.5x + 34.5,

so that the predicted height of the son is 66 inches if the father is 63 inches tall. However, the prediction would be 73.5 inches if the father is 78 inches tall.

To make this more memorable for students, I’ll observe:

As expected, tall fathers tend to have tall sons, and short fathers tend to have short sons. For example, my uncle was 6’6″. His two sons, my cousins, were 6’4″ and 6’5″ and were high school basketball stars.

My father was 5’3″. I became a math nerd.

My Favorite One-Liners: Part 36

In this series, I’m compiling some of the quips and one-liners that I’ll use with my students to hopefully make my lessons more memorable for them.

Not everything in mathematics works out the way we’d prefer it to. For example, in statistics, a Type I error, whose probability is denoted by \alpha, is rejecting the null hypothesis even though the null hypothesis is true. Conversely, a Type II error, whose probability is denoted by \beta, is retaining the null hypothesis even though the null hypothesis is false.

Ideally, we’d like \alpha = 0 and \beta = 0, so there’s no chance of making a mistake. I’ll tell my students:

There are actually two places in the country where this can happen. One’s in California, and the other is in Florida. And that place is called Fantasyland.

My Favorite One-Liners: Part 33

In this series, I’m compiling some of the quips and one-liners that I’ll use with my students to hopefully make my lessons more memorable for them.

Perhaps one of the more difficult things that I try to instill in my students is numeracy: a sense of whether an answer to a calculation is plausible. As an initial step toward this goal, I'll try to teach my students some basic pointers about whether an answer is even possible.

For example, when calculating a standard deviation, students have to compute E(X) and E(X^2):

E(X) = \sum x p(x) \qquad \hbox{or} \qquad E(X) = \int_a^b x f(x) \, dx

E(X^2) = \sum x^2 p(x) \qquad \hbox{or} \qquad E(X^2) = \int_a^b x^2 f(x) \, dx

After these are computed — which could take some time — the variance is then calculated:

\hbox{Var}(X) = E(X^2) - [E(X)]^2.

Finally, the standard deviation is found by taking the square root of the variance.
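Here is a small worked example of the discrete case, with a toy distribution of my own choosing:

```python
from math import sqrt

xs = [0, 1, 2, 3]                 # values of X
ps = [0.1, 0.3, 0.4, 0.2]         # p(x); the probabilities must sum to 1

ex = sum(x * p for x, p in zip(xs, ps))        # E(X)   = 1.7
ex2 = sum(x**2 * p for x, p in zip(xs, ps))    # E(X^2) = 3.7
var = ex2 - ex**2                              # Var(X) = 0.81, never negative
print(ex, ex2, var, sqrt(var))                 # standard deviation = 0.9
```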

So, I'll ask my students: what do you do if you calculate the variance and it's negative, so that it's impossible to take the square root? After a minute of students hemming and hawing, I'll tell them emphatically what they should do:

It’s wrong… do it again.

The same principle applies when computing probabilities, which always have to be between 0 and 1. So, if ever a student computes a probability that’s either negative or else greater than 1, they can be assured that the answer is wrong and that there’s a mistake someplace in their computation that needs to be found.