Finding the Regression Line without Calculus

Last month, my latest professional article, Deriving the Regression Line with Algebra, was published in the April 2017 issue of Mathematics Teacher (Vol. 110, Issue 8, pages 594-598). Although linear regression is commonly taught in high school algebra, the usual derivation of the regression line requires multivariable calculus. Accordingly, algebra students are typically taught the keystrokes for finding the line of best fit on a graphing calculator with little conceptual understanding of how the line can be found.

In my article, I present an alternative way that talented Algebra II students (or, in principle, Algebra I students) can derive the line of best fit for themselves using only techniques that they already know (in particular, without calculus).

For copyright reasons, I’m not allowed to provide the full text of my article here, though subscribers to Mathematics Teacher should be able to read the article by clicking the above link. (I imagine that my article can also be obtained via inter-library loan from a local library.) That said, I am allowed to share a macro-enabled Microsoft Excel spreadsheet that I wrote that allows students to experimentally discover the line of best fit:

http://www.math.unt.edu/~johnq/ExploringTheLineofBestFit.xlsm

I created this spreadsheet so that students can explore (which is, after all, the first E of the 5-E model) the properties of the line of best fit. In this spreadsheet, students can enter a data set with up to 10 points and then experiment with different slopes and y-intercepts. As they experiment, the spreadsheet keeps track of the current sum of the squares of the residuals as well as the best guess attempted so far. After some experimentation, the spreadsheet can also provide the correct answer so that students can see how close they got to the right answer.
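For readers without Excel, the kind of exploration described above can be roughly sketched in Python. This is not the spreadsheet's macro code; the function names and the five data points are hypothetical and chosen only for illustration. The sketch scores a guessed slope and y-intercept by the sum of the squares of the residuals and compares that score with the one for the least-squares line.

```python
def sum_squared_residuals(points, slope, intercept):
    """Sum of squared vertical distances from the points to y = slope*x + intercept."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in points)


def least_squares_line(points):
    """Slope and intercept that minimize the sum of squared residuals."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical data set (not taken from the spreadsheet).
points = [(1, 2), (2, 3), (3, 5), (4, 4), (5, 6)]

# A student's guess, followed by the line of best fit.
guess = sum_squared_residuals(points, slope=1.0, intercept=1.0)
best_slope, best_intercept = least_squares_line(points)
best = sum_squared_residuals(points, best_slope, best_intercept)

print(f"guess: SSR = {guess:.2f}")
print(f"best fit: y = {best_slope:.2f}x + {best_intercept:.2f}, SSR = {best:.2f}")
```

No guess can score lower than the least-squares line, which is the property students are meant to discover by experimenting.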

My Favorite One-Liners: Part 95

In this series, I’m compiling some of the quips and one-liners that I’ll use with my students to hopefully make my lessons more memorable for them.

Today’s quip is one that I’ll use in a statistics class when we find an extraordinarily small P-value. For example:

There is a social theory that states that people tend to postpone their deaths until after some meaningful event… birthdays, anniversaries, the World Series.

In 1978, social scientists investigated obituaries that appeared in a Salt Lake City newspaper. Among the 747 obituaries examined, 60 of the deaths occurred in the three-month period preceding their birth month. However, if the day of death is independent of birthday, we would expect that 25% of these deaths would occur in this three-month period.

Does this study provide statistically significant evidence to support this theory? Use \alpha=0.01.

It turns out, using a one-tailed hypothesis test for proportions, that the test statistic is z = -10.71 and the P-value is about 4.5 \times 10^{-27}. After the computations, I’ll then discuss what the numbers mean.
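The computation behind these two numbers can be sketched in a few lines of Python. The sketch assumes a left-tailed test (the theory predicts fewer deaths than expected just before a birthday) and the usual normal approximation; the variable names are mine.

```python
from math import erfc, sqrt

n = 747          # obituaries examined
successes = 60   # deaths in the three months before the birth month
p0 = 0.25        # proportion expected under the null hypothesis

p_hat = successes / n
standard_error = sqrt(p0 * (1 - p0) / n)
z = (p_hat - p0) / standard_error

# Left-tail area under the standard normal curve:
# P(Z <= z) = erfc(-z / sqrt(2)) / 2, which stays accurate far out in the tail.
p_value = 0.5 * erfc(-z / sqrt(2))

print(f"z = {z:.2f}, P-value = {p_value:.1e}")
```

Using `erfc` rather than subtracting a cumulative probability from 1 avoids losing such a tiny tail area to rounding.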

I’ll begin by asking, “Is the null hypothesis [that the proportion of deaths really is 25%] possible?” The correct answer is, “Yes, it’s possible.” Even extraordinarily small P-values do not prove that the null hypothesis is impossible. To emphasize the point, I’ll say:

After all, I found a woman who agreed to marry me. So extremely unlikely events are still possible.

Once the laughter dies down, I’ll ask the second question, “Is the null hypothesis plausible?” Of course, the answer is no, and so we reject the null hypothesis in favor of the alternative.

My Favorite One-Liners: Part 71

In this series, I’m compiling some of the quips and one-liners that I’ll use with my students to hopefully make my lessons more memorable for them.

Some of the algorithms that I teach are pretty lengthy. For example, consider the calculation of a 100(1-\alpha)\% confidence interval for a proportion:

\displaystyle \frac{\hat{p} + \displaystyle \frac{z_{\alpha/2}^2}{2n}}{\displaystyle 1 + \frac{z_{\alpha/2}^2}{n} } - z_{\alpha/2} \frac{\sqrt{\displaystyle \frac{ \hat{p} \hat{q}}{n} + \displaystyle \frac{z_{\alpha/2}^2}{4n^2}}}{\displaystyle 1 + \frac{z_{\alpha/2}^2}{n} } < p < \displaystyle \frac{\hat{p} + \displaystyle \frac{z_{\alpha/2}^2}{2n}}{\displaystyle 1 + \frac{z_{\alpha/2}^2}{n} } + z_{\alpha/2} \frac{\sqrt{\displaystyle \frac{ \hat{p} \hat{q}}{n} + \displaystyle \frac{z_{\alpha/2}^2}{4n^2}}}{\displaystyle 1 + \frac{z_{\alpha/2}^2}{n} }.

Wow.

Proficiency with this formula definitely requires practice, so I’ll typically give a couple of problems for my students to work through in class. After the last example, when I think that my students have the hang of this very long calculation, I’ll give my one-liner to hopefully boost their confidence (no pun intended):

By now, you probably think that this calculation is dull, uninteresting, repetitive, and boring. If so, then I’ve done my job right.
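For anyone who would rather let a computer handle the tedium, the long formula above can be sketched in Python. The function name and the sample numbers (40 successes in 100 trials) are hypothetical; the code follows the formula term by term and relies on `statistics.NormalDist` (Python 3.8 or later) for z_{\alpha/2}.

```python
from math import sqrt
from statistics import NormalDist


def proportion_confidence_interval(successes, n, confidence=0.95):
    """Confidence interval for a proportion, following the formula term by term."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}
    p_hat = successes / n
    q_hat = 1 - p_hat
    denominator = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denominator
    margin = z * sqrt(p_hat * q_hat / n + z**2 / (4 * n**2)) / denominator
    return center - margin, center + margin


# Hypothetical sample: 40 successes in 100 trials.
low, high = proportion_confidence_interval(40, 100)
print(f"{low:.4f} < p < {high:.4f}")
```

Splitting the expression into `center` and `margin` mirrors the two fractions on each side of the inequality, which makes it easier to check a hand calculation one piece at a time.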

My Favorite One-Liners: Part 65

In this series, I’m compiling some of the quips and one-liners that I’ll use with my students to hopefully make my lessons more memorable for them.

I’ll use today’s one-liner just before I begin some enormous, complicated, and tedious calculation that’s going to take more than a few minutes to complete. To give a specific example of such a calculation: consider the derivation of the Agresti confidence interval for proportions. According to the central limit theorem, if n is large enough, then

Z = \displaystyle \frac{ \hat{p} - p}{ \displaystyle \sqrt{ \frac{p(1-p) }{n} } }

is approximately normally distributed, where p is the true population proportion and \hat{p} is the sample proportion from a sample of size n. By unwrapping this equation and solving for p, we obtain the formula for the confidence interval for a proportion:

z \displaystyle \sqrt{\frac{p(1-p)}{n} } = \hat{p} - p

\displaystyle \frac{z^2 p(1-p)}{n} = \left( \hat{p} - p \right)^2

z^2p - z^2 p^2 = n \hat{p}^2 - 2 n \hat{p} p + n p^2

0 = p^2 (z^2 + n) - p (2n \hat{p} + z^2) + n \hat{p}^2

We now use the quadratic formula to solve for p:

p = \displaystyle \frac{2n \hat{p} + z^2 \pm \sqrt{ \left(2n\hat{p} + z^2 \right)^2 - 4n\hat{p}^2 (z^2+n)}}{2(z^2+n)}

p = \displaystyle \frac{2n \hat{p} + z^2 \pm \sqrt{4n^2 \hat{p}^2 + 4n \hat{p} z^2 + z^4 - 4n\hat{p}^2 z^2 - 4n^2 \hat{p}^2}}{2(z^2 + n)}

p = \displaystyle \frac{2n \hat{p} + z^2 \pm \sqrt{4n (\hat{p}-\hat{p}^2) z^2 + z^4}}{2(z^2 + n)}

p = \displaystyle \frac{2n \hat{p} + z^2 \pm \sqrt{4n \hat{p}(1-\hat{p}) z^2 + z^4}}{2(z^2 + n)}

p = \displaystyle \frac{2n \hat{p} + z^2 \pm \sqrt{4n \hat{p} \hat{q} z^2 + z^4}}{2(z^2 + n)}

p = \displaystyle \frac{2n \hat{p} + z^2 \pm z \sqrt{4n \hat{p} \hat{q} + z^2}}{2(z^2 + n)}

p = \displaystyle \frac{2n \hat{p} + z^2 \pm z \sqrt{4n^2 \displaystyle \frac{ \hat{p} \hat{q}}{n} + \displaystyle 4n^2 \frac{z^2}{4n^2}}}{2(z^2 + n)}

p = \displaystyle \frac{2n \hat{p} + z^2 \pm 2nz \sqrt{\displaystyle \frac{ \hat{p} \hat{q}}{n} + \displaystyle \frac{z^2}{4n^2}}}{2(z^2 + n)}

p = \displaystyle \frac{2n \hat{p} + 2n \displaystyle \frac{z^2}{2n} \pm 2nz \sqrt{\displaystyle \frac{ \hat{p} \hat{q}}{n} +\displaystyle \frac{z^2}{4n^2}}}{2n \displaystyle \left(1 + \frac{z^2}{n} \right)}

p = \displaystyle \frac{\hat{p} + \displaystyle \frac{z^2}{2n} \pm z \sqrt{\displaystyle \frac{ \hat{p} \hat{q}}{n} + \displaystyle \frac{z^2}{4n^2}}}{\displaystyle 1 + \frac{z^2}{n} }

From this we finally obtain the 100(1-\alpha)\% confidence interval

\displaystyle \frac{\hat{p} + \displaystyle \frac{z_{\alpha/2}^2}{2n}}{\displaystyle 1 + \frac{z_{\alpha/2}^2}{n} } - z_{\alpha/2} \frac{\sqrt{\displaystyle \frac{ \hat{p} \hat{q}}{n} + \displaystyle \frac{z_{\alpha/2}^2}{4n^2}}}{\displaystyle 1 + \frac{z_{\alpha/2}^2}{n} } < p < \displaystyle \frac{\hat{p} + \displaystyle \frac{z_{\alpha/2}^2}{2n}}{\displaystyle 1 + \frac{z_{\alpha/2}^2}{n} } + z_{\alpha/2} \frac{\sqrt{\displaystyle \frac{ \hat{p} \hat{q}}{n} + \displaystyle \frac{z_{\alpha/2}^2}{4n^2}}}{\displaystyle 1 + \frac{z_{\alpha/2}^2}{n} }.

Whew.

So, before I start such an incredibly long calculation, I’ll warn my students that this is going to take some time and we need to prepare… and I’ll start doing jumping jacks, shadow boxing, and other “exercise” in preparation for doing all of this writing.
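As a sanity check on all of that algebra, here is a small numerical sketch. The values of n, \hat{p}, and z are hypothetical and chosen only to exercise the formulas. It confirms that the roots of the quadratic in p match the endpoints of the final simplified formula, a form that many references call the Wilson score interval.

```python
from math import isclose, sqrt

# Hypothetical values chosen only to exercise the algebra.
n, p_hat, z = 100, 0.4, 1.96
q_hat = 1 - p_hat

# Coefficients of 0 = p^2 (z^2 + n) - p (2 n p_hat + z^2) + n p_hat^2.
a = z**2 + n
b = -(2 * n * p_hat + z**2)
c = n * p_hat**2

# Roots from the quadratic formula.
disc = sqrt(b**2 - 4 * a * c)
roots = ((-b - disc) / (2 * a), (-b + disc) / (2 * a))

# Endpoints from the final simplified formula.
center = (p_hat + z**2 / (2 * n)) / (1 + z**2 / n)
margin = z * sqrt(p_hat * q_hat / n + z**2 / (4 * n**2)) / (1 + z**2 / n)
endpoints = (center - margin, center + margin)

# Each root should satisfy the squared equation z^2 p (1 - p) / n = (p_hat - p)^2.
for p in roots:
    assert isclose(z**2 * p * (1 - p) / n, (p_hat - p) ** 2)

# The roots and the simplified endpoints should agree.
assert all(isclose(r, e) for r, e in zip(roots, endpoints))
print(roots)
```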

My Favorite One-Liners: Part 52

In this series, I’m compiling some of the quips and one-liners that I’ll use with my students to hopefully make my lessons more memorable for them. Today’s story is a continuation of yesterday’s post.

When I teach regression, I typically use this example to illustrate the regression effect:

Suppose that the heights of fathers and their adult sons both have mean 69 inches and standard deviation 3 inches. Suppose also that the correlation between the heights of the fathers and sons is 0.5. Predict the height of a son whose father is 63 inches tall. Repeat if the father is 78 inches tall.

Using the formula for the regression line

y = \overline{y} + r \displaystyle \frac{s_y}{s_x} (x - \overline{x}),

we obtain the equation

y = 69 + 0.5(x-69) = 0.5x + 34.5,

so that the predicted height of the son is 66 inches if the father is 63 inches tall. However, the prediction would be 73.5 inches if the father is 78 inches tall. As expected, tall fathers tend to have tall sons, and short fathers tend to have short sons. Then, I’ll tell my class:

However, to the psychological comfort of us short people, tall fathers tend to have sons who are not quite as tall, and short fathers tend to have sons who are not quite as short.

This was first observed by Francis Galton (see the Wikipedia article for more details), a particularly brilliant but aristocratic (read: snobbish) mathematician who had high hopes for breeding a race of super-tall people with the proper use of genetics, only to discover that the laws of statistics naturally prevented this from occurring. Defeated, he called this phenomenon “regression toward the mean,” and so we’re stuck with calling fitting data to a straight line “regression” to this day.
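The regression effect is perhaps easiest to see in standard units, where the regression line becomes z_y = r z_x. Here is a minimal sketch using the numbers from the example above; the function name is hypothetical.

```python
def predicted_standard_units(father_height, mean=69.0, sd=3.0, r=0.5):
    """Return (father's z-score, son's predicted z-score) under the regression line."""
    z_father = (father_height - mean) / sd
    z_son = r * z_father   # regression line in standard units: z_y = r * z_x
    return z_father, z_son


for height in (63, 78):
    z_father, z_son = predicted_standard_units(height)
    print(f"father {height} in: {z_father:+.1f} SD -> son predicted {z_son:+.1f} SD")
```

A father who is 2 standard deviations below the mean has a son predicted to be only 1 standard deviation below it. Because |r| < 1, the prediction is always pulled back toward the mean.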

My Favorite One-Liners: Part 51

In this series, I’m compiling some of the quips and one-liners that I’ll use with my students to hopefully make my lessons more memorable for them.

When I teach regression, I typically use this example to illustrate the regression effect:

Suppose that the heights of fathers and their adult sons both have mean 69 inches and standard deviation 3 inches. Suppose also that the correlation between the heights of the fathers and sons is 0.5. Predict the height of a son whose father is 63 inches tall. Repeat if the father is 78 inches tall.

Using the formula for the regression line

y = \overline{y} + r \displaystyle \frac{s_y}{s_x} (x - \overline{x}),

we obtain the equation

y = 69 + 0.5(x-69) = 0.5x + 34.5,

so that the predicted height of the son is 66 inches if the father is 63 inches tall. However, the prediction would be 73.5 inches if the father is 78 inches tall.
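These two predictions can be double-checked with a short sketch of the regression-line formula. The function and parameter names are mine; the default values are the ones given in the example.

```python
def predict_son_height(father_height, mean_x=69.0, mean_y=69.0,
                       s_x=3.0, s_y=3.0, r=0.5):
    """Predicted y from y = mean_y + r * (s_y / s_x) * (x - mean_x)."""
    return mean_y + r * (s_y / s_x) * (father_height - mean_x)


print(predict_son_height(63))  # 66.0
print(predict_son_height(78))  # 73.5
```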

To make this more memorable for students, I’ll observe:

As expected, tall fathers tend to have tall sons, and short fathers tend to have short sons. For example, my uncle was 6’6″. His two sons, my cousins, were 6’4″ and 6’5″ and were high school basketball stars.

My father was 5’3″. I became a math nerd.

My Favorite One-Liners: Part 36

In this series, I’m compiling some of the quips and one-liners that I’ll use with my students to hopefully make my lessons more memorable for them.

Not everything in mathematics works out the way we’d prefer it to. For example, in statistics, a Type I error, whose probability is denoted by \alpha, is rejecting the null hypothesis even though the null hypothesis is true. Conversely, a Type II error, whose probability is denoted by \beta, is retaining the null hypothesis even though the null hypothesis is false.

Ideally, we’d like \alpha = 0 and \beta = 0, so there’s no chance of making a mistake. I’ll tell my students:

There are actually two places in the country where this can happen. One’s in California, and the other is in Florida. And that place is called Fantasyland.
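Outside of Fantasyland, pushing \alpha toward 0 makes \beta grow when the sample size is fixed. The sketch below illustrates this for a left-tailed z-test for a proportion, using the normal approximation. The scenario (p_0 = 0.25, a true proportion of 0.20, and n = 100) is hypothetical, as are the function and variable names.

```python
from math import sqrt
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()


def type_ii_error(alpha, p0, p_true, n):
    """Approximate beta for a left-tailed z-test of H0: p = p0 against Ha: p < p0."""
    # Reject H0 when the sample proportion falls below this cutoff.
    cutoff = p0 + STANDARD_NORMAL.inv_cdf(alpha) * sqrt(p0 * (1 - p0) / n)
    # Beta is the chance of landing at or above the cutoff when p is really p_true.
    se_true = sqrt(p_true * (1 - p_true) / n)
    return 1 - STANDARD_NORMAL.cdf((cutoff - p_true) / se_true)


# Hypothetical scenario: p0 = 0.25, true p = 0.20, n = 100.
for alpha in (0.10, 0.05, 0.01, 0.001):
    beta = type_ii_error(alpha, 0.25, 0.20, 100)
    print(f"alpha = {alpha:<5} beta = {beta:.3f}")
```

Each smaller value of \alpha moves the cutoff farther from p_0, so a false null hypothesis is retained more often.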

My Favorite One-Liners: Part 33

In this series, I’m compiling some of the quips and one-liners that I’ll use with my students to hopefully make my lessons more memorable for them.

Perhaps one of the more difficult things that I try to instill in my students is numeracy, or a sense of whether an answer to a calculation is plausible. As an initial step toward this goal, I’ll try to teach my students some basic pointers about whether an answer is even possible.

For example, when calculating a standard deviation, students have to compute E(X) and E(X^2):

E(X) = \sum x p(x) \qquad \hbox{or} \qquad E(X) = \int_a^b x f(x) \, dx

E(X^2) = \sum x^2 p(x) \qquad \hbox{or} \qquad E(X^2) = \int_a^b x^2 f(x) \, dx

After these are computed — which could take some time — the variance is then calculated:

\hbox{Var}(X) = E(X^2) - [E(X)]^2.

Finally, the standard deviation is found by taking the square root of the variance.

So, I’ll ask my students, what do you do if you calculate the variance and it’s negative, so that it’s impossible to take the square root? After a minute of students hemming and hawing, I’ll tell them emphatically what they should do:

It’s wrong… do it again.

The same principle applies when computing probabilities, which always have to be between 0 and 1. So, if ever a student computes a probability that’s either negative or else greater than 1, they can be assured that the answer is wrong and that there’s a mistake someplace in their computation that needs to be found.
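These sanity checks can be built directly into a computation. Here is a minimal sketch for a discrete random variable; the function name and the fair-die example are hypothetical. It uses exact fractions so that the check that the probabilities sum to 1 is not thrown off by rounding.

```python
from fractions import Fraction


def variance(distribution):
    """Var(X) = E(X^2) - [E(X)]^2 for a discrete distribution given as {x: p(x)}."""
    if any(p < 0 or p > 1 for p in distribution.values()):
        raise ValueError("each probability must be between 0 and 1")
    if sum(distribution.values()) != 1:
        raise ValueError("probabilities must sum to 1")
    e_x = sum(x * p for x, p in distribution.items())
    e_x2 = sum(x**2 * p for x, p in distribution.items())
    var = e_x2 - e_x**2
    if var < 0:
        # Impossible for a valid distribution, so something upstream is wrong.
        raise ValueError("negative variance: it's wrong... do it again")
    return var


# Hypothetical example: one roll of a fair six-sided die.
die = {face: Fraction(1, 6) for face in range(1, 7)}
print(variance(die))                 # 35/12
print(float(variance(die)) ** 0.5)   # standard deviation
```

With floating-point probabilities, the exact comparison `!= 1` would need to be replaced by a tolerance check such as `math.isclose`.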

My Favorite One-Liners: Part 23

In this series, I’m compiling some of the quips and one-liners that I’ll use with my students to hopefully make my lessons more memorable for them.

Here are some sage words of wisdom that I give in my statistics class:

If the alternative hypothesis has the form p > p_0, then the rejection region lies to the right of p_0. On the other hand, if the alternative hypothesis has the form p < p_0, then the rejection region lies to the left of p_0.

Finally, if the alternative hypothesis has the form p \ne p_0, then the rejection region has two parts: one part to the left of p_0, and another part to the right. So it’s kind of like my single days. Back then, my rejection region had two parts: Friday night and Saturday night.
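For the three cases above, the cutoffs for the z statistic can be sketched as follows. The function name and the labels "greater", "less", and "two-sided" are my own, and the default \alpha = 0.05 is only an assumption for illustration.

```python
from statistics import NormalDist


def rejection_region(alternative, alpha=0.05):
    """Describe the rejection region for the z statistic at significance level alpha."""
    z = NormalDist().inv_cdf
    if alternative == "greater":      # Ha: p > p0, region to the right
        return f"z > {z(1 - alpha):.3f}"
    if alternative == "less":         # Ha: p < p0, region to the left
        return f"z < {z(alpha):.3f}"
    if alternative == "two-sided":    # Ha: p != p0, alpha split between both tails
        cutoff = z(1 - alpha / 2)
        return f"z < {-cutoff:.3f} or z > {cutoff:.3f}"
    raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")


for alternative in ("greater", "less", "two-sided"):
    print(alternative, "->", rejection_region(alternative))
```

The two-sided case splits \alpha evenly between the tails, which is why its cutoff (about 1.96 for \alpha = 0.05) is farther out than the one-sided cutoff (about 1.645).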

My Favorite One-Liners: Part 22

In this series, I’m compiling some of the quips and one-liners that I’ll use with my students to hopefully make my lessons more memorable for them. Today’s example might be the most cringe-worthy pun that I use in any class that I teach.

In my statistics classes, I try to emphasize to students that a high value of the correlation coefficient r is not the same thing as causation. To hopefully drive home this point, I’ll use the following picture.

[Image: piracy01]

Conclusion: If we want to stop global warming, we should all become pirates.

Obviously, I tell my class, there isn’t a cause-and-effect relationship here, even though there is a strong positive correlation. So, I tell my class, in my best pirate voice, “Correlation is not the same thing as causation, even if you get a large value of ARRRRRRR.”

Without fail, my students love this awful wisecrack.

While I’m on the topic, this is too good not to share:

For further reading, see my series on correlation and causation.