Reminding students about Taylor series (Part 5)

Sadly, at least at my university, Taylor series is the topic that is least retained by students years after taking Calculus II. They can remember the rules for integration and differentiation, but their command of Taylor series seems to slip through the cracks. In my opinion, the reason for this lack of retention is completely understandable from a student’s perspective: Taylor series is usually the last topic covered in a semester, and so students learn them quickly for the final and quickly forget about them as soon as the final is over.

Of course, when I need to use Taylor series in an advanced course but my students have completely forgotten this prerequisite knowledge, I have to get them up to speed as soon as possible. Here’s the sequence that I use to accomplish this task. Covering this sequence usually takes me about 30 minutes of class time.

I should emphasize that I present this sequence in an inquiry-based format: I ask leading questions of my students so that the answers of my students are driving the lecture. In other words, I don’t ask my students to simply take dictation. It’s a little hard to describe a question-and-answer format in a blog, but I’ll attempt to do this below.

In the previous posts, I described how I lead students to the definition of the Maclaurin series

f(x) = \displaystyle \sum_{k=0}^{\infty} \frac{f^{(k)}(0)}{k!} x^k,

which converges to f(x) within some radius of convergence for all functions that commonly appear in the secondary mathematics curriculum.

____________________

Step 5. That was easy; let’s try another one. Now let’s try f(x) = \displaystyle \frac{1}{1-x} = (1-x)^{-1}.

What’s f(0)? Plugging in, we find f(0) = \displaystyle \frac{1}{1-0} = 1.

Next, to find f'(0), we first find f'(x). Using the Chain Rule, we find f'(x) = -(1-x)^{-2} \cdot (-1) = \displaystyle \frac{1}{(1-x)^2}, so that f'(0) = 1.

Next, we differentiate again: f''(x) = (-2) \cdot (1-x)^{-3} \cdot (-1) = \displaystyle \frac{2}{(1-x)^3}, so that f''(0) = 2.

Hmmm… no obvious pattern yet… so let’s keep going.

For the next term, f'''(x) = (-3) \cdot 2(1-x)^{-4} \cdot (-1) = \displaystyle \frac{6}{(1-x)^4}, so that f'''(0) = 6.

For the next term, f^{(4)}(x) = (-4) \cdot 6(1-x)^{-5} \cdot (-1) = \displaystyle \frac{24}{(1-x)^5}, so that f^{(4)}(0) = 24.

Oohh… it’s the factorials again! It looks like f^{(n)}(0) = n!, and this can be formally proved by induction.

Plugging into the series, we find that

\displaystyle \frac{1}{1-x} = \sum_{n=0}^\infty \frac{n!}{n!} x^n = \sum_{n=0}^\infty x^n = 1 + x + x^2 + x^3 + \dots.

Like the series for e^x, this series converges quickest for x \approx 0. Unlike the series for e^x, this series does not converge for all real numbers. As can be checked with the Ratio Test, this series only converges if |x| < 1.

The right-hand side is a special kind of series typically discussed in precalculus. (Students often pause at this point, because most of them have forgotten this too.) It is an infinite geometric series whose first term is 1 and whose common ratio is x. So starting from the right-hand side, one can obtain the left-hand side using the formula

a + ar + ar^2 + ar^3 + \dots = \displaystyle \frac{a}{1-r}

by letting a=1 and r=x. Also, as stated in precalculus, this series only converges if the common ratio satisfies |r| < 1, as before.

In other words, in precalculus, we start with the geometric series and end with the function. With Taylor series, we start with the function and end with the series.
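As a quick numerical check of Step 5, here is a minimal Python sketch that compares partial sums of 1 + x + x^2 + \dots with \frac{1}{1-x}. The function name geometric_partial_sum and the sample values of x are chosen only for illustration.

```python
def geometric_partial_sum(x, n_terms):
    """Return 1 + x + x^2 + ... + x^(n_terms - 1)."""
    total = 0.0
    term = 1.0  # x^0
    for _ in range(n_terms):
        total += term
        term *= x  # next power of x
    return total


if __name__ == "__main__":
    # Sample values with |x| < 1, where the series converges.
    for x in (0.1, 0.5, 0.9):
        exact = 1.0 / (1.0 - x)
        for n_terms in (5, 20, 80):
            approx = geometric_partial_sum(x, n_terms)
            print(f"x={x}: {n_terms:2d} terms -> {approx:.10f} (exact {exact:.10f})")
```

Running this shows the partial sums settling down quickly for x near 0 and much more slowly for x near 1, which matches the remark about the rate of convergence.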

____________________

Step 6. A whole bunch of other Taylor series can be quickly obtained from the one for \displaystyle \frac{1}{1-x}. Let’s take the derivative of both sides (and ignore the fact that one should prove that differentiating this infinite series term by term is permissible). Since

\displaystyle \frac{d}{dx} \left( \frac{1}{1-x} \right) = \frac{1}{(1-x)^2}

and

\displaystyle \frac{d}{dx} \left( 1 + x + x^2 + x^3 + x^4 + \dots \right) = 1 + 2x + 3x^2 + 4x^3 + \dots,

we have

\displaystyle \frac{1}{(1-x)^2} = 1 + 2x + 3x^2 + 4x^3 + \dots.

____________________

Next, let’s replace x with -x in the Taylor series in Step 5, obtaining

\displaystyle \frac{1}{1+x} = 1 - x + x^2 - x^3 + x^4 - x^5 \dots

Now let’s take the indefinite integral of both sides:

\displaystyle \int \frac{dx}{1+x} = \int \left( 1 - x + x^2 - x^3 + x^4 - x^5 \dots \right) \, dx

\ln(1+x) = \displaystyle x - \frac{x^2}{2} + \frac{x^3}{3} -\frac{ x^4}{4} + \frac{x^5}{5} -\frac{ x^6}{6} \dots + C

To solve for the constant of integration, let x = 0:

\ln(1) = 0+ C \Longrightarrow C = 0

Plugging back in, we conclude that

\ln(1+x) = x - \displaystyle \frac{x^2}{2} + \frac{x^3}{3} -\frac{ x^4}{4} + \frac{x^5}{5} -\frac{ x^6}{6} \dots

The Taylor series expansion for \ln(1-x) can be found by replacing x with -x:

\ln(1-x) = -x - \displaystyle \frac{x^2}{2} - \frac{x^3}{3} -\frac{ x^4}{4} - \frac{x^5}{5} -\frac{ x^6}{6} \dots

Subtracting, we find

\ln(1+x) - \ln(1-x) = \ln \displaystyle \left( \frac{1+x}{1-x} \right) = 2x + \frac{2x^3}{3}+ \frac{2x^5}{5} \dots

My understanding is that this latter series is used by calculators when computing logarithms.
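To see why this series is attractive for computing logarithms, here is a minimal Python sketch. It is only an illustration of the series above, not a claim about how any particular calculator is implemented, and the name log_via_series is hypothetical. The idea is that any y > 0 can be written as \frac{1+x}{1-x} with x = \frac{y-1}{y+1}, which always satisfies |x| < 1.

```python
import math


def log_via_series(y, n_terms=20):
    """Approximate ln(y), y > 0, via ln((1+x)/(1-x)) = 2(x + x^3/3 + x^5/5 + ...)."""
    if y <= 0:
        raise ValueError("y must be positive")
    x = (y - 1.0) / (y + 1.0)  # solves (1+x)/(1-x) = y, so |x| < 1
    total = 0.0
    power = x  # holds x^(2k+1), starting with k = 0
    for k in range(n_terms):
        total += power / (2 * k + 1)
        power *= x * x
    return 2.0 * total


if __name__ == "__main__":
    for y in (0.5, 2.0, 10.0):
        print(y, log_via_series(y), math.log(y))
```

For y = 2 the series is evaluated at x = 1/3, so each extra term shrinks the error by roughly a factor of 9.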

____________________

Next, let’s replace x with -x^2 in the Taylor series in Step 5, obtaining

\displaystyle \frac{1}{1+x^2} = 1 - x^2 + x^4 - x^6 + x^8 - x^{10} \dots

Now let’s take the indefinite integral of both sides:

\displaystyle \int \frac{dx}{1+x^2} = \int \left(1 - x^2 + x^4 - x^6 + x^8 - x^{10} \dots\right) \, dx

\tan^{-1}x = \displaystyle x - \frac{x^3}{3} + \frac{x^5}{5} -\frac{ x^7}{7} + \frac{x^9}{9} -\frac{ x^{11}}{11} \dots + C

To solve for the constant of integration, let x = 0:

\tan^{-1}(0) = 0+ C \Longrightarrow C = 0

Plugging back in, we conclude that

\tan^{-1}x = \displaystyle x - \frac{x^3}{3} + \frac{x^5}{5} -\frac{ x^7}{7} + \frac{x^9}{9} -\frac{ x^{11}}{11} \dots
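The arctangent series can be checked numerically in the same spirit. The following Python sketch (the function name arctan_partial_sum is an assumption for illustration) adds up the first few terms and compares the result with the library value.

```python
import math


def arctan_partial_sum(x, n_terms):
    """Return x - x^3/3 + x^5/5 - ... using n_terms terms."""
    total = 0.0
    power = x   # holds x^(2k+1)
    sign = 1.0  # alternates +1, -1, +1, ...
    for k in range(n_terms):
        total += sign * power / (2 * k + 1)
        power *= x * x
        sign = -sign
    return total


if __name__ == "__main__":
    x = 0.5  # sample value inside the interval of convergence
    for n_terms in (1, 2, 5, 10, 30):
        approx = arctan_partial_sum(x, n_terms)
        print(f"{n_terms:2d} terms: {approx:.15f}  (math.atan: {math.atan(x):.15f})")
```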

____________________

In summary, a whole bunch of Taylor series can be extracted quite quickly by differentiating and integrating from a simple infinite geometric series. I’m a firm believer in minimizing the number of formulas that I should memorize. Any time I personally need any of the above series, I’ll quickly use the above steps to derive them from that of \displaystyle \frac{1}{1-x}.

Reminding students about Taylor series (Part 4)

I’m in the middle of a series of posts describing how I remind students about Taylor series. In the previous posts, I described how I lead students to the definition of the Maclaurin series

f(x) = \displaystyle \sum_{k=0}^{\infty} \frac{f^{(k)}(0)}{k!} x^k,

which converges to f(x) within some radius of convergence for all functions that commonly appear in the secondary mathematics curriculum.

____________________

Step 4. Let’s now get some practice with Maclaurin series. Let’s start with f(x) = e^x.

What’s f(0)? That’s easy: f(0) = e^0 = 1.

Next, to find f'(0), we first find f'(x). What is it? Well, that’s also easy: f'(x) = \frac{d}{dx} (e^x) = e^x. So f'(0) is also equal to 1.

How about f''(0)? Yep, it’s also 1. In fact, it’s clear that f^{(n)}(0) = 1 for all n, though we’ll skip the formal proof by induction.

Plugging into the above formula, we find that

e^x = \displaystyle \sum_{k=0}^{\infty} \frac{1}{k!} x^k = \sum_{k=0}^{\infty} \frac{x^k}{k!} = 1 + x + \frac{x^2}{2} + \frac{x^3}{6} + \dots

It turns out that the radius of convergence for this power series is \infty. In other words, the series on the right converges for all values of x. Though we’ll skip this for review purposes, this can be formally checked by using the Ratio Test.

____________________

At this point, students generally feel confident about the mechanics of finding a Taylor series expansion, and that’s a good thing. However, in my experience, their command of Taylor series is still somewhat artificial. They can go through the motions of taking derivatives and finding the Taylor series, but this complicated symbol in \displaystyle \sum notation still doesn’t have much meaning.

So I shift gears somewhat to discuss the rate of convergence. My hope is to deepen students’ knowledge by getting them to believe that f(x) really can be approximated to high precision with only a few terms. Perhaps not surprisingly, it converges quicker for small values of x than for big values of x.

Pedagogically, I like to use a spreadsheet like Microsoft Excel to demonstrate the rate of convergence. A calculator could be used, but students can see quickly with Excel how quickly (or slowly) the terms get smaller. I usually construct the spreadsheet in class on the fly (the fill down feature is really helpful for doing this quickly), with the end product looking something like this:

[Image: Taylor0, spreadsheet screenshot]

In this way, students can immediately see that the Taylor series is accurate to four significant digits by going up to the x^4 term and that about ten or eleven terms are needed to get a figure that is as accurate as the precision of the computer will allow. In other words, for all practical purposes, an infinite number of terms are not necessary.

In short, this is how a calculator computes e^x: adding up the first few terms of a Taylor series. Back in high school, when students hit the e^x button on their calculators, they trusted the result, but the mechanics of how the calculator got the result were shrouded in mystery. No longer.
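For readers without a spreadsheet handy, the same table can be sketched in a few lines of Python. This is only a stand-in for the Excel demonstration: the function name exp_partial_sums is hypothetical, and the value x = 0.5 is an assumed sample value rather than the one used in the screenshot.

```python
import math


def exp_partial_sums(x, n_terms):
    """Return the running partial sums of sum_{k>=0} x^k / k!."""
    sums = []
    total = 0.0
    term = 1.0  # x^0 / 0!
    for k in range(n_terms):
        total += term
        sums.append(total)
        term *= x / (k + 1)  # turns x^k/k! into x^(k+1)/(k+1)!
    return sums


if __name__ == "__main__":
    x = 0.5  # assumed sample value; try others
    exact = math.exp(x)
    for k, partial in enumerate(exp_partial_sums(x, 12)):
        print(f"k={k:2d}  partial sum={partial:.15f}  error={abs(partial - exact):.2e}")
```

The update term *= x / (k + 1) plays the role of the fill down feature: each row is obtained from the previous one without recomputing a power or a factorial from scratch.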

Then I shift gears by trying a larger value of x:

[Image: Taylor1, spreadsheet screenshot]

I ask my students the obvious question: What went wrong? They’re usually able to volunteer a few ideas:

  • The convergence is slower for larger values of x.
  • The series will converge, but more terms are needed (and I’ll later use the fill down feature to get enough terms so that it does converge as accurately as double precision will allow).
  • The individual terms get bigger until k=11 and then start getting smaller. I’ll ask my students why this happens, and I’ll eventually get an explanation like

\displaystyle \frac{(11.5)^6}{6!} < \frac{(11.5)^6}{6!} \times \frac{11.5}{7} = \frac{(11.5)^7}{7!}

but

\displaystyle \frac{(11.5)^{11}}{11!} > \frac{(11.5)^{11}}{11!} \times \frac{11.5}{12} = \frac{(11.5)^{12}}{12!}
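The claim that the terms grow until k = 11 and shrink afterward can be confirmed directly. Here is a small Python sketch; the helper names exp_terms and index_of_largest_term are made up for this illustration.

```python
def exp_terms(x, n_terms):
    """Return [x^0/0!, x^1/1!, ..., x^(n_terms-1)/(n_terms-1)!]."""
    terms = []
    term = 1.0
    for k in range(n_terms):
        terms.append(term)
        term *= x / (k + 1)  # ratio of consecutive terms is x/(k+1)
    return terms


def index_of_largest_term(x, n_terms=60):
    """Return the index k of the largest term x^k/k! among the first n_terms."""
    terms = exp_terms(x, n_terms)
    return terms.index(max(terms))


if __name__ == "__main__":
    print(index_of_largest_term(11.5))  # the terms peak where x/(k+1) drops below 1
```

The ratio of consecutive terms is x/(k+1), so the terms keep growing exactly as long as k + 1 < x.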

At this point, I’ll mention that calculators use some tricks to speed up convergence. For example, the calculator can simply store a few values of e^x in memory, like e^{16}, e^{8}, e^{4}, e^{2}, and e^{1} = e. I then ask my class how these could be used to find e^{11.5}. After some thought, they will volunteer that

e^{11.5} = e^8 \cdot e^2 \cdot e \cdot e^{0.5}.

The first three values don’t need to be computed — they’ve already been stored in memory — while the last value can be computed via Taylor series. Also, since 0.5 < 1, the series for e^{0.5} will converge pretty quickly. (Some students may volunteer that the above product is logically equivalent to turning 11 into binary.)
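The stored-powers trick can be sketched as follows. This is an illustration under stated assumptions, not a description of real calculator firmware: the table STORED_POWERS is filled with math.exp as a stand-in for constants that would be kept in memory, and the names exp_small and exp_with_table are hypothetical.

```python
import math

# Stand-in for constants stored in memory: e^16, e^8, e^4, e^2, e^1.
STORED_POWERS = {p: math.exp(p) for p in (16, 8, 4, 2, 1)}


def exp_small(x, n_terms=20):
    """Taylor series for e^x, intended for 0 <= x < 1."""
    total = 0.0
    term = 1.0
    for k in range(n_terms):
        total += term
        term *= x / (k + 1)
    return total


def exp_with_table(x):
    """Compute e^x for 0 <= x < 32 using stored powers plus a short series."""
    if not 0 <= x < 32:
        raise ValueError("this sketch only handles 0 <= x < 32")
    result = 1.0
    remainder = x
    for power in sorted(STORED_POWERS, reverse=True):  # 16, 8, 4, 2, 1
        if remainder >= power:
            result *= STORED_POWERS[power]
            remainder -= power
    return result * exp_small(remainder)  # remainder is now in [0, 1)


if __name__ == "__main__":
    print(exp_with_table(11.5), math.exp(11.5))
```

For x = 11.5 the loop picks out 8, 2, and 1 (the binary digits of 11) and leaves 0.5 for the series.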

At this point, after doing these explicit numerical examples, I’ll show graphs of e^x and graphs of the Taylor polynomials of e^x, observing that the polynomials get closer and closer to the graph of e^x as more terms are added. (For example, see the graphs on the Wikipedia page for Taylor series, though I prefer to use Mathematica for in-class purposes.) In my opinion, the convergence of the graphs becomes meaningful to students only after doing some numerical examples, as done above.

____________________

At this point, I hope my students are familiar with the definition of Taylor (Maclaurin) series, can apply the definition to e^x, and have some intuition that the nasty Taylor series expression practically means “add a bunch of terms together until you’re satisfied with the convergence.”

In the next post, we’ll consider another Taylor series which ought to be (but usually isn’t) really familiar to students: an infinite geometric series.

P.S. Here’s the Excel spreadsheet that I used to make the above figures: Taylor.

Reminding students about Taylor series (Part 3)

Sadly, at least at my university, Taylor series is the topic that is least retained by students years after taking Calculus II. They can remember the rules for integration and differentiation, but their command of Taylor series seems to slip through the cracks. In my opinion, the reason for this lack of retention is completely understandable from a student’s perspective: Taylor series is usually the last topic covered in a semester, and so students learn them quickly for the final and quickly forget about them as soon as the final is over.

Of course, when I need to use Taylor series in an advanced course but my students have completely forgotten this prerequisite knowledge, I have to get them up to speed as soon as possible. Here’s the sequence that I use to accomplish this task. Covering this sequence usually takes me about 30 minutes of class time.

I should emphasize that I present this sequence in an inquiry-based format: I ask leading questions of my students so that the answers of my students are driving the lecture. In other words, I don’t ask my students to simply take dictation. It’s a little hard to describe a question-and-answer format in a blog, but I’ll attempt to do this below.

In the previous post, I described how I lead students to the equations

f(x) = \displaystyle \sum_{k=0}^n \frac{f^{(k)}(0)}{k!} x^k.

and

f(x) = \displaystyle \sum_{k=0}^n \frac{f^{(k)}(a)}{k!} (x-a)^k,

where f(x) is a polynomial and a can be any number.

____________________

Step 3. What happens if the original function f(x) is not a polynomial? For one thing, the right-hand side can no longer be a finite sum. As long as the sum on the right-hand side stops at some degree n, the right-hand side is a polynomial, but the left-hand side is assumed to not be a polynomial.

To resolve this, we can cross our fingers and hope that

f(x) = \displaystyle \sum_{k=0}^{\infty} \frac{f^{(k)}(0)}{k!} x^k,

or

f(x) = \displaystyle \sum_{k=0}^{\infty}\frac{f^{(k)}(a)}{k!} (x-a)^k.

In other words, let’s make the right-hand side an infinite series, and hope for the best. This is the definition of the Taylor series expansions of f.

Note: At this point in the review, I can usually see the light go on in my students’ eyes. Usually, they can now recall their work with Taylor series in the past… and they wonder why they weren’t taught this topic inductively (like I’ve tried to do in the above exposition) instead of deductively (like the presentation in most textbooks).

While we’d like to think that the Taylor series expansions always work, there are at least two things that can go wrong.

  1. First, the sum on the right is an infinite series, and there’s no guarantee that the series will converge in the first place. There are plenty of examples of series that diverge, like \displaystyle \sum_{k=0}^\infty \frac{1}{k+1}.
  2. Second, even if the series converges, there’s no guarantee that the series will converge to the “right” answer f(x). The canonical example of this behavior is f(x) = e^{-1/x^2} (with f(0) defined to be 0), which is so “flat” near x=0 that every single derivative of f is equal to 0 at x=0. Its Maclaurin series is therefore identically zero, which converges everywhere but equals f(x) only at x=0.

For the first complication, there are multiple tests devised in Calculus II, especially the Ratio Test, to determine the values of x for which the series converges. This establishes a radius of convergence for the series.

The second complication is far more difficult to address rigorously. The good news is that, for all commonly occurring functions in the secondary mathematics curriculum, the Taylor series of a function properly converges (when it does converge). So we will happily ignore this complication for the remainder of the presentation.

Indeed, it’s remarkable that the series should converge to f(x) at all. Think about the meaning of the terms on the right-hand side:

  1. f(a) is the y-coordinate at x=a.
  2. f'(a) is the slope of the curve at x=a.
  3. f''(a) is a measure of the concavity of the curve at — you guessed it — x=a.
  4. f'''(a) is an even more subtle description of the curve… once again, at x=a.

In other words, if the Taylor series converges to f(x), then every twist and turn of the function, even at points far away from x=a, is encoded somehow in the shape of the curve at the one point x=a. So analytic functions (which have Taylor series that converge to the original functions) are indeed quite remarkable.

 

Reminding students about Taylor series (Part 2)

In this series of posts, I will describe the sequence of examples that I use to remind students about Taylor series. (One time, just for fun, I presented this topic at the end of a semester of Calculus I, and it seemed to go well even for that audience who had not seen Taylor series previously.)

I should emphasize that I present this sequence inductively and in an inquiry-based format: I ask leading questions of my students so that the answers of my students are driving the lecture. In other words, I don’t ask my students to simply take dictation. It’s a little hard to describe a question-and-answer format in a blog, but I’ll attempt to do this below.

____________________

Step 1. Find the unique quartic (fourth-degree) polynomial so that f(0) = 6, f'(0) = -3, f''(0) = 6, f'''(0) = 2, and f^{(4)}(0) = 10.

I’ve placed a thought bubble if you’d like to think about it before scrolling down to see the answer. Here’s a hint to get started: let f(x) = ax^4 + bx^3 + cx^2 + dx + e, and start differentiating. Remember that a, b, c, d, and e are constants.

[Image: thought bubble]

We begin with the information that f(0) = 6. How else can we find f(0)? Since f(x) = ax^4 + bx^3 + cx^2 + dx + e, we see that f(0) = e. Therefore, it must be that e = 6.

How about f'(0)? We see that f'(x) = 4ax^3 + 3bx^2 + 2cx + d, and so f'(0) = d. Since f'(0) = -3, we have that d = -3.

Next, f''(x) = 12ax^2 + 6bx + 2c, and so f''(0) = 2c. Since f''(0) = 6, we have that 2c = 6, or c = 3.

Next, f'''(x) = 24ax + 6b, and so f'''(0) = 6b. Since f'''(0) = 2, we have that 6b = 2, or b = \frac{1}{3}.

Finally, f^{(4)}(x) = 24a, and so f^{(4)}(0) = 24a. Since f^{(4)}(0) = 10, we have 24a = 10, or a = \frac{5}{12}.

What do we get when we put all of this information together? The polynomial must be

f(x) = \frac{5}{12} x^4 + \frac{1}{3} x^3 + 3 x^2 - 3x + 6.
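The answer can be double-checked by differentiating the polynomial and evaluating at 0. Here is a minimal Python sketch using exact fractions; storing a polynomial as a list of coefficients (lowest power first) and the helper names differentiate and derivatives_at_zero are choices made for this illustration.

```python
from fractions import Fraction


def differentiate(coeffs):
    """Differentiate a polynomial; coeffs[k] is the coefficient of x^k."""
    return [k * c for k, c in enumerate(coeffs)][1:]


def derivatives_at_zero(coeffs):
    """Return [f(0), f'(0), f''(0), ...] until the polynomial is exhausted."""
    values = []
    current = list(coeffs)
    while current:
        values.append(current[0])  # the constant term is the value at x = 0
        current = differentiate(current)
    return values


# f(x) = 6 - 3x + 3x^2 + (1/3)x^3 + (5/12)x^4, lowest power first
quartic = [Fraction(6), Fraction(-3), Fraction(3), Fraction(1, 3), Fraction(5, 12)]

if __name__ == "__main__":
    print(derivatives_at_zero(quartic))  # should reproduce the five given values
```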

____________________

Step 2. How are these coefficients related to the information given in the problem?

[Image: thought bubble]

Let’s start with the leading coefficient, a = \frac{5}{12}. How did we get this answer? It came from dividing 10 by 24. Where did the 10 come from? It was the given value of f^{(4)}(0), and so

a = \displaystyle \frac{f^{(4)}(0)}{24}.

Next, b = \frac{1}{3}, which arose from dividing 2 by 6. The number 2 was the given value of f'''(0), and so

b =\displaystyle \frac{f'''(0)}{6}.

Moving to the next coefficient, c = 3, which arose from dividing f''(0) = 6 by 2. So

c = \displaystyle\frac{f''(0)}{2}.

Finally, it’s clear that

d = f'(0) and e = f(0).

This last line doesn’t quite fit the pattern of the first three lines. The first three lines all have fractions, but these last two expressions don’t. How can we fix this? In the hopes of finding a pattern, let’s (unnecessarily) write d and e as fractions by dividing by 1:

d = \displaystyle\frac{f'(0)}{1} and e = \displaystyle \frac{f(0)}{1}.

Let’s now rewrite the polynomial f(x) in light of this discussion:

f(x) = \displaystyle \frac{f^{(4)}(0)}{24} x^4 + \frac{f'''(0)}{6} x^3 + \frac{f''(0)}{2} x^2 + \frac{f'(0)}{1}x + \frac{f(0)}{1}.

What pattern do we see in the numerators? It’s apparent that the number of derivatives matches the power of x. For example, the x^3 term has a coefficient involving the third derivative of f. The last two terms fit this pattern as well, since x = x^1 and the last term is multiplied by x^0 = 1.

What pattern do we see in the denominators? 1, 1, 2, 6, 24 \dots where have we seen those before? Oh yes, the factorials! We know that 4! = 4 \cdot 3 \cdot 2 \cdot 1 = 24, 3! = 3 \cdot 2 \cdot 1 = 6, 2! = 2 \cdot 1 = 2, 1! = 1, and 0! is defined to be 1. So f(x) can be rewritten as

f(x) = \displaystyle \frac{f^{(4)}(0)}{4!} x^4 + \frac{f'''(0)}{3!} x^3 + \frac{f''(0)}{2!} x^2 + \frac{f'(0)}{1!}x + \frac{f(0)}{0!}.

How can this be written more compactly? By using \displaystyle \sum-notation:

f(x) = \displaystyle \sum_{k=0}^4 \frac{f^{(k)}(0)}{k!} x^k.

Why does the sum stop at 4? Because the original polynomial had degree 4. In general, if the polynomial had degree n, it’s reasonable to guess that

f(x) = \displaystyle \sum_{k=0}^n \frac{f^{(k)}(0)}{k!} x^k.

This is called the Maclaurin series, or the Taylor series about x = 0. While I won’t prove it here, one can find Taylor series expansions about points other than 0:

f(x) = \displaystyle \sum_{k=0}^n \frac{f^{(k)}(a)}{k!} (x-a)^k,

where a can be any number. Though not proven here, these series are exactly true for polynomials.
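The statement that these expansions are exactly true for polynomials can at least be spot-checked with exact arithmetic. The Python sketch below rebuilds the value of a polynomial at x from its derivatives at a; the helper names differentiate, evaluate, and taylor_about are hypothetical, and the sample points are arbitrary.

```python
from fractions import Fraction
from math import factorial


def differentiate(coeffs):
    """Differentiate a polynomial; coeffs[k] is the coefficient of x^k."""
    return [k * c for k, c in enumerate(coeffs)][1:]


def evaluate(coeffs, x):
    """Evaluate the polynomial at x using Horner's rule."""
    result = Fraction(0)
    for c in reversed(coeffs):
        result = result * x + c
    return result


def taylor_about(coeffs, a, x):
    """Return sum_k f^(k)(a)/k! * (x - a)^k for the polynomial f."""
    total = Fraction(0)
    current = list(coeffs)
    k = 0
    while current:
        total += evaluate(current, a) / factorial(k) * (x - a) ** k
        current = differentiate(current)
        k += 1
    return total


# f(x) = 6 - 3x + 3x^2 + (1/3)x^3 + (5/12)x^4, lowest power first
quartic = [Fraction(6), Fraction(-3), Fraction(3), Fraction(1, 3), Fraction(5, 12)]

if __name__ == "__main__":
    a, x = Fraction(2), Fraction(7, 3)  # arbitrary sample points
    print(taylor_about(quartic, a, x) == evaluate(quartic, x))
```

Because the arithmetic uses fractions rather than floating-point numbers, the two sides agree exactly and not merely to within rounding error.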

In the next post, we’ll discuss what happens if f(x) is not a polynomial.

Reminding students about Taylor series (Part 1)

At my university, Calculus II covers approximately the same topics covered in an AP Calculus BC course: integrals and derivatives with logarithms and exponential functions, various techniques of integration (including integration by parts and trigonometric substitutions), and convergence of infinite series.

In my opinion, the single most important of these topics is Taylor series (or, if you prefer, Maclaurin series), as these approximations to transcendental functions like e^x and \sin x are used over and over again in higher mathematics.

\bullet A good working knowledge of Taylor series is necessary for computing series solutions of ordinary differential equations.

\bullet In physics, elementary approximations like \sin x \approx x are used over and over again. For example, the governing differential equation for the motion of oscillating pendulums is

\displaystyle \frac{d^2 \theta}{dt^2} + \frac{g}{\ell} \sin \theta = 0,

where g is the acceleration due to gravity and \ell is the length of the pendulum. This differential equation cannot be solved in terms of elementary functions; its exact solution involves elliptic functions.

However, for small angles, we may use the approximation \sin \theta \approx \theta, so that the differential equation becomes

\displaystyle \frac{d^2 \theta}{dt^2} + \frac{g}{\ell} \theta = 0.

By eliminating the \sin \theta term, we now have a second-order differential equation with constant coefficients, which can be solved in a straightforward manner using standard techniques from differential equations. If \theta(0) = \theta_0 and \theta'(0) = 0 (i.e., the pendulum is pulled a small angle \theta_0 and is then released), the solution is

\theta(t) = \theta_0 \cos\left(t \sqrt{\displaystyle \frac{g}{\ell}} \right).

In other words, the pendulum exhibits sinusoidal behavior. (FYI, for an amazing display of kinetic art, see this demonstration of pendulum waves.)

\bullet The primary way that students interface with Taylor series is through their calculators. When a calculator computes \cos 1000^\circ, it doesn’t draw a unit circle, trace out an angle of 1000^\circ in standard position, and find the x-coordinate of the terminal point. Instead, the calculator converts 1000^\circ into radians and adds the first few terms of the Taylor series expansion for \cos x.

The calculator may use a few tricks to accelerate convergence. For this example, using some trigonometric identities, \cos 1000^\circ = \cos 280^\circ = \cos 80^\circ = \sin 10^\circ, and (as I’ll discuss) the Maclaurin series for \sin x at x = 10^\circ converges much faster than the Maclaurin series for \cos x at x = 1000^\circ.
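The difference in convergence is easy to see numerically. The following Python sketch is an illustration only (the function names sin_series and cos_series are hypothetical, and nothing here is taken from an actual calculator): it sums four terms of each Maclaurin series and compares them with the library value of \cos 1000^\circ.

```python
import math


def sin_series(x, n_terms):
    """Return x - x^3/3! + x^5/5! - ... using n_terms terms (x in radians)."""
    total = 0.0
    term = x
    for k in range(n_terms):
        total += term
        term *= -x * x / ((2 * k + 2) * (2 * k + 3))
    return total


def cos_series(x, n_terms):
    """Return 1 - x^2/2! + x^4/4! - ... using n_terms terms (x in radians)."""
    total = 0.0
    term = 1.0
    for k in range(n_terms):
        total += term
        term *= -x * x / ((2 * k + 1) * (2 * k + 2))
    return total


if __name__ == "__main__":
    target = math.cos(math.radians(1000))
    print("library value      :", target)
    print("sin series at 10   :", sin_series(math.radians(10), 4))
    print("cos series at 1000 :", cos_series(math.radians(1000), 4))
```

With only four terms, the sine series at 10 degrees is already very close to the library value, while the cosine series at 1000 degrees is nowhere near it yet.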

____________________

I’ve argued the importance of Taylor series in higher-level courses in both mathematics and physics. Sadly, at least at my university, Taylor series is probably the topic that is least retained by students years after taking Calculus II. They can remember the rules for integration and differentiation, but their command of Taylor series seems to slip through the cracks.

In my opinion, the reason for this lack of retention is completely understandable from a student’s perspective: Taylor series is usually the last topic covered in a semester, and so students learn them quickly for the final and quickly forget about them as soon as the final is over.

Of course, when I need to use Taylor series in an advanced course but my students have completely forgotten this prerequisite knowledge, I have to get them up to speed as soon as possible. Over the next few posts, I will present the sequence of examples that I use to accomplish this task. Covering this sequence usually takes me about 30-40 minutes of class time, depending on the class.

I should emphasize that, as much as possible, I present this sequence inductively and in an inquiry-based format: I ask leading questions of my students so that the answers of my students are driving the lecture. In other words, I don’t ask my students to simply take dictation. It’s a little hard to describe a question-and-answer format in a blog, but I’ll attempt to do this below.

Beginning with the next post, I’ll describe this sequence.

Welch’s formula

When conducting a hypothesis test or computing a confidence interval for the difference \overline{X}_1 - \overline{X}_2 of two means, where at least one mean arises from a small sample, the Student t distribution must be employed. In particular, the number of degrees of freedom for the Student t distribution must be computed. Many textbooks suggest using Welch’s formula:

df = \frac{\displaystyle (SE_1^2 + SE_2^2)^2}{\displaystyle \frac{SE_1^4}{n_1-1} + \frac{SE_2^4}{n_2-1}},

rounded down to the nearest integer. In this formula, SE_1 = \displaystyle \frac{\sigma_1}{\sqrt{n_1}} is the standard error associated with the first average \overline{X}_1, where \sigma_1 (if known) is the population standard deviation for X_1 and n_1 is the number of samples that are averaged to find \overline{X}_1. In practice, \sigma_1 is not known, and so the bootstrap estimate \sigma_1 \approx s_1 is employed.

The terms SE_2 and n_2 are similarly defined for the average \overline{X}_2.

In Welch’s formula, the term SE_1^2 + SE_2^2 in the numerator is equal to \displaystyle \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}. This is the square of the standard error SE_D associated with the difference \overline{X}_1 - \overline{X}_2, since

SE_D = \displaystyle \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}.

This leads to the “Pythagorean” relationship

SE_1^2 + SE_2^2 = SE_D^2,

which (in my experience) is a reasonable aid to help students remember the formula for SE_D.

____________________

Naturally, a big problem that students encounter when using Welch’s formula is that the formula is really, really complicated, and it’s easy to make a mistake when entering information into their calculators. (Indeed, it might be that the pre-programmed calculator function simply gives the wrong answer.) Also, since the formula is complicated, students don’t have a lot of psychological reassurance that, when they come out the other end, their answer is actually correct. So, when teaching this topic, I tell my students the following rule of thumb so that they can at least check if their final answer is plausible:

\min(n_1,n_2)-1 \le df \le n_1 + n_2 -2.

To my surprise, I have never seen this formula in a statistics textbook, even though it’s quite simple to state and not too difficult to prove using techniques from first-semester calculus.
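Both Welch’s formula and the plausibility check are short enough to sketch in Python. The function names welch_df and df_is_plausible and the sample numbers are assumptions for illustration; welch_df returns the unrounded value, so rounding down is left to the caller.

```python
import math


def welch_df(s1, n1, s2, n2):
    """Welch's degrees of freedom (unrounded) from sample std devs and sizes."""
    se1_sq = s1 ** 2 / n1  # SE_1^2
    se2_sq = s2 ** 2 / n2  # SE_2^2
    numerator = (se1_sq + se2_sq) ** 2
    denominator = se1_sq ** 2 / (n1 - 1) + se2_sq ** 2 / (n2 - 1)
    return numerator / denominator


def df_is_plausible(df, n1, n2, tol=1e-9):
    """Check min(n1, n2) - 1 <= df <= n1 + n2 - 2, allowing for rounding error."""
    return min(n1, n2) - 1 - tol <= df <= n1 + n2 - 2 + tol


if __name__ == "__main__":
    # Made-up sample values, purely for illustration.
    s1, n1, s2, n2 = 2.0, 10, 3.0, 15
    df = welch_df(s1, n1, s2, n2)
    print(df, math.floor(df), df_is_plausible(df, n1, n2))
```

A value that fails df_is_plausible signals a data-entry or calculator error, which is the purpose of the rule of thumb.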

Let’s rewrite Welch’s formula as

df = \left( \displaystyle \frac{1}{n_1-1} \left[ \frac{SE_1^2}{SE_1^2 + SE_2^2}\right]^2 + \frac{1}{n_2-1} \left[ \frac{SE_2^2}{SE_1^2 + SE_2^2} \right]^2 \right)^{-1}

For the sake of simplicity, let m_1 = n_1 - 1 and m_2 = n_2 -1, so that

df = \left( \displaystyle \frac{1}{m_1} \left[ \frac{SE_1^2}{SE_1^2 + SE_2^2}\right]^2 + \frac{1}{m_2} \left[ \frac{SE_2^2}{SE_1^2 + SE_2^2} \right]^2 \right)^{-1}

Now let x = \displaystyle \frac{SE_1^2}{SE_1^2 + SE_2^2}. All of these terms are nonnegative (and, in practice, they’re all positive), so that x \ge 0. Also, the numerator is no larger than the denominator, so that x \le 1. Finally, we notice that

1-x = 1 - \displaystyle \frac{SE_1^2}{SE_1^2 + SE_2^2} = \frac{SE_2^2}{SE_1^2 + SE_2^2}.

Using these observations, Welch’s formula reduces to the function

f(x) = \left( \displaystyle \frac{x^2}{m_1} + \frac{(1-x)^2}{m_2} \right)^{-1},

and the central problem is to find the maximum and minimum values of f(x) on the interval 0 \le x \le 1. Since f(x) is differentiable on [0,1], the absolute extrema can be found by checking the endpoints and the critical point(s).

First, the endpoints. If x=0, then f(0) = \left( \displaystyle \frac{1}{m_2} \right)^{-1} = m_2. On the other hand, if x=1, then f(1) = \left( \displaystyle \frac{1}{m_1} \right)^{-1} = m_1.

Next, the critical point(s). These are found by solving the equation f'(x) = 0:

f'(x) = -\left( \displaystyle \frac{x^2}{m_1} + \frac{(1-x)^2}{m_2} \right)^{-2} \left[ \displaystyle \frac{2x}{m_1} - \frac{2(1-x)}{m_2} \right] = 0

\displaystyle \frac{2x}{m_1} - \frac{2(1-x)}{m_2} = 0

\displaystyle \frac{2x}{m_1} = \frac{2(1-x)}{m_2}

xm_2= (1-x)m_1

xm_2 = m_1 - xm_1

x(m_1 + m_2) = m_1

x = \displaystyle \frac{m_1}{m_1 + m_2}

Plugging back into the original equation, we find the local extremum

f \left( \displaystyle \frac{m_1}{m_1+m_2} \right) = \left( \displaystyle \frac{1}{m_1} \frac{m_1^2}{(m_1+m_2)^2} + \frac{1}{m_2} \left[1-\frac{m_1}{m_1+m_2}\right]^2 \right)^{-1}

f \left( \displaystyle \frac{m_1}{m_1+m_2} \right) = \left( \displaystyle \frac{1}{m_1} \frac{m_1^2}{(m_1+m_2)^2} + \frac{1}{m_2} \left[\frac{m_2}{m_1+m_2}\right]^2 \right)^{-1}

f \left( \displaystyle \frac{m_1}{m_1+m_2} \right) = \left( \displaystyle \frac{m_1}{(m_1+m_2)^2} + \frac{m_2}{(m_1+m_2)^2} \right)^{-1}

f \left( \displaystyle \frac{m_1}{m_1+m_2} \right) = \left( \displaystyle \frac{m_1+m_2}{(m_1+m_2)^2} \right)^{-1}

f \left( \displaystyle \frac{m_1}{m_1+m_2} \right) = \left( \displaystyle \frac{1}{m_1+m_2} \right)^{-1}

f \left( \displaystyle \frac{m_1}{m_1+m_2} \right) = m_1+m_2

Based on the three candidate values that we’ve found (two endpoints and one critical point), it’s clear that the absolute minimum of f(x) on [0,1] is the smaller of m_1 and m_2, while the absolute maximum is equal to m_1 + m_2.
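The calculus argument can also be sanity-checked by brute force. The Python sketch below evaluates f(x) on a fine grid over [0,1] and reports the smallest and largest values found; the names f and scan_extrema, the grid size, and the sample values m_1 = 9 and m_2 = 14 are assumptions for illustration.

```python
def f(x, m1, m2):
    """f(x) = (x^2/m1 + (1-x)^2/m2)^(-1), the reduced form of Welch's formula."""
    return 1.0 / (x ** 2 / m1 + (1 - x) ** 2 / m2)


def scan_extrema(m1, m2, n_points=10001):
    """Return (min, max) of f over an evenly spaced grid on [0, 1]."""
    values = [f(i / (n_points - 1), m1, m2) for i in range(n_points)]
    return min(values), max(values)


if __name__ == "__main__":
    m1, m2 = 9, 14  # sample values, i.e. n1 = 10 and n2 = 15
    lowest, highest = scan_extrema(m1, m2)
    print(lowest, "should be close to", min(m1, m2))
    print(highest, "should be close to", m1 + m2)
```

A grid search proves nothing, of course, but it is a quick way to catch an algebra slip in the derivation above.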

\hbox{QED}

In conclusion, I suggest offering the following guidelines to students to encourage their intuition about the plausibility of their answers:

  • If SE_1 is much smaller than SE_2 (i.e., x \approx 0), then df will be close to m_2 = n_2 - 1.
  • If SE_1 is much larger than SE_2 (i.e., x \approx 1), then df will be close to m_1 = n_1 - 1.
  • Otherwise, df could be as large as m_1 + m_2 = n_1 + n_2 - 2, but no larger.

Measuring terminal velocity

Using a simultaneously falling softball as a stopwatch, the terminal velocity of a whiffle ball can be obtained to surprisingly high accuracy with only common household equipment. In the January 2013 issue of College Mathematics Monthly, we describe a classroom activity that engages students in this apparently daunting task that nevertheless is tractable, using a simple model and mathematical techniques at their disposal.

Epsilon

Years ago, when I taught calculus, I’d usually include the following extra credit question on the first exam: “In the small box, write a good value for \varepsilon. A valid answer gets 4 points; the smallest answer in the class will get 5 points.” It was basically free extra credit… any positive number would work, but it was a (hopefully) fun way for students to be a little competitive in coming up with small positive numbers, which is the intuitive meaning of \varepsilon in mathematics. (I still remember when my high school math teacher was giving me directions to a restaurant, concluding “You’ll know you’re within \varepsilon of the restaurant when you see the signs for Such-and-Such Mall.”)

Most students volunteered something like 0.0000001 or 10^{-9999999999999999}. Except for one particularly gutsy student who wrote, “The probability that Dr. Q gets a date on Friday night.” For sheer nerve, he got the 5 points that year.

Also getting 5 points that year was the best answer of the class: “Let x be the smallest answer that anyone else wrote. Then \varepsilon = x/2.” That was especially clever from a calculus student, as that’s the essence of a fairly common technique when writing proofs in real analysis.