Reminding students about Taylor series (Part 5)

Sadly, at least at my university, Taylor series is the topic that is least retained by students years after taking Calculus II. They can remember the rules for integration and differentiation, but their command of Taylor series seems to slip through the cracks. In my opinion, the reason for this lack of retention is completely understandable from a student’s perspective: Taylor series is usually the last topic covered in a semester, and so students learn them quickly for the final and quickly forget about them as soon as the final is over.

Of course, when I need to use Taylor series in an advanced course but my students have completely forgotten this prerequisite knowledge, I have to get them up to speed as soon as possible. Here’s the sequence that I use to accomplish this task. Covering this sequence usually takes me about 30 minutes of class time.

I should emphasize that I present this sequence in an inquiry-based format: I ask leading questions of my students so that the answers of my students are driving the lecture. In other words, I don’t ask my students to simply take dictation. It’s a little hard to describe a question-and-answer format in a blog, but I’ll attempt to do this below.

In the previous posts, I described how I lead students to the definition of the Maclaurin series

$f(x) = \displaystyle \sum_{k=0}^{\infty} \frac{f^{(k)}(0)}{k!} x^k$ ,

which converges to $f(x)$ within some radius of convergence for all functions that commonly appear in the secondary mathematics curriculum.

Step 5. That was easy; let’s try another one. Now let’s try $f(x) = \displaystyle \frac{1}{1-x} = (1-x)^{-1}$ .

What’s $f(0)$ ? Plugging in, we find $f(0) = \displaystyle \frac{1}{1-0} = 1$ .

Next, to find $f'(0)$ , we first find $f'(x)$ . Using the Chain Rule, we find $f'(x) = -(1-x)^{-2} \cdot (-1) = \displaystyle \frac{1}{(1-x)^2}$ , so that $f'(0) = 1$ .

Next, we differentiate again: $f''(x) = (-2) \cdot (1-x)^{-3} \cdot (-1) = \displaystyle \frac{2}{(1-x)^3}$ , so that $f''(0) = 2$ .

Hmmm… no obvious pattern yet… so let’s keep going.

For the next term, $f'''(x) = (-3) \cdot 2(1-x)^{-4} \cdot (-1) = \displaystyle \frac{6}{(1-x)^4}$ , so that $f'''(0) = 6$ .

For the next term, $f^{(4)}(x) = (-4) \cdot 6(1-x)^{-5} \cdot (-1) = \displaystyle \frac{24}{(1-x)^5}$ , so that $f^{(4)}(0) = 24$ .

Oohh… it’s the factorials again! It looks like $f^{(n)}(0) = n!$ , and this can be formally proved by induction.

Plugging into the series, we find that

$\displaystyle \frac{1}{1-x} = \sum_{n=0}^\infty \frac{n!}{n!} x^n = \sum_{n=0}^\infty x^n = 1 + x + x^2 + x^3 + \dots$ .

Like the series for $e^x$ , this series converges quickest for $x \approx 0$ . Unlike the series for $e^x$ , this series does not converge for all real numbers. As can be checked with the Ratio Test, this series only converges if $|x| < 1$ .

The right-hand side is a special kind of series typically discussed in precalculus. (Students often pause at this point, because most of them have forgotten this too.) It is an infinite geometric series whose first term is $1$ and common ratio $x$. So starting from the right-hand side, one can obtain the left-hand side using the formula

$a + ar + ar^2 + ar^3 + \dots = \displaystyle \frac{a}{1-r}$

by letting $a=1$ and $r=x$. Also, as stated in precalculus, this series only converges if the common ratio satisfies $|r| < 1$, as before.

In other words, in precalculus, we start with the geometric series and end with the function. With Taylor series, we start with the function and end with the series.
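To make the convergence claim concrete, here is a minimal Python sketch (the function name `geometric_partial_sum` and the sample values of $x$ are chosen only for illustration) that compares partial sums of $1 + x + x^2 + \dots$ with $\displaystyle \frac{1}{1-x}$ for a few values inside the interval of convergence.

```python
def geometric_partial_sum(x, n_terms):
    """Return 1 + x + x^2 + ... + x^(n_terms - 1)."""
    total = 0.0
    term = 1.0  # the k = 0 term
    for _ in range(n_terms):
        total += term
        term *= x  # next power of x
    return total


# Illustrative values only; any x with |x| < 1 would do.
for x in (0.1, 0.5, 0.9):
    exact = 1.0 / (1.0 - x)
    for n in (5, 10, 50):
        approx = geometric_partial_sum(x, n)
        print(f"x={x}, n={n}: partial sum={approx:.6f}, exact={exact:.6f}")
```

Values of $x$ near $0$ settle down after a handful of terms, while values near $1$ need many more. Trying $x = 2$ shows the partial sums growing without bound, matching the requirement $|x| < 1$ .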

Step 6. A whole bunch of other Taylor series can be quickly obtained from the one for $\displaystyle \frac{1}{1-x}$ . Let’s take the derivative of both sides (and ignore the fact that one should prove that differentiating this infinite series term by term is permissible). Since

$\displaystyle \frac{d}{dx} \left( \frac{1}{1-x} \right) = \frac{1}{(1-x)^2}$

and

$\displaystyle \frac{d}{dx} \left( 1 + x + x^2 + x^3 + x^4 + \dots \right) = 1 + 2x + 3x^2 + 4x^3 + \dots$ ,

we have

$\displaystyle \frac{1}{(1-x)^2} = 1 + 2x + 3x^2 + 4x^3 + \dots$ .

____________________

Next, let’s replace $x$ with $-x$ in the Taylor series in Step 5, obtaining

$\displaystyle \frac{1}{1+x} = 1 - x + x^2 - x^3 + x^4 - x^5 \dots$

Now let’s take the indefinite integral of both sides:

$\displaystyle \int \frac{dx}{1+x} = \int \left( 1 - x + x^2 - x^3 + x^4 - x^5 \dots \right) \, dx$

$\ln(1+x) = \displaystyle x - \frac{x^2}{2} + \frac{x^3}{3} -\frac{ x^4}{4} + \frac{x^5}{5} -\frac{ x^6}{6} \dots + C$

To solve for the constant of integration, let $x = 0$ :

$\ln(1) = 0+ C \Longrightarrow C = 0$

Plugging back in, we conclude that

$\ln(1+x) = x - \displaystyle \frac{x^2}{2} + \frac{x^3}{3} -\frac{ x^4}{4} + \frac{x^5}{5} -\frac{ x^6}{6} \dots$

The Taylor series expansion for $\ln(1-x)$ can be found by replacing $x$ with $-x$ :

$\ln(1-x) = -x - \displaystyle \frac{x^2}{2} - \frac{x^3}{3} -\frac{ x^4}{4} - \frac{x^5}{5} -\frac{ x^6}{6} \dots$

Subtracting, we find

$\ln(1+x) - \ln(1-x) = \ln \displaystyle \left( \frac{1+x}{1-x} \right) = 2x + \frac{2x^3}{3}+ \frac{2x^5}{5} \dots$

My understanding is that this latter series is used by calculators when computing logarithms.
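As a rough illustration of how this series could be used to compute a logarithm, here is a minimal Python sketch. It is not a claim about what any particular calculator does; the function name `log_via_series` and the default of 20 terms are assumptions made for the example. To find $\ln y$ , solve $\displaystyle \frac{1+x}{1-x} = y$ for $x$ , which gives $x = \displaystyle \frac{y-1}{y+1}$ , and note that $|x| < 1$ for every positive $y$ .

```python
import math


def log_via_series(y, n_terms=20):
    """Approximate ln(y), y > 0, via ln((1+x)/(1-x)) = 2(x + x^3/3 + x^5/5 + ...)."""
    if y <= 0:
        raise ValueError("y must be positive")
    x = (y - 1.0) / (y + 1.0)  # solves (1 + x)/(1 - x) = y
    total = 0.0
    power = x  # holds x^(2k+1)
    for k in range(n_terms):
        total += power / (2 * k + 1)
        power *= x * x
    return 2.0 * total


print(log_via_series(2.0), math.log(2.0))
```

The closer $y$ is to $1$ , the smaller $x$ is and the faster the series converges; values of $y$ far from $1$ need more terms.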

____________________

Next, let’s replace $x$ with $-x^2$ in the Taylor series in Step 5, obtaining

$\displaystyle \frac{1}{1+x^2} = 1 - x^2 + x^4 - x^6 + x^8 - x^{10} \dots$

Now let’s take the indefinite integral of both sides:

$\displaystyle \int \frac{dx}{1+x^2} = \int \left(1 - x^2 + x^4 - x^6 + x^8 - x^{10} \dots\right) \, dx$

$\tan^{-1}x = \displaystyle x - \frac{x^3}{3} + \frac{x^5}{5} -\frac{ x^7}{7} + \frac{x^9}{9} -\frac{ x^{11}}{11} \dots + C$

To solve for the constant of integration, let $x = 0$ :

$\tan^{-1}(0) = 0+ C \Longrightarrow C = 0$

Plugging back in, we conclude that

$\tan^{-1}x = \displaystyle x - \frac{x^3}{3} + \frac{x^5}{5} -\frac{ x^7}{7} + \frac{x^9}{9} -\frac{ x^{11}}{11} \dots$
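A quick numerical check of this series, as a minimal Python sketch (the function name `arctan_series` and the test value $x = 0.5$ are chosen only for illustration):

```python
import math


def arctan_series(x, n_terms):
    """Sum the first n_terms of x - x^3/3 + x^5/5 - x^7/7 + ..."""
    total = 0.0
    power = x   # holds x^(2k+1)
    sign = 1.0  # alternates +1, -1, +1, ...
    for k in range(n_terms):
        total += sign * power / (2 * k + 1)
        power *= x * x
        sign = -sign
    return total


print(arctan_series(0.5, 10), math.atan(0.5))
```

As with the geometric series it came from, the partial sums settle quickly when $|x|$ is well below $1$ and slowly when $|x|$ is close to $1$ .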

____________________

In summary, a whole bunch of Taylor series can be extracted quite quickly by differentiating and integrating from a simple infinite geometric series. I’m a firm believer in minimizing the number of formulas that I should memorize. Any time I personally need any of the above series, I’ll quickly use the above steps to derive them from that of $\displaystyle \frac{1}{1-x}$ .

Reminding students about Taylor series (Part 4)

I’m in the middle of a series of posts describing how I remind students about Taylor series. In the previous posts, I described how I lead students to the definition of the Maclaurin series

$f(x) = \displaystyle \sum_{k=0}^{\infty} \frac{f^{(k)}(0)}{k!} x^k$ ,

which converges to $f(x)$ within some radius of convergence for all functions that commonly appear in the secondary mathematics curriculum.

Step 4. Let’s now get some practice with Maclaurin series. Let’s start with $f(x) = e^x$ .

What’s $f(0)$ ? That’s easy: $f(0) = e^0 = 1$ .

Next, to find $f'(0)$ , we first find $f'(x)$ . What is it? Well, that’s also easy: $f'(x) = \frac{d}{dx} (e^x) = e^x$ . So $f'(0)$ is also equal to $1$ .

How about $f''(0)$ ? Yep, it’s also $1$ . In fact, it’s clear that $f^{(n)}(0) = 1$ for all $n$ , though we’ll skip the formal proof by induction.

Plugging into the above formula, we find that

$e^x = \displaystyle \sum_{k=0}^{\infty} \frac{1}{k!} x^k = \sum_{k=0}^{\infty} \frac{x^k}{k!} = 1 + x + \frac{x^2}{2} + \frac{x^3}{6} + \dots$

It turns out that the radius of convergence for this power series is $\infty$ . In other words, the series on the right converges for all values of $x$ . Though we’ll skip this for review purposes, this can be formally checked by using the Ratio Test.

At this point, students generally feel confident about the mechanics of finding a Taylor series expansion, and that’s a good thing. However, in my experience, their command of Taylor series is still somewhat artificial. They can go through the motions of taking derivatives and finding the Taylor series, but this complicated symbol in $\displaystyle \sum$ notation still doesn’t have much meaning.

So I shift gears somewhat to discuss the rate of convergence. My hope is to deepen students’ knowledge by getting them to believe that $f(x)$ really can be approximated to high precision with only a few terms. Perhaps not surprisingly, it converges quicker for small values of $x$ than for big values of $x$ .

Pedagogically, I like to use a spreadsheet like Microsoft Excel to demonstrate the rate of convergence. A calculator could be used, but students can see quickly with Excel how quickly (or slowly) the terms get smaller. I usually construct the spreadsheet in class on the fly (the fill down feature is really helpful for doing this quickly), with the end product looking something like this:

In this way, students can immediately see that the Taylor series is accurate to four significant digits by going up to the $x^4$ term and that about ten or eleven terms are needed to get a figure that is as accurate as the precision of the computer will allow. In other words, for all practical purposes, an infinite number of terms are not necessary.

In short, this is how a calculator computes $e^x$ : adding up the first few terms of a Taylor series. Back in high school, when students hit the $e^x$ button on their calculators, they’ve trusted the result but the mechanics of how the calculator gets the result was shrouded in mystery. No longer.
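For readers without a spreadsheet handy, the same kind of table can be produced with a few lines of Python. This is a minimal sketch, not the spreadsheet itself: the function name `exp_partial_sums` and the choice $x = 0.5$ are assumptions made for illustration.

```python
import math


def exp_partial_sums(x, n_terms):
    """Return rows (k, term, running_sum) for the terms x^k / k!, k = 0..n_terms-1."""
    rows = []
    term = 1.0   # x^0 / 0!
    total = 0.0
    for k in range(n_terms):
        total += term
        rows.append((k, term, total))
        term *= x / (k + 1)  # turn x^k/k! into x^(k+1)/(k+1)!
    return rows


x = 0.5  # illustrative value; try others
for k, term, total in exp_partial_sums(x, 12):
    print(f"{k:2d}  {term:.16f}  {total:.16f}")
print("math.exp:", math.exp(x))
```

Each row plays the role of one spreadsheet row: the middle column shows how quickly the individual terms shrink, and the last column is the running total to compare against the built-in value.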

Then I shift gears by trying a larger value of $x$ :

I ask my students the obvious question: What went wrong? They’re usually able to volunteer a few ideas:

The convergence is slower for larger values of $x$ .
The series will converge, but more terms are needed (and I’ll later use the fill down feature to get enough terms so that it does converge as accurately as double precision will allow).
The individual terms get bigger until $k=11$ and then start getting smaller. I’ll ask my students why this happens, and I’ll eventually get an explanation like

$\displaystyle \frac{(11.5)^6}{6!} < \frac{(11.5)^6}{6!} \times \frac{11.5}{7} = \frac{(11.5)^7}{7!}$

but

$\displaystyle \frac{(11.5)^{11}}{11!} > \frac{(11.5)^{11}}{11!} \times \frac{11.5}{12} = \frac{(11.5)^{12}}{12!}$
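The turning point can also be checked numerically. Each term is obtained from the previous one by multiplying by $\displaystyle \frac{11.5}{k+1}$ , so the terms grow while that ratio exceeds $1$ and shrink once it drops below $1$ . A minimal Python sketch (the helper name `taylor_terms` is made up for this example):

```python
def taylor_terms(x, n_terms):
    """Return the list of terms x^k / k! for k = 0..n_terms-1."""
    terms = []
    term = 1.0
    for k in range(n_terms):
        terms.append(term)
        term *= x / (k + 1)  # ratio of consecutive terms
    return terms


terms = taylor_terms(11.5, 40)
# Index of the largest term in the list.
k_max = max(range(len(terms)), key=lambda k: terms[k])
print("largest term occurs at k =", k_max)
```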

At this point, I’ll mention that calculators use some tricks to speed up convergence. For example, the calculator can simply store a few values of $e^x$ in memory, like $e^{16}$ , $e^{8}$ , $e^{4}$ , $e^{2}$ , and $e^{1} = e$ . I then ask my class how these could be used to find $e^{11.5}$ . After some thought, they will volunteer that

$e^{11.5} = e^8 \cdot e^2 \cdot e \cdot e^{0.5}$ .

The first three values don’t need to be computed — they’ve already been stored in memory — while the last value can be computed via Taylor series. Also, since $0.5 < 1$ , the series for $e^{0.5}$ will converge pretty quickly. (Some students may volunteer that the above product is logically equivalent to turning $11$ into binary.)
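Here is a minimal Python sketch of this idea. It is only an illustration of the classroom argument, not a description of real calculator firmware: the table `STORED_POWERS` is filled in with `math.exp` as a stand-in for constants held in memory, and the names `exp_small` and `exp_with_table` are invented for the example.

```python
import math

# Stand-in for values "stored in memory" (an assumption for this sketch).
STORED_POWERS = {p: math.exp(p) for p in (16, 8, 4, 2, 1)}


def exp_small(x, n_terms=20):
    """Taylor series for e^x; intended for 0 <= x < 1."""
    total = 0.0
    term = 1.0
    for k in range(n_terms):
        total += term
        term *= x / (k + 1)
    return total


def exp_with_table(x):
    """Compute e^x for 0 <= x < 32 using stored powers plus a short series."""
    if not 0 <= x < 32:
        raise ValueError("this sketch only handles 0 <= x < 32")
    result = 1.0
    remainder = x
    for power in (16, 8, 4, 2, 1):  # binary digits of the integer part
        if remainder >= power:
            result *= STORED_POWERS[power]
            remainder -= power
    return result * exp_small(remainder)  # remainder is now in [0, 1)


print(exp_with_table(11.5), math.exp(11.5))
```

For $x = 11.5$ the loop picks out $8$ , $2$ , and $1$ , leaving $0.5$ for the series, which is exactly the factorization above.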

At this point, after doing these explicit numerical examples, I’ll show graphs of $e^x$ and graphs of the Taylor polynomials of $e^x$ , observing that the polynomials get closer and closer to the graph of $e^x$ as more terms are added. (For example, see the graphs on the Wikipedia page for Taylor series, though I prefer to use Mathematica for in-class purposes.) In my opinion, the convergence of the graphs becomes meaningful to students only after doing some numerical examples, as done above.

At this point, I hope my students are familiar with the definition of Taylor (Maclaurin) series, can apply the definition to $e^x$ , and have some intuition that the nasty Taylor series expression practically means adding a bunch of terms together until you’re satisfied with the convergence.

In the next post, we’ll consider another Taylor series which ought to be (but usually isn’t) really familiar to students: an infinite geometric series.

P.S. Here’s the Excel spreadsheet that I used to make the above figures: Taylor.

Reminding students about Taylor series (Part 3)

In the previous post, I described how I lead students to the equations

$f(x) = \displaystyle \sum_{k=0}^n \frac{f^{(k)}(0)}{k!} x^k$ .

and

$f(x) = \displaystyle \sum_{k=0}^n \frac{f^{(k)}(a)}{k!} (x-a)^k$ ,

where $f(x)$ is a polynomial and $a$ can be any number.

Step 3. What happens if the original function $f(x)$ is not a polynomial? For one thing, the right-hand side can no longer be a finite sum. As long as the sum on the right-hand side stops at some degree $n$ , the right-hand side is a polynomial, but the left-hand side is assumed to not be a polynomial.

To resolve this, we can cross our fingers and hope that

$f(x) = \displaystyle \sum_{k=0}^{\infty} \frac{f^{(k)}(0)}{k!} x^k$ ,

$f(x) = \displaystyle \sum_{k=0}^{\infty}\frac{f^{(k)}(a)}{k!} (x-a)^k$ .

In other words, let’s make the right-hand side an infinite series, and hope for the best. This is the definition of the Taylor series expansions of $f$ .

Note: At this point in the review, I can usually see the light go on in my students’ eyes. Usually, they can now recall their work with Taylor series in the past… and they wonder why they weren’t taught this topic inductively (like I’ve tried to do in the above exposition) instead of deductively (like the presentation in most textbooks).

While we’d like to think that the Taylor series expansions always work, there are at least two things that can go wrong.

First, the sum on the right is an infinite series, and there’s no guarantee that the series will converge in the first place. There are plenty of examples of series that diverge, like $\displaystyle \sum_{k=0}^\infty \frac{1}{k+1}$ .
Second, even if the series converges, there’s no guarantee that the series will converge to the “right” answer $f(x)$ . The canonical example of this behavior is $f(x) = e^{-1/x^2}$ , which is so “flat” near $x=0$ that every single derivative of $f$ is equal to $0$ at $x =0$ .

For the first complication, there are multiple tests devised in Calculus II, especially the Ratio Test, to determine the values of $x$ for which the series converges. This establishes a radius of convergence for the series.

The second complication is far more difficult to address rigorously. The good news is that, for all commonly occurring functions in the secondary mathematics curriculum, the Taylor series of a function properly converges (when it does converge). So we will happily ignore this complication for the remainder of the presentation.

Indeed, it’s remarkable that the series should converge to $f(x)$ at all. Think about the meaning of the terms on the right-hand side:

$f(a)$ is the $y$-coordinate at $x=a$ .
$f'(a)$ is the slope of the curve at $x=a$ .
$f''(a)$ is a measure of the concavity of the curve at — you guessed it — $x=a$ .
$f'''(a)$ is an even more subtle description of the curve… once again, at $x=a$ .

In other words, if the Taylor series converges to $f(x)$ , then every twist and turn of the function, even at points far away from $x=a$ , is encoded somehow in the shape of the curve at the one point $x=a$ . So analytic functions (those with a Taylor series that converges to the original function) are indeed quite remarkable.

Reminding students about Taylor series (Part 2)

In this series of posts, I will describe the sequence of examples that I use to remind students about Taylor series. (One time, just for fun, I presented this topic at the end of a semester of Calculus I, and it seemed to go well even for that audience who had not seen Taylor series previously.)

I should emphasize that I present this sequence inductively and in an inquiry-based format: I ask leading questions of my students so that the answers of my students are driving the lecture. In other words, I don’t ask my students to simply take dictation. It’s a little hard to describe a question-and-answer format in a blog, but I’ll attempt to do this below.

Step 1. Find the unique quartic (fourth-degree) polynomial so that $f(0) = 6$ , $f'(0) = -3$ , $f''(0) = 6$ , $f'''(0) = 2$ , and $f^{(4)}(0) = 10$ .

I’ve placed a thought bubble if you’d like to think about it before scrolling down to see the answer. Here’s a hint to get started: let $f(x) = ax^4 + bx^3 + cx^2 + dx + e$ , and start differentiating. Remember that $a$ , $b$ , $c$ , $d$ , and $e$ are constants.

We begin with the information that $f(0) = 6$ . How else can we find $f(0)$? Since $f(x) = ax^4 + bx^3 + cx^2 + dx + e$ , we see that $f(0) = e$ . Therefore, it must be that $e = 6$ .

How about $f'(0)$ ? We see that $f'(x) = 4ax^3 + 3bx^2 + 2cx + d$ , and so $f'(0) = d$ . Since $f'(0) = -3$ , we have that $d = -3$ .

Next, $f''(x) = 12ax^2 + 6bx + 2c$ , and so $f''(0) = 2c$ . Since $f''(0) = 6$ , we have that $2c = 6$ , or $c = 3$ .

Next, $f'''(x) = 24ax + 6b$ , and so $f'''(0) = 6b$ . Since $f'''(0) = 2$ , we have that $6b = 2$ , or $b = \frac{1}{3}$ .

Finally, $f^{(4)}(x) = 24a$ , and so $f^{(4)}(0) = 24a$ . Since $f^{(4)}(0) = 10$ , we have $24a = 10$ , or $a = \frac{5}{12}$ .

What do we get when we put all of this information together? The polynomial must be

$f(x) = \frac{5}{12} x^4 + \frac{1}{3} x^3 + 3 x^2 - 3x + 6$ .
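As a sanity check on the arithmetic, here is a minimal Python sketch that differentiates this polynomial repeatedly and reads off the values at $x = 0$ . Exact fractions are used to avoid rounding; the helper names `derivative` and `derivatives_at_zero` are made up for the example.

```python
from fractions import Fraction

# Coefficients in ascending order of power: e, d, c, b, a.
coeffs = [Fraction(6), Fraction(-3), Fraction(3), Fraction(1, 3), Fraction(5, 12)]


def derivative(c):
    """Differentiate a polynomial given by ascending coefficients."""
    return [k * c[k] for k in range(1, len(c))]


def derivatives_at_zero(c):
    """Return [f(0), f'(0), f''(0), ...] until the polynomial is exhausted."""
    values = []
    current = list(c)
    while current:
        values.append(current[0])  # constant term is the value at x = 0
        current = derivative(current)
    return values


print(derivatives_at_zero(coeffs))
```

The five values printed should be the five conditions given in Step 1.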

Step 2. How are these coefficients related to the information given in the problem?

Let’s start with the leading coefficient, $a = \frac{5}{12}$ . How did we get this answer? It came from dividing $10$ by $24$ . Where did the $10$ come from? It was the given value of $f^{(4)}(0)$ , and so

$a = \displaystyle \frac{f^{(4)}(0)}{24}$ .

Next, $b = \frac{1}{3}$ , which arose from dividing $2$ by $6$ . The number $2$ was the given value of $f'''(0)$ , and so

$b =\displaystyle \frac{f'''(0)}{6}$ .

Moving to the next coefficient, $c = 3$ , which arose from dividing $f''(0) = 6$ by $2$ . So

$c = \displaystyle\frac{f''(0)}{2}$ .

Finally, it’s clear that

$d = f'(0)$ and $e = f(0)$ .

This last line doesn’t quite fit the pattern of the first three lines. The first three lines all have fractions, but these last two expressions don’t. How can we fix this? In the hopes of finding a pattern, let’s (unnecessarily) write $d$ and $e$ as fractions by dividing by $1$ :

$d = \displaystyle\frac{f'(0)}{1}$ and $e = \displaystyle \frac{f(0)}{1}$ .

Let’s now rewrite the polynomial $f(x)$ in light of this discussion:

$f(x) = \displaystyle \frac{f^{(4)}(0)}{24} x^4 + \frac{f'''(0)}{6} x^3 + \frac{f''(0)}{2} x^2 + \frac{f'(0)}{1}x + \frac{f(0)}{1}$ .

What pattern do we see in the numerators? It’s apparent that the number of derivatives matches the power of $x$ . For example, the $x^3$ term has a coefficient involving the third derivative of $f$ . The last two terms fit this pattern as well, since $x = x^1$ and the last term is multiplied by $x^0 = 1$ .

What pattern do we see in the denominators? $1, 1, 2, 6, 24 \dots$ where have we seen those before? Oh yes, the factorials! We know that $4! = 4 \cdot 3 \cdot 2 \cdot 1 = 24$ , $3! = 3 \cdot 2 \cdot 1 = 6$ , $2! = 2 \cdot 1 = 2$ , $1! = 1$ , and $0!$ is defined to be $1$ . So $f(x)$ can be rewritten as

$f(x) = \displaystyle \frac{f^{(4)}(0)}{4!} x^4 + \frac{f'''(0)}{3!} x^3 + \frac{f''(0)}{2!} x^2 + \frac{f'(0)}{1!}x + \frac{f(0)}{0!}$ .

How can this be written more compactly? By using $\displaystyle \sum-$ notation:

$f(x) = \displaystyle \sum_{k=0}^4 \frac{f^{(k)}(0)}{k!} x^k$ .

Why does the sum stop at 4? Because the original polynomial had degree 4. In general, if the polynomial had degree $n$ , it’s reasonable to guess that

$f(x) = \displaystyle \sum_{k=0}^n \frac{f^{(k)}(0)}{k!} x^k$ .
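The formula can be read as a recipe: divide the $k$th derivative at $0$ by $k!$ to get the coefficient of $x^k$ . A minimal Python sketch (the function name `maclaurin_coefficients` is invented for this example), applied to the data from Step 1:

```python
from fractions import Fraction
from math import factorial


def maclaurin_coefficients(derivs_at_zero):
    """Given [f(0), f'(0), f''(0), ...], return the coefficients of x^0, x^1, x^2, ..."""
    return [Fraction(d, factorial(k)) for k, d in enumerate(derivs_at_zero)]


# f(0) = 6, f'(0) = -3, f''(0) = 6, f'''(0) = 2, f''''(0) = 10
coeffs = maclaurin_coefficients([6, -3, 6, 2, 10])
print(coeffs)
```

The list comes out in ascending order of power, so it should be read as $e$ , $d$ , $c$ , $b$ , $a$ from Step 1.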

This is called the Maclaurin series, or the Taylor series about $x =0$ . While I won’t prove it here, one can find Taylor series expansions about points other than $0$ :

$f(x) = \displaystyle \sum_{k=0}^n \frac{f^{(k)}(a)}{k!} (x-a)^k$ ,

where $a$ can be any number. Though not proven here, these series are exactly true for polynomials.

In the next post, we’ll discuss what happens if $f(x)$ is not a polynomial.

Reminding students about Taylor series (Part 1)

At my university, Calculus II covers approximately the same topics covered in an AP Calculus BC course: integrals and derivatives with logarithms and exponential functions, various techniques of integration (including integration by parts and trigonometric substitutions), and convergence of infinite series.

In my opinion, the single most important of these topics is Taylor series (or, if you prefer, Maclaurin series), as these approximations to transcendental functions like $e^x$ and $\sin x$ are used over and over again in higher mathematics.

$\bullet$ A good working knowledge of Taylor series is necessary for computing series solutions of ordinary differential equations.

$\bullet$ In physics, elementary approximations like $\sin x \approx x$ are used over and over again. For example, the governing differential equation for the motion of oscillating pendulums is

$\displaystyle \frac{d^2 \theta}{dt^2} + \frac{g}{\ell} \sin \theta = 0$ ,

where $g$ is the acceleration due to gravity and $\ell$ is the length of the pendulum. This differential equation cannot be solved in terms of elementary functions; its exact solution involves elliptic functions.

However, for small angles, we may use the approximation $\sin \theta \approx \theta$ , so that the differential equation becomes

$\displaystyle \frac{d^2 \theta}{dt^2} + \frac{g}{\ell} \theta = 0$ ,

By eliminating the $\sin \theta$ term, we now have a second-order differential equation with constant coefficients, which can be solved in a straightforward manner using standard techniques from differential equations. If $\theta(0) = \theta_0$ and $\theta'(0) = 0$ (i.e., the pendulum is pulled a small angle $\theta_0$ and is then released), the solution is

$\theta(t) = \theta_0 \cos\left(t \sqrt{\displaystyle \frac{g}{\ell}} \right)$ .

In other words, the pendulum exhibits sinusoidal behavior. (FYI, for an amazing display of kinetic art, see this demonstration of pendulum waves.)

$\bullet$ The primary way that students interface with Taylor series is through their calculators. When a calculator computes $\cos 1000^\circ$ , it doesn’t draw a unit circle, trace out an angle of $1000^\circ$ in standard position, and find the $x$-coordinate of the terminal point. Instead, the calculator converts $1000^\circ$ into radians and adds the first few terms of the Taylor series expansion for $\cos x.$

The calculator may use a few tricks to accelerate convergence. For this example, using some trigonometric identities, $\cos 1000^\circ = \cos 280^\circ = \cos 80^\circ = \sin 10^\circ$ , and (as I’ll discuss) the Maclaurin series for $\sin x$ at $x = 10^\circ$ converges much faster than the Maclaurin series for $\cos x$ at $x = 1000^\circ$ .
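The difference is easy to see numerically. The following minimal Python sketch (the helper names `sin_series` and `cos_series` and the term counts are assumptions for illustration, not a description of any actual calculator) compares a three-term sine series at $10$ degrees with a five-term cosine series at $1000$ degrees.

```python
import math


def sin_series(x, n_terms):
    """Sum the first n_terms of x - x^3/3! + x^5/5! - ... (x in radians)."""
    total = 0.0
    term = x
    for k in range(n_terms):
        total += term
        term *= -x * x / ((2 * k + 2) * (2 * k + 3))
    return total


def cos_series(x, n_terms):
    """Sum the first n_terms of 1 - x^2/2! + x^4/4! - ... (x in radians)."""
    total = 0.0
    term = 1.0
    for k in range(n_terms):
        total += term
        term *= -x * x / ((2 * k + 1) * (2 * k + 2))
    return total


target = math.cos(math.radians(1000))
print("library value:          ", target)
print("sin series, 10 degrees: ", sin_series(math.radians(10), 3))
print("cos series, 1000 degrees:", cos_series(math.radians(1000), 5))
```

With the angle reduced to $10$ degrees (about $0.17$ radians), three terms already agree with the library value to many decimal places, while five terms of the cosine series at about $17.5$ radians are nowhere close.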

I’ve argued the importance of Taylor series in higher-level courses in both mathematics and physics. Sadly, at least at my university, Taylor series is probably the topic that is least retained by students years after taking Calculus II. They can remember the rules for integration and differentiation, but their command of Taylor series seems to slip through the cracks.

In my opinion, the reason for this lack of retention is completely understandable from a student’s perspective: Taylor series is usually the last topic covered in a semester, and so students learn them quickly for the final and quickly forget about them as soon as the final is over.

Of course, when I need to use Taylor series in an advanced course but my students have completely forgotten this prerequisite knowledge, I have to get them up to speed as soon as possible. Over the next few posts, I will present the sequence of examples that I use to accomplish this task. Covering this sequence usually takes me about 30-40 minutes of class time, depending on the class.

I should emphasize that, as much as possible, I present this sequence inductively and in an inquiry-based format: I ask leading questions of my students so that the answers of my students are driving the lecture. In other words, I don’t ask my students to simply take dictation. It’s a little hard to describe a question-and-answer format in a blog, but I’ll attempt to do this below.

Beginning with the next post, I’ll describe this sequence.

Welch’s formula

When conducting a hypothesis test or computing a confidence interval for the difference $\overline{X}_1 - \overline{X}_2$ of two means, where at least one mean arises from a small sample, the Student t distribution must be employed. In particular, the number of degrees of freedom for the Student t distribution must be computed. Many textbooks suggest using Welch’s formula:

$df = \frac{\displaystyle (SE_1^2 + SE_2^2)^2}{\displaystyle \frac{SE_1^4}{n_1-1} + \frac{SE_2^4}{n_2-1}},$

rounded down to the nearest integer. In this formula, $SE_1 = \displaystyle \frac{\sigma_1}{\sqrt{n_1}}$ is the standard error associated with the first average $\overline{X}_1$ , where $\sigma_1$ (if known) is the population standard deviation for $X$ and $n_1$ is the number of samples that are averaged to find $\overline{X}_1$ . In practice, $\sigma_1$ is not known, and so the bootstrap estimate $\sigma_1 \approx s_1$ is employed.

The terms $SE_2$ and $n_2$ are similarly defined for the average $\overline{X}_2$ .

In Welch’s formula, the term $SE_1^2 + SE_2^2$ in the numerator is equal to $\displaystyle \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$ . This is the square of the standard error $SE_D$ associated with the difference $\overline{X}_1 - \overline{X}_2$ , since

$SE_D = \displaystyle \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$ .

This leads to the “Pythagorean” relationship

$SE_1^2 + SE_2^2 = SE_D^2$ ,

which (in my experience) is a reasonable aid to help students remember the formula for $SE_D$ .

Naturally, a big problem that students encounter when using Welch’s formula is that the formula is really, really complicated, and it’s easy to make a mistake when entering information into their calculators. (Indeed, it might be that the pre-programmed calculator function simply gives the wrong answer.) Also, since the formula is complicated, students don’t have a lot of psychological reassurance that, when they come out the other end, their answer is actually correct. So, when teaching this topic, I tell my students the following rule of thumb so that they can at least check if their final answer is plausible:

$\min(n_1,n_2)-1 \le df \le n_1 + n_2 -2$ .

To my surprise, I have never seen this formula in a statistics textbook, even though it’s quite simple to state and not too difficult to prove using techniques from first-semester calculus.
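Before the proof, here is a minimal Python sketch of how a student might compute Welch’s formula and apply the plausibility check. The function names `welch_df` and `plausible` and the sample numbers are hypothetical, chosen only to illustrate the rule of thumb.

```python
import math


def welch_df(s1, n1, s2, n2):
    """Welch's degrees of freedom (before rounding down) from sample SDs and sizes."""
    se1_sq = s1 ** 2 / n1  # SE_1^2
    se2_sq = s2 ** 2 / n2  # SE_2^2
    numerator = (se1_sq + se2_sq) ** 2
    denominator = se1_sq ** 2 / (n1 - 1) + se2_sq ** 2 / (n2 - 1)
    return numerator / denominator


def plausible(df, n1, n2, tol=1e-9):
    """Check min(n1, n2) - 1 <= df <= n1 + n2 - 2, with a small rounding tolerance."""
    return min(n1, n2) - 1 - tol <= df <= n1 + n2 - 2 + tol


# Made-up example: s1 = 2 with n1 = 10, s2 = 3 with n2 = 15.
df = welch_df(2.0, 10, 3.0, 15)
print(df, math.floor(df), plausible(df, 10, 15))
```

If a hand computation ever lands outside these bounds, a keystroke error is the likely culprit.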

Let’s rewrite Welch’s formula as

$df = \left( \displaystyle \frac{1}{n_1-1} \left[ \frac{SE_1^2}{SE_1^2 + SE_2^2}\right]^2 + \frac{1}{n_2-1} \left[ \frac{SE_2^2}{SE_1^2 + SE_2^2} \right]^2 \right)^{-1}$

For the sake of simplicity, let $m_1 = n_1 - 1$ and $m_2 = n_2 -1$ , so that

$df = \left( \displaystyle \frac{1}{m_1} \left[ \frac{SE_1^2}{SE_1^2 + SE_2^2}\right]^2 + \frac{1}{m_2} \left[ \frac{SE_2^2}{SE_1^2 + SE_2^2} \right]^2 \right)^{-1}$

Now let $x = \displaystyle \frac{SE_1^2}{SE_1^2 + SE_2^2}$ . All of these terms are nonnegative (and, in practice, they’re all positive), so that $x \ge 0$ . Also, the numerator is no larger than the denominator, so that $x \le 1$ . Finally, we notice that

$1-x = 1 - \displaystyle \frac{SE_1^2}{SE_1^2 + SE_2^2} = \frac{SE_2^2}{SE_1^2 + SE_2^2}$ .

Using these observations, Welch’s formula reduces to the function

$f(x) = \left( \displaystyle \frac{x^2}{m_1} + \frac{(1-x)^2}{m_2} \right)^{-1}$ ,

and the central problem is to find the maximum and minimum values of $f(x)$ on the interval $0 \le x \le 1$ . Since $f(x)$ is differentiable on $[0,1]$ , the absolute extrema can be found by checking the endpoints and the critical point(s).

First, the endpoints. If $x=0$ , then $f(0) = \left( \displaystyle \frac{1}{m_2} \right)^{-1} = m_2$ . On the other hand, if $x=1$ , then $f(1) = \left( \displaystyle \frac{1}{m_1} \right)^{-1} = m_1$ .

Next, the critical point(s). These are found by solving the equation $f'(x) = 0$ :

$f'(x) = -\left( \displaystyle \frac{x^2}{m_1} + \frac{(1-x)^2}{m_2} \right)^{-2} \left[ \displaystyle \frac{2x}{m_1} - \frac{2(1-x)}{m_2} \right] = 0$

$\displaystyle \frac{2x}{m_1} - \frac{2(1-x)}{m_2} = 0$

$\displaystyle \frac{2x}{m_1} = \frac{2(1-x)}{m_2}$

$xm_2= (1-x)m_1$

$xm_2 = m_1 - xm_1$

$x(m_1 + m_2) = m_1$

$x = \displaystyle \frac{m_1}{m_1 + m_2}$

Plugging back into the original equation, we find the local extremum

$f \left( \displaystyle \frac{m_1}{m_1+m_2} \right) = \left( \displaystyle \frac{1}{m_1} \frac{m_1^2}{(m_1+m_2)^2} + \frac{1}{m_2} \left[1-\frac{m_1}{m_1+m_2}\right]^2 \right)^{-1}$

$f \left( \displaystyle \frac{m_1}{m_1+m_2} \right) = \left( \displaystyle \frac{1}{m_1} \frac{m_1^2}{(m_1+m_2)^2} + \frac{1}{m_2} \left[\frac{m_2}{m_1+m_2}\right]^2 \right)^{-1}$

$f \left( \displaystyle \frac{m_1}{m_1+m_2} \right) = \left( \displaystyle \frac{m_1}{(m_1+m_2)^2} + \frac{m_2}{(m_1+m_2)^2} \right)^{-1}$

$f \left( \displaystyle \frac{m_1}{m_1+m_2} \right) = \left( \displaystyle \frac{m_1+m_2}{(m_1+m_2)^2} \right)^{-1}$

$f \left( \displaystyle \frac{m_1}{m_1+m_2} \right) = \left( \displaystyle \frac{1}{m_1+m_2} \right)^{-1}$

$f \left( \displaystyle \frac{m_1}{m_1+m_2} \right) = m_1+m_2$

Based on the three candidate values that we’ve found (the two endpoints and the one critical point), it’s clear that the absolute minimum of $f(x)$ on $[0,1]$ is the smaller of $m_1$ and $m_2$ , while the absolute maximum is equal to $m_1 + m_2$ .

$\hbox{QED}$
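The three values in the argument can be spot-checked numerically. In this minimal Python sketch, the values $m_1 = 9$ and $m_2 = 14$ are made up for illustration, and `f` is the function from the proof with $m_1$ and $m_2$ passed as parameters.

```python
def f(x, m1, m2):
    """Welch's formula rewritten in terms of x = SE_1^2 / (SE_1^2 + SE_2^2)."""
    return 1.0 / (x ** 2 / m1 + (1.0 - x) ** 2 / m2)


m1, m2 = 9, 14  # hypothetical values of n1 - 1 and n2 - 1
critical_x = m1 / (m1 + m2)

print("f(0)          =", f(0.0, m1, m2))
print("f(1)          =", f(1.0, m1, m2))
print("f(critical x) =", f(critical_x, m1, m2))

# Scan a grid on [0, 1] to confirm nothing falls outside the claimed range.
grid_values = [f(i / 1000, m1, m2) for i in range(1001)]
print(min(grid_values), max(grid_values))
```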

In conclusion, I suggest offering the following guidelines to students to encourage their intuition about the plausibility of their answers:

If $SE_1$ is much smaller than $SE_2$ (i.e., $x \approx 0$ ), then $df$ will be close to $m_2 = n_2 - 1$ .
If $SE_1$ is much larger than $SE_2$ (i.e., $x \approx 1$ ), then $df$ will be close to $m_1 = n_1 - 1$ .
Otherwise, $df$ could be as large as $m_1 + m_2 = n_1 + n_2 - 2$ , but no larger.

A Valentine’s Day card

No joke, a textbook publisher sent me this image on Valentine’s Day.

Measuring terminal velocity

Using a simultaneously falling softball as a stopwatch, the terminal velocity of a whiffle ball can be obtained to surprisingly high accuracy with only common household equipment. In the January 2013 issue of College Mathematics Monthly, we describe a classroom activity that engages students in this apparently daunting task that nevertheless is tractable, using a simple model and mathematical techniques at their disposal.

Epsilon

Years ago, when I taught calculus, I’d usually include the following extra credit question on the first exam: “In the small box, write a good value for $\varepsilon$ . A valid answer gets 4 points; the smallest answer in the class will get 5 points.” It was basically free extra credit… any positive number would work, but it was a (hopefully) fun way for students to be a little competitive in coming up with small positive numbers, which is the intuitive meaning of $\varepsilon$ in mathematics. (I still remember when my high school math teacher was giving me directions to a restaurant, concluding “You’ll know you’re within $\varepsilon$ of the restaurant when you see the signs for Such-and-Such Mall.”)

Most students volunteered something like $0.0000001$ or $10^{-9999999999999999}$ . Except for one particularly gutsy student who wrote, “The probability that Dr. Q gets a date on Friday night.” For sheer nerve, he got the 5 points that year.

Also getting 5 points that year was the best answer of the class: “Let $x$ be the smallest answer that anyone else wrote. Then $\varepsilon = x/2$ .” That was especially clever from a calculus student, as that’s the essence of a fairly common technique when writing proofs in real analysis.

Bumper sticker

For what it’s worth, I also have this bumper sticker on my office door.