Reminding students about Taylor series (Part 1)

At my university, Calculus II covers approximately the same topics covered in an AP Calculus BC course: integrals and derivatives with logarithms and exponential functions, various techniques of integration (including integration by parts and trigonometric substitutions), and convergence of infinite series.

In my opinion, the single most important of these topics is Taylor series (or, if you prefer, Maclaurin series), as these approximations to transcendental functions like e^x and \sin x are used over and over again in higher mathematics.

\bullet A good working knowledge of Taylor series is necessary for computing series solutions of ordinary differential equations.

\bullet In physics, elementary approximations like \sin x \approx x are used over and over again. For example, the governing differential equation for the motion of oscillating pendulums is

\displaystyle \frac{d^2 \theta}{dt^2} + \frac{g}{\ell} \sin \theta = 0,

where g is the acceleration due to gravity and \ell is the length of the pendulum. This differential equation cannot be solved in terms of elementary functions; its exact solution involves elliptic functions.

However, for small angles, we may use the approximation \sin \theta \approx \theta, so that the differential equation becomes

\displaystyle \frac{d^2 \theta}{dt^2} + \frac{g}{\ell} \theta = 0.

By eliminating the \sin \theta term, we now have a second-order differential equation with constant coefficients, which can be solved in a straightforward manner using standard techniques from differential equations. If \theta(0) = \theta_0 and \theta'(0) = 0 (i.e., the pendulum is pulled a small angle \theta_0 and is then released), the solution is

\theta(t) = \theta_0 \cos\left(t \sqrt{\displaystyle \frac{g}{\ell}} \right).

In other words, the pendulum exhibits sinusoidal behavior. (FYI, for an amazing display of kinetic art, see this demonstration of pendulum waves.)
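To see how good the small-angle approximation is, here is a minimal sketch that integrates the full equation numerically and compares it with \theta_0 \cos(t \sqrt{g/\ell}). The function names, the choice of a classical Runge-Kutta integrator, and the values g = 9.81, \ell = 1 and t = 2 are my own assumptions for illustration.

```python
import math

def simulate_pendulum(theta0, g=9.81, length=1.0, t_end=2.0, steps=20000):
    """Integrate theta'' = -(g/length) * sin(theta) with classical RK4.

    Starts from theta(0) = theta0, theta'(0) = 0 and returns theta(t_end).
    """
    k = g / length
    dt = t_end / steps
    theta, omega = theta0, 0.0

    def accel(th):
        return -k * math.sin(th)

    for _ in range(steps):
        # RK4 stages for the system theta' = omega, omega' = accel(theta)
        k1_t, k1_w = omega, accel(theta)
        k2_t, k2_w = omega + 0.5 * dt * k1_w, accel(theta + 0.5 * dt * k1_t)
        k3_t, k3_w = omega + 0.5 * dt * k2_w, accel(theta + 0.5 * dt * k2_t)
        k4_t, k4_w = omega + dt * k3_w, accel(theta + dt * k3_t)
        theta += dt * (k1_t + 2 * k2_t + 2 * k3_t + k4_t) / 6
        omega += dt * (k1_w + 2 * k2_w + 2 * k3_w + k4_w) / 6
    return theta

def small_angle(theta0, t, g=9.81, length=1.0):
    """Solution of the linearized equation: theta0 * cos(t * sqrt(g/length))."""
    return theta0 * math.cos(t * math.sqrt(g / length))

for degrees in (5, 60):
    theta0 = math.radians(degrees)
    numeric = simulate_pendulum(theta0)
    approx = small_angle(theta0, 2.0)
    print(degrees, numeric, approx, abs(numeric - approx))
```

For a release angle of 5 degrees the two answers should agree closely, while for 60 degrees the gap is plainly visible, which is the sense in which the approximation is only good for small angles.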

\bullet The primary way that students interface with Taylor series is through their calculators. When a calculator computes \cos 1000^\circ, it doesn’t draw a unit circle, trace out an angle of 1000^\circ in standard position, and find the x-coordinate of the terminal point. Instead, the calculator converts 1000^\circ into radians and adds the first few terms of the Taylor series expansion for \cos x.

The calculator may use a few tricks to accelerate convergence. For this example, using some trigonometric identities, \cos 1000^\circ = \cos 280^\circ = \cos 80^\circ = \sin 10^\circ, and (as I’ll discuss) the Maclaurin series for \sin x at x = 10^\circ converges much faster than the Maclaurin series for \cos x at x = 1000^\circ.
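The difference in convergence speed is easy to check by brute force. The following is only a sketch of the idea, not a description of what any particular calculator does; the helper names and the tolerance of 10^{-6} are my own choices.

```python
import math

def maclaurin_cos(x, n_terms):
    """Sum the first n_terms nonzero terms of the Maclaurin series for cos x."""
    total, term = 0.0, 1.0
    for k in range(n_terms):
        total += term
        # Next term: multiply by -x^2 / ((2k+1)(2k+2))
        term *= -x * x / ((2 * k + 1) * (2 * k + 2))
    return total

def maclaurin_sin(x, n_terms):
    """Sum the first n_terms nonzero terms of the Maclaurin series for sin x."""
    total, term = 0.0, x
    for k in range(n_terms):
        total += term
        # Next term: multiply by -x^2 / ((2k+2)(2k+3))
        term *= -x * x / ((2 * k + 2) * (2 * k + 3))
    return total

def terms_needed(series, x, target, tol=1e-6, max_terms=100):
    """Smallest number of terms whose partial sum is within tol of target."""
    for n in range(1, max_terms + 1):
        if abs(series(x, n) - target) < tol:
            return n
    return None

target = math.cos(math.radians(1000))  # same value as sin(10 degrees)
cos_terms = terms_needed(maclaurin_cos, math.radians(1000), target)
sin_terms = terms_needed(maclaurin_sin, math.radians(10), target)
print("terms of cos series at 1000 degrees:", cos_terms)
print("terms of sin series at 10 degrees:  ", sin_terms)
```

The series for \sin x at 10^\circ needs only a handful of terms, while the series for \cos x at 1000^\circ needs dozens, and its early partial sums are wildly far from the true value.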


I’ve argued the importance of Taylor series in higher-level courses in both mathematics and physics. Sadly, at least at my university, Taylor series is probably the topic that is least retained by students years after taking Calculus II. They can remember the rules for integration and differentiation, but their command of Taylor series seems to slip through the cracks.

In my opinion, the reason for this lack of retention is completely understandable from a student’s perspective: Taylor series is usually the last topic covered in a semester, and so students learn it quickly for the final and forget about it as soon as the final is over.

Of course, when I need to use Taylor series in an advanced course but my students have completely forgotten this prerequisite knowledge, I have to get them up to speed as soon as possible. Over the next few posts, I will present the sequence of examples that I use to accomplish this task. Covering this sequence usually takes me about 30-40 minutes of class time, depending on the class.

I should emphasize that, as much as possible, I present this sequence inductively and in an inquiry-based format: I ask leading questions of my students so that their answers drive the lecture. In other words, I don’t ask my students to simply take dictation. It’s a little hard to describe a question-and-answer format in a blog, but I’ll attempt to do so.

Beginning with the next post, I’ll describe this sequence.

Welch’s formula

When conducting a hypothesis test or computing a confidence interval for the difference \overline{X}_1 - \overline{X}_2 of two means, where at least one mean arises from a small sample, the Student t distribution must be employed. In particular, the number of degrees of freedom for the Student t distribution must be computed. Many textbooks suggest using Welch’s formula:

df = \frac{\displaystyle (SE_1^2 + SE_2^2)^2}{\displaystyle \frac{SE_1^4}{n_1-1} + \frac{SE_2^4}{n_2-1}},

rounded down to the nearest integer. In this formula, SE_1 = \displaystyle \frac{\sigma_1}{\sqrt{n_1}} is the standard error associated with the first average \overline{X}_1, where \sigma_1 (if known) is the standard deviation of the first population and n_1 is the number of samples that are averaged to find \overline{X}_1. In practice, \sigma_1 is not known, and so the sample standard deviation s_1 is used as the estimate \sigma_1 \approx s_1.

The terms SE_2 and n_2 are similarly defined for the average \overline{X}_2.
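For readers who would rather let a computer do the arithmetic, here is a minimal sketch of Welch’s formula. The function name welch_df and the sample numbers (s_1 = 2, n_1 = 10, s_2 = 3, n_2 = 15) are made up for illustration, and the sample standard deviations stand in for the unknown \sigma_1 and \sigma_2.

```python
import math

def welch_df(s1, n1, s2, n2):
    """Degrees of freedom from Welch's formula, before rounding down.

    s1, s2 are sample standard deviations; n1, n2 are sample sizes.
    """
    se1_sq = s1**2 / n1  # SE_1^2
    se2_sq = s2**2 / n2  # SE_2^2
    numerator = (se1_sq + se2_sq) ** 2
    denominator = se1_sq**2 / (n1 - 1) + se2_sq**2 / (n2 - 1)
    return numerator / denominator

# Hypothetical data, chosen only to exercise the formula.
df = welch_df(s1=2.0, n1=10, s2=3.0, n2=15)
print("unrounded df:", df)
print("rounded down:", math.floor(df))
```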

In Welch’s formula, the term SE_1^2 + SE_2^2 in the numerator is equal to \displaystyle \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}. This is the square of the standard error SE_D associated with the difference \overline{X}_1 - \overline{X}_2, since

SE_D = \displaystyle \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}.

This leads to the “Pythagorean” relationship

SE_1^2 + SE_2^2 = SE_D^2,

which (in my experience) is a reasonable aid to help students remember the formula for SE_D.


Naturally, a big problem that students encounter when using Welch’s formula is that the formula is really, really complicated, and it’s easy to make a mistake when entering information into their calculators. (Indeed, it might be that the pre-programmed calculator function simply gives the wrong answer.) Also, since the formula is complicated, students don’t have a lot of psychological reassurance that, when they come out the other end, their answer is actually correct. So, when teaching this topic, I tell my students the following rule of thumb so that they can at least check if their final answer is plausible:

\min(n_1,n_2)-1 \le df \le n_1 + n_2 -2.

To my surprise, I have never seen this inequality in a statistics textbook, even though it’s quite simple to state and not too difficult to prove using techniques from first-semester calculus.
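Before proving the inequality, it is reassuring to test it on random inputs. This is a quick sanity check under my own assumptions (sample sizes between 2 and 50, standard errors between 0.01 and 10, a fixed random seed), not a substitute for the proof.

```python
import random

def welch_df(se1, se2, n1, n2):
    """Welch's formula written directly in terms of the standard errors."""
    numerator = (se1**2 + se2**2) ** 2
    denominator = se1**4 / (n1 - 1) + se2**4 / (n2 - 1)
    return numerator / denominator

random.seed(0)  # fixed seed so the experiment is repeatable
violations = 0
for _ in range(10000):
    n1 = random.randint(2, 50)
    n2 = random.randint(2, 50)
    se1 = random.uniform(0.01, 10.0)
    se2 = random.uniform(0.01, 10.0)
    df = welch_df(se1, se2, n1, n2)
    lower = min(n1, n2) - 1
    upper = n1 + n2 - 2
    # Small tolerance guards against floating-point rounding at the bounds.
    if not (lower - 1e-9 <= df <= upper + 1e-9):
        violations += 1

print("violations of the rule of thumb:", violations)
```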

Let’s rewrite Welch’s formula as

df = \left( \displaystyle \frac{1}{n_1-1} \left[ \frac{SE_1^2}{SE_1^2 + SE_2^2}\right]^2 + \frac{1}{n_2-1} \left[ \frac{SE_2^2}{SE_1^2 + SE_2^2} \right]^2 \right)^{-1}

For the sake of simplicity, let m_1 = n_1 - 1 and m_2 = n_2 -1, so that

df = \left( \displaystyle \frac{1}{m_1} \left[ \frac{SE_1^2}{SE_1^2 + SE_2^2}\right]^2 + \frac{1}{m_2} \left[ \frac{SE_2^2}{SE_1^2 + SE_2^2} \right]^2 \right)^{-1}

Now let x = \displaystyle \frac{SE_1^2}{SE_1^2 + SE_2^2}. All of these terms are nonnegative (and, in practice, they’re all positive), so that x \ge 0. Also, the numerator is no larger than the denominator, so that x \le 1. Finally, we notice that

1-x = 1 - \displaystyle \frac{SE_1^2}{SE_1^2 + SE_2^2} = \frac{SE_2^2}{SE_1^2 + SE_2^2}.

Using these observations, Welch’s formula reduces to the function

f(x) = \left( \displaystyle \frac{x^2}{m_1} + \frac{(1-x)^2}{m_2} \right)^{-1},

and the central problem is to find the maximum and minimum values of f(x) on the interval 0 \le x \le 1. Since f(x) is differentiable on [0,1], the absolute extrema can be found by checking the endpoints and the critical point(s).

First, the endpoints. If x=0, then f(0) = \left( \displaystyle \frac{1}{m_2} \right)^{-1} = m_2. On the other hand, if x=1, then f(1) = \left( \displaystyle \frac{1}{m_1} \right)^{-1} = m_1.

Next, the critical point(s). These are found by solving the equation f'(x) = 0:

f'(x) = -\left( \displaystyle \frac{x^2}{m_1} + \frac{(1-x)^2}{m_2} \right)^{-2} \left[ \displaystyle \frac{2x}{m_1} - \frac{2(1-x)}{m_2} \right] = 0

\displaystyle \frac{2x}{m_1} - \frac{2(1-x)}{m_2} = 0

\displaystyle \frac{2x}{m_1} = \frac{2(1-x)}{m_2}

xm_2= (1-x)m_1

xm_2 = m_1 - xm_1

x(m_1 + m_2) = m_1

x = \displaystyle \frac{m_1}{m_1 + m_2}

Plugging back into the original equation, we find the local extremum

f \left( \displaystyle \frac{m_1}{m_1+m_2} \right) = \left( \displaystyle \frac{1}{m_1} \frac{m_1^2}{(m_1+m_2)^2} + \frac{1}{m_2} \left[1-\frac{m_1}{m_1+m_2}\right]^2 \right)^{-1}

f \left( \displaystyle \frac{m_1}{m_1+m_2} \right) = \left( \displaystyle \frac{1}{m_1} \frac{m_1^2}{(m_1+m_2)^2} + \frac{1}{m_2} \left[\frac{m_2}{m_1+m_2}\right]^2 \right)^{-1}

f \left( \displaystyle \frac{m_1}{m_1+m_2} \right) = \left( \displaystyle \frac{m_1}{(m_1+m_2)^2} + \frac{m_2}{(m_1+m_2)^2} \right)^{-1}

f \left( \displaystyle \frac{m_1}{m_1+m_2} \right) = \left( \displaystyle \frac{m_1+m_2}{(m_1+m_2)^2} \right)^{-1}

f \left( \displaystyle \frac{m_1}{m_1+m_2} \right) = \left( \displaystyle \frac{1}{m_1+m_2} \right)^{-1}

f \left( \displaystyle \frac{m_1}{m_1+m_2} \right) = m_1+m_2

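Comparing the values at the two endpoints and at the one critical point, and noting that m_1 + m_2 is larger than both m_1 and m_2 (since both are positive), we see that the absolute minimum of f(x) on [0,1] is the smaller of m_1 and m_2, while the absolute maximum is equal to m_1 + m_2.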

\hbox{QED}
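The three values in the proof can be checked with exact rational arithmetic. In this sketch, m_1 = 9 and m_2 = 14 are hypothetical values (corresponding to n_1 = 10 and n_2 = 15), and the grid of 101 points is just a spot check that nothing on [0,1] escapes the bounds.

```python
from fractions import Fraction

def f(x, m1, m2):
    """f(x) = (x^2/m1 + (1-x)^2/m2)^(-1), evaluated exactly."""
    return 1 / (x**2 / Fraction(m1) + (1 - x) ** 2 / Fraction(m2))

m1, m2 = 9, 14  # hypothetical: n1 = 10, n2 = 15
x_crit = Fraction(m1, m1 + m2)  # the critical point m1 / (m1 + m2)

at_zero = f(Fraction(0), m1, m2)
at_one = f(Fraction(1), m1, m2)
at_crit = f(x_crit, m1, m2)
print(at_zero, at_one, at_crit)

# Spot check: every grid value lies between min(m1, m2) and m1 + m2.
within_bounds = all(
    min(m1, m2) <= f(Fraction(k, 100), m1, m2) <= m1 + m2
    for k in range(101)
)
print("all grid values within bounds:", within_bounds)
```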

In conclusion, I suggest offering the following guidelines to students to encourage their intuition about the plausibility of their answers:

  • If SE_1 is much smaller than SE_2 (i.e., x \approx 0), then df will be close to m_2 = n_2 - 1.
  • If SE_1 is much larger than SE_2 (i.e., x \approx 1), then df will be close to m_1 = n_1 - 1.
  • Otherwise, df could be as large as m_1 + m_2 = n_1 + n_2 - 2, but no larger.
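These three guidelines can be illustrated with a few made-up cases. Again this is only a sketch: the function name and the numbers are my own, with n_1 = 10 and n_2 = 15, so that m_1 = 9 and m_2 = 14.

```python
import math

def welch_df_from_se(se1, n1, se2, n2):
    """Welch's formula in terms of the two standard errors."""
    numerator = (se1**2 + se2**2) ** 2
    denominator = se1**4 / (n1 - 1) + se2**4 / (n2 - 1)
    return numerator / denominator

n1, n2 = 10, 15

# SE_1 much smaller than SE_2: df should be close to n2 - 1 = 14.
df_small_se1 = welch_df_from_se(0.01, n1, 1.0, n2)

# SE_1 much larger than SE_2: df should be close to n1 - 1 = 9.
df_large_se1 = welch_df_from_se(1.0, n1, 0.01, n2)

# SE_1^2 : SE_2^2 = m_1 : m_2 puts x at the critical point,
# so df should reach its maximum n1 + n2 - 2 = 23.
df_balanced = welch_df_from_se(3.0, n1, math.sqrt(14.0), n2)

print(df_small_se1, df_large_se1, df_balanced)
```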

Measuring terminal velocity

Using a simultaneously falling softball as a stopwatch, the terminal velocity of a whiffle ball can be obtained to surprisingly high accuracy with only common household equipment. In the January 2013 issue of College Mathematics Monthly, we describe a classroom activity that engages students in this apparently daunting task, which is nevertheless tractable using a simple model and mathematical techniques at their disposal.

Epsilon

Years ago, when I taught calculus, I’d usually include the following extra credit question on the first exam: “In the small box, write a good value for \varepsilon. A valid answer gets 4 points; the smallest answer in the class will get 5 points.” It was basically free extra credit… any positive number would work, but it was a (hopefully) fun way for students to be a little competitive in coming up with small positive numbers, which is the intuitive meaning of \varepsilon in mathematics. (I still remember when my high school math teacher was giving me directions to a restaurant, concluding “You’ll know you’re within \varepsilon of the restaurant when you see the signs for Such-and-Such Mall.”)

Most students volunteered something like 0.0000001 or 10^{-9999999999999999}. The exception was one particularly gutsy student who wrote, “The probability that Dr. Q gets a date on Friday night.” For sheer nerve, he got the 5 points that year.

Also getting 5 points that year was the best answer of the class: “Let x be the smallest answer that anyone else wrote. Then \varepsilon = x/2.” That was especially clever from a calculus student, as that’s the essence of a fairly common technique when writing proofs in real analysis.