Different definitions of logarithm (Part 6)

There are two apparently different definitions of a logarithm that appear in the secondary mathematics curriculum:

  1. From Algebra II and Precalculus: If a > 0 and a \ne 1, then f(x) = \log_a x is the inverse function of g(x) = a^x.
  2. From Calculus: for x > 0, we define \ln x = \displaystyle \int_1^x \frac{1}{t} dt.
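That these two definitions describe the same function can be checked numerically. The sketch below (the function name `ln_via_integral` is mine, not from the post) approximates the Calculus definition with a composite trapezoid rule and compares it with Python's built-in logarithm.

```python
import math

def ln_via_integral(x, n=10_000):
    """Approximate the Calculus definition ln x = int_1^x dt/t
    using the composite trapezoid rule with n subintervals."""
    h = (x - 1) / n
    total = 0.5 * (1.0 + 1.0 / x)      # endpoint terms: f(1)/2 + f(x)/2
    for i in range(1, n):
        total += 1.0 / (1 + i * h)     # interior samples of 1/t
    return total * h

# The integral definition reproduces the inverse-of-a^x definition:
print(ln_via_integral(2.0))   # close to math.log(2.0)
print(ln_via_integral(10.0))  # close to math.log(10.0)
```

With 10,000 subintervals the trapezoid rule agrees with `math.log` to well beyond six decimal places on these inputs.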

The connection between these two apparently different ideas begins with the following theorem.

Theorem. Let a \in \mathbb{R}^+ \setminus \{1\}. Suppose that f: \mathbb{R}^+ \rightarrow \mathbb{R} has the following four properties:

  1. f(1) = 0
  2. f(a) = 1
  3. f(xy) = f(x) + f(y) for all x, y \in \mathbb{R}^+
  4. f is continuous

Then f(x) = \log_a x for all x \in \mathbb{R}^+.

Note. To prove this theorem, I will show that f(a^x) = x, thus proving that f is the inverse of g(x) = a^x.

The proof of this theorem divides into four cases:

  1. Positive integers: x = m \in \mathbb{Z}^+
  2. Positive rational numbers: x = \frac{m}{n}, where m,n \in \mathbb{Z}^+
  3. Negative rational numbers: x \in \mathbb{Q}^-
  4. Real (possibly irrational) numbers: x \in \mathbb{R}

In today’s post, I’ll complete the proof by handling Case 4.


Before starting Case 4, I like to take inventory of where we stand in the proof at this point. We have now proven the theorem for all positive rational numbers and for all negative rational numbers. There’s only one rational number left: x = 0. And this single case is simply handled through Property 1:

f(a^0) = f(1) = 0

I also like to keep track of which hypotheses have been used so far in the proof. A quick review of Cases 1-3 will reveal that Properties 1-3 have all been used at least once, but Property 4 (the assumption that f is continuous) has not been used so far. Therefore, we had better expect to use it before completing the proof.

I won’t tell the class this (for fear of discouraging them), but the proof of Case 4 is a bit more abstract than Cases 1-3. I can give a numerical example that (hopefully) will shed some light on the actual proof. However, for Case 4, the actual proof will not be a perfect parallel of the numerical example, as it was in Cases 1-3.

Idea behind Case 4. Let’s pick a familiar irrational number like \sqrt{2}. There is a natural way to approximate \sqrt{2} by a sequence of rational numbers… namely, the sequence of numbers obtained by taking one extra digit in the decimal expansion of \sqrt{2}. In other words,

r_1 = 1

r_2 = 1.4

r_3 = 1.41

r_4 = 1.414

and so on.

In this way, \displaystyle \lim_{n \to \infty} r_n = \sqrt{2}.

We would hope that the sequence

f \left( a^1 \right), f \left( a^{1.4} \right), f \left( a^{1.41} \right), f \left( a^{1.414} \right), \dots

converges to the obvious limit of

f \left( a^{\sqrt{2}} \right).

However, this sequence is also equal to

1, 1.4, 1.41, 1.414, \dots

since each exponent is rational. Since a sequence has only one limit, we conclude that these two limits should be equal:

\sqrt{2} = f \left( a^{\sqrt{2}} \right)
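This heuristic is easy to test numerically with the model function f(x) = \log_a x (the base a = 10 is purely illustrative):

```python
import math

a = 10                         # any base a > 0 with a != 1 would do
f = lambda y: math.log(y, a)   # the model function f(x) = log_a x

# rational approximants of sqrt(2): one more decimal digit each time
for r in [1, 1.4, 1.41, 1.414, 1.4142]:
    print(r, f(a ** r))        # f(a^r) recovers the rational exponent r

# ... and at the limit of the sequence, f(a^sqrt(2)) = sqrt(2)
print(f(a ** math.sqrt(2)), math.sqrt(2))
```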

So that’s the idea of the formal proof, which we now tackle. In the proof below, I’ve set some of the more parenthetical remarks in parentheses so that the main argument of the proof stands out a little bit better. You’ll notice that, unlike Cases 1-3, I don’t use as much directed questioning to get students to volunteer the next step of the proof with minimal assistance from me. That’s because I haven’t figured out a good way to use inquiry to quickly get through Case 4.

Proof of Case 4. Let \{ r_n \} be a sequence of rational numbers that converges to x. (Parenthetically, I’ll mention that the sequence of decimal approximations would be one such sequence, just to make this mysterious \{ r_n \} thing that just appeared out of the blue a little less daunting. Of course, any sequence of rational numbers that converges to x will do.) Therefore,

f \left( a^x \right) = f \left( a^{\lim_{n \to \infty} r_n} \right)

The function g(x) = a^x is continuous. From the ordinary definition of continuity used in calculus, this means that

\displaystyle \lim_{x \to c} g(x) = g(c).

In other words, the function and the limit can be interchanged. (I’ll usually throw in my standard joke about functions commuting at this point in the lecture.) Stated in terms of a sequence r_n \to x, this means that

\displaystyle \lim_{n \to \infty} g(r_n) = g(x) = g \left( \lim_{n \to \infty} r_n \right).

Stated another way,

\displaystyle \lim_{n \to \infty} a^{r_n} = a^{ \lim_{n \to \infty} r_n}.

In light of the above work, we conclude that

f \left( a^x \right) = f \left( a^{\lim_{n \to \infty} r_n} \right) = f \left( \displaystyle \lim_{n \to \infty} a^{r_n} \right)

Stated simply, the function and the limit interchange.

We now perform a similar step for the function f. Because f is assumed to be continuous, we know that

\displaystyle \lim_{n \to \infty} f(s_n) = f(c) = f \left( \lim_{n \to \infty} s_n \right)

if \{ s_n \} is a sequence that converges to c. So, if we replace s_n by a^{r_n} and c by \displaystyle \lim_{n \to \infty} a^{r_n}, we conclude that

\displaystyle \lim_{n \to \infty} f \left( a^{r_n} \right) = f \left( \displaystyle \lim_{n \to \infty} a^{r_n} \right)

From the above insight, we see that we have the next step of the proof:

f \left( a^x \right) = f \left( a^{\lim_{n \to \infty} r_n} \right)

= f \left( \displaystyle \lim_{n \to \infty} a^{r_n} \right)

= \displaystyle \lim_{n \to \infty} f \left(a^{r_n} \right)

From here, the concluding steps are pretty straightforward. The exponent r_n on the last line is a rational number. Therefore, by Cases 2 and 3 (and the x = 0 observation above), we have produced the next step:

f \left( a^x \right) = f \left( a^{\lim_{n \to \infty} r_n} \right)

= f \left( \displaystyle \lim_{n \to \infty} a^{r_n} \right)

= \displaystyle \lim_{n \to \infty} f \left(a^{r_n} \right)

= \displaystyle \lim_{n \to \infty} r_n

Finally, by definition from the top of the proof, we can evaluate this limit:

f \left( a^x \right) = f \left( a^{\lim_{n \to \infty} r_n} \right)

= f \left( \displaystyle \lim_{n \to \infty} a^{r_n} \right)

= \displaystyle \lim_{n \to \infty} f \left(a^{r_n} \right)

= \displaystyle \lim_{n \to \infty} r_n

= x

This concludes the proof that f \left( a^x \right) = x, even if x is an arbitrary (possibly irrational) real number.

Different definitions of logarithm (Part 5)

There are two apparently different definitions of a logarithm that appear in the secondary mathematics curriculum:

  1. From Algebra II and Precalculus: If a > 0 and a \ne 1, then f(x) = \log_a x is the inverse function of g(x) = a^x.
  2. From Calculus: for x > 0, we define \ln x = \displaystyle \int_1^x \frac{1}{t} dt.

The connection between these two apparently different ideas begins with the following theorem.

Theorem. Let a \in \mathbb{R}^+ \setminus \{1\}. Suppose that f: \mathbb{R}^+ \rightarrow \mathbb{R} has the following four properties:

  1. f(1) = 0
  2. f(a) = 1
  3. f(xy) = f(x) + f(y) for all x, y \in \mathbb{R}^+
  4. f is continuous

Then f(x) = \log_a x for all x \in \mathbb{R}^+.

Note. To prove this theorem, I will show that f(a^x) = x, thus proving that f is the inverse of g(x) = a^x.

The proof of this theorem divides into four cases:

  1. Positive integers: x = m \in \mathbb{Z}^+
  2. Positive rational numbers: x = \frac{m}{n}, where m,n \in \mathbb{Z}^+
  3. Negative rational numbers: x \in \mathbb{Q}^-
  4. Real (possibly irrational) numbers: x \in \mathbb{R}

In today’s post, I’ll describe how I prompt my students to prove Case 3 during class time. Case 4 will appear in tomorrow’s post.


Idea behind Case 3. Though not formally necessary for the proof, I’ve found it helpful to illustrate the idea of the proof with a specific example before proceeding to the general case. So — on the far end of the chalkboard, away from the space that I’ve allocated for the formal write-up of the proof — I’ll write

f \left( a^{-2/3} \cdot a^{2/3} \right) =

I’ll then ask, “How else can we simplify the left-hand side?” As we’ll see below, there are actually two legitimate ways of proceeding. Someone will usually suggest just simplifying the product, and so I’ll write this as the next step:

f \left( a^{-2/3} \cdot a^{2/3} \right) = f \left( a^0 \right)

I’ll then ask a very open-ended question, “Now what?” Usually, someone will suggest simplifying the right-hand side using Property 1:

f \left( a^{-2/3} \cdot a^{2/3} \right) = f \left( a^0 \right) = 0

By this point, after completing Cases 1 and 2, someone will usually suggest expanding the left-hand side:

f \left( a^{-2/3} \right) + f \left( a^{2/3} \right) = 0

I’ll then ask, “What can we do now?” Hopefully, someone will observe that the second term can be simplified using Case 2:

f \left( a^{-2/3} \right) + \displaystyle \frac{2}{3} = 0

f \left( a^{-2/3} \right) = - \displaystyle \frac{2}{3}

I’ll then note that we’ve finished what we set out to do: show that f(a^x) = x when x = - \frac{2}{3}, a negative rational number.

The natural next question is, “Can we do this for any negative rational number and not just -\frac{2}{3}?” This leads to the proof of Case 3. I’ve found that it’s helpful to walk through this proof line by line in step with the case of x = -\frac{2}{3}, so that students can see how the steps of this more abstract proof correspond to the concrete example of x = -\frac{2}{3}.

Proof of Case 3. Let m, n \in \mathbb{Z}^+. Then

f \left(a^{-m/n} \cdot a^{m/n} \right) = f \left(a^0 \right)

f \left( a^{-m/n} \right) + f \left( a^{m/n} \right) = 0

f \left( a^{-m/n} \right) + \displaystyle \frac{m}{n} = 0

f \left( a^{-m/n} \right) = - \displaystyle \frac{m}{n}
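As a quick check that this algebra holds for the function the theorem is aiming at, here is the same computation with f(x) = \log_a x and the illustrative values a = 7, m = 2, n = 3:

```python
import math

a, m, n = 7, 2, 3              # illustrative values; any m, n in Z+ work
f = lambda y: math.log(y, a)

# Property 3 applied to a^{-m/n} * a^{m/n} = a^0 = 1:
print(f(a ** (-m / n)) + f(a ** (m / n)))   # essentially 0 = f(1)

# ... which forces f(a^{-m/n}) = -m/n:
print(f(a ** (-m / n)), -m / n)
```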

Again, I’ve found that the special case x = - \frac{2}{3} is pedagogically helpful, if not logically necessary to prove Case 3.

Different definitions of logarithm (Part 4)

There are two apparently different definitions of a logarithm that appear in the secondary mathematics curriculum:

  1. From Algebra II and Precalculus: If a > 0 and a \ne 1, then f(x) = \log_a x is the inverse function of g(x) = a^x.
  2. From Calculus: for x > 0, we define \ln x = \displaystyle \int_1^x \frac{1}{t} dt.

The connection between these two apparently different ideas begins with the following theorem.

Theorem. Let a \in \mathbb{R}^+ \setminus \{1\}. Suppose that f: \mathbb{R}^+ \rightarrow \mathbb{R} has the following four properties:

  1. f(1) = 0
  2. f(a) = 1
  3. f(xy) = f(x) + f(y) for all x, y \in \mathbb{R}^+
  4. f is continuous

Then f(x) = \log_a x for all x \in \mathbb{R}^+.

Note. To prove this theorem, I will show that f(a^x) = x, thus proving that f is the inverse of g(x) = a^x.

The proof of this theorem divides into four cases:

  1. Positive integers: x = m \in \mathbb{Z}^+
  2. Positive rational numbers: x = \frac{m}{n}, where m,n \in \mathbb{Z}^+
  3. Negative rational numbers: x \in \mathbb{Q}^-
  4. Real (possibly irrational) numbers: x \in \mathbb{R}

In today’s post, I’ll describe how I prompt my students to prove Case 2 during class time. Cases 3-4 will appear in the coming posts.


Idea behind Case 2. Though not formally necessary for the proof, I’ve found it helpful to illustrate the idea of the proof with a specific example before proceeding to the general case. So — on the far end of the chalkboard, away from the space that I’ve allocated for the formal write-up of the proof — I’ll write

2 = f(a^2)

I’ll ask, “How do we know this is true?” The immediate answer: We just did Case 1. I’ll then do something a little unusual and rewrite this equation in a more complicated way:

2 = f(a^2) = f \left( \left[a^{2/3} \right]^3 \right)

After double-checking that the class agrees with this step (even if I just made the right-hand side more complicated instead of the usual step of simplifying it), I’ll then ask, “OK, we have something to the third power. What can we now do to the right-hand side?” Almost immediately, someone will volunteer the correct next steps using Property 3:

2 = f(a^2) = f \left( a^{2/3} \cdot a^{2/3} \cdot a^{2/3} \right) = f \left( a^{2/3} \right) + f \left( a^{2/3} \right) + f \left( a^{2/3} \right)

I’ll then ask, “How can we simplify the right-hand side?” After a moment of thought, someone will volunteer the correct next step:

2 = f(a^2) = f \left( a^{2/3} \cdot a^{2/3} \cdot a^{2/3} \right) = f \left( a^{2/3} \right) + f \left( a^{2/3} \right) + f \left( a^{2/3} \right)

2 = 3 f \left( a^{2/3} \right)

 I’ll then ask, “How do we isolate the f \left( a^{2/3} \right) term?” The obvious correct answer:

\displaystyle \frac{2}{3} = f(a^{2/3})

I’ll then note that we’ve finished what we set out to do: show that f(a^x) = x when x = \frac{2}{3}.

The natural next question is, “Can we do this for any positive rational number and not just \frac{2}{3}?” This leads to the proof of Case 2. I’ve found that it’s helpful to walk through this proof line by line in step with the case of x=\frac{2}{3}, so that students can see how the steps of this more abstract proof correspond to the concrete example of x =\frac{2}{3}.

Proof of Case 2. Let x = \displaystyle \frac{m}{n} where m, n \in \mathbb{Z}^+. Then

m = f(a^m)

m = f \left( \left[ a^{m/n} \right]^n \right)

m = f \left( a^{m/n} \cdot a^{m/n} \cdot \dots \cdot a^{m/n} \right)

m = f \left( a^{m/n} \right) + f \left( a^{m/n} \right) + \dots + f \left( a^{m/n} \right)

m = n f \left( a^{m/n} \right)

\displaystyle \frac{m}{n} = f \left( a^{m/n} \right)
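The chain of equalities above can be replayed numerically with f(x) = \log_a x (the values a = 5, m = 2, n = 3 are only for illustration):

```python
import math

a, m, n = 5, 2, 3              # any a > 0 (a != 1) and m, n in Z+ would do
f = lambda y: math.log(y, a)

print(f(a ** m))               # Case 1: f(a^m) = m
print(n * f(a ** (m / n)))     # the key step: m = n * f(a^{m/n})
print(f(a ** (m / n)), m / n)  # so f(a^{m/n}) = m/n
```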

Of course, the special case x = \frac{2}{3} is not logically necessary to prove Case 2. Though not logically necessary, I’ve found it to be pedagogically convenient. From the school of hard knocks, I’ve found that the proof of Case 2 goes over easier with students when they see the idea of the proof presented concretely and then abstractly.

Different definitions of logarithm (Part 3)

There are two apparently different definitions of a logarithm that appear in the secondary mathematics curriculum:

  1. From Algebra II and Precalculus: If a > 0 and a \ne 1, then f(x) = \log_a x is the inverse function of g(x) = a^x.
  2. From Calculus: for x > 0, we define \ln x = \displaystyle \int_1^x \frac{1}{t} dt.

The connection between these two apparently different ideas begins with the following theorem.

Theorem. Let a \in \mathbb{R}^+ \setminus \{1\}. Suppose that f: \mathbb{R}^+ \rightarrow \mathbb{R} has the following four properties:

  1. f(1) = 0
  2. f(a) = 1
  3. f(xy) = f(x) + f(y) for all x, y \in \mathbb{R}^+
  4. f is continuous

Then f(x) = \log_a x for all x \in \mathbb{R}^+.

Note. To prove this theorem, I will show that f(a^x) = x, thus proving that f is the inverse of g(x) = a^x.

The proof of this theorem divides into four cases:

  1. Positive integers: x = m \in \mathbb{Z}^+
  2. Positive rational numbers: x = \frac{m}{n}, where m,n \in \mathbb{Z}^+
  3. Negative rational numbers: x \in \mathbb{Q}^-
  4. Real (possibly irrational) numbers: x \in \mathbb{R}

In today’s post, I’ll describe how I prompt my students to prove Case 1 during class time. Cases 2-4 will appear in the coming posts.


Idea behind Case 1. Though not formally necessary for the proof, I’ve found it helpful to illustrate the idea of the proof with a specific example before proceeding to the general case. So — on the far end of the chalkboard, away from the space that I’ve allocated for the formal write-up of the proof — I’ll write

f(a^4) =

I’ll then ask, “How else can we write a^4?” Someone will usually suggest a \cdot a \cdot a \cdot a, and so I’ll write this as the next step:

f(a^4) = f(a \cdot a \cdot a \cdot a)

I’ll then ask, “OK, we have a product here. How can we simplify the right-hand side?” After a moment of thought, someone will volunteer that Property 3 allows the right-hand side to be split up into pieces:

f(a^4) = f(a \cdot a \cdot a \cdot a) = f(a) + f(a) + f(a) + f(a)

(Technically, this requires mathematical induction to generalize Property 3 from a product of two numbers to a product of arbitrarily many numbers, but I don’t think that it’s worth the time to expound on this pedantic point.) I’ll then ask, “How can we simplify this?” Almost immediately, someone will usually volunteer Property 2:

f(a^4) = f(a \cdot a \cdot a \cdot a) = f(a) + f(a) + f(a) + f(a) = 1 + 1 + 1 + 1 = 4

I’ll then note that we’ve finished what we set out to do: show that f(a^x) = x when x = 4.

The natural next question is, “Can we do this for any positive integer and not just 4?” This leads to the proof of Case 1. I’ve found that it’s helpful to walk through this proof line by line in step with the case of x=4, so that students can see how the steps of this more abstract proof correspond to the concrete example of x =4.

Proof of Case 1.

f(a^m) = f(a \cdot a \cdot \dots \cdot a)

= f(a) + f(a) + \dots + f(a)

= 1 + 1 + \dots + 1

= m
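For a sanity check that the target function really behaves this way, the reasoning above can be mirrored with f(x) = \log_a x (a = 3 and m = 4 are just illustrative choices):

```python
import math

a, m = 3, 4                    # illustrative values
f = lambda y: math.log(y, a)

# Property 3 (iterated) turns f(a * a * ... * a) into m copies of f(a) = 1:
print(sum(f(a) for _ in range(m)))   # 1 + 1 + ... + 1 = m
print(f(a ** m))                     # agrees: f(a^m) = m
```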

Of course, the special case x = 4 is not logically necessary to prove Case 1. That said, from the school of hard knocks, I’ve found that the proof of Case 1 goes over easier with students when they see the idea of the proof presented concretely and then abstractly.

Different definitions of logarithm (Part 2)

There are two apparently different definitions of a logarithm that appear in the secondary mathematics curriculum:

  1. From Algebra II and Precalculus: If b > 0 and b \ne 1, then f(x) = \log_b x is the inverse function of g(x) = b^x.
  2. From Calculus: for x > 0, we define \ln x = \displaystyle \int_1^x \frac{1}{t} dt.

In this series of posts, we examine the interrelationship between these two different approaches to logarithms. This is a standard topic in my class for future teachers of secondary mathematics as a way of deepening their understanding of a topic that they think they know quite well.

The connection between these two apparently different ideas begins with the following theorem.

Theorem. Let a \in \mathbb{R}^+ \setminus \{1\}. Suppose that f: \mathbb{R}^+ \rightarrow \mathbb{R} has the following four properties:

  1. f(1) = 0
  2. f(a) = 1
  3. f(xy) = f(x) + f(y) for all x, y \in \mathbb{R}^+
  4. f is continuous

Then f(x) = \underline{\hspace{1in}} for all x \in \mathbb{R}^+.

When writing this on the board, I purposefully leave an underline for my students to fill in, because I want them to think. What familiar function has these four properties? I’ll usually invoke the old children’s joke: “If it looks like an elephant, smells like an elephant, feels like an elephant, and sounds like an elephant, then it must be an elephant.” After a moment of thought, someone will usually volunteer f(x) = \log x. That’s almost correct, and so I’ll ask if Property 2 is satisfied by this function. After a couple more moments of thought, someone will volunteer the correct answer, f(x) = \log_a x.

To prove this theorem, I will show that

f(a^x) = x for all x \in \mathbb{R}.

I’ll make the observation that the case of x = 0 is Property 1, while the case of x = 1 is Property 2.

Then I’ll ask the class: “If I’m able to prove that f(a^x) = x for all real x, why does this mean that f(x) = \log_a x?” Perhaps unsurprisingly, this usually draws blank stares for a few seconds until someone realizes that this means that f: \mathbb{R}^+ \rightarrow \mathbb{R} and g: \mathbb{R} \rightarrow \mathbb{R}^+ defined by g(x) = a^x are inverse functions. So (by definition) f(x) must be equal to \log_a x.
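A numerical check of this inverse-pair reasoning, with f(x) = \log_a x standing in for f and a = 2 chosen arbitrarily:

```python
import math

a = 2
f = lambda y: math.log(y, a)   # the function the theorem identifies
g = lambda x: a ** x

for x in [-1.5, 0.0, 0.5, 3.0]:
    print(x, f(g(x)))          # f(g(x)) = x for every real x
for y in [0.25, 1.0, 8.0]:
    print(y, g(f(y)))          # g(f(y)) = y for every y > 0
```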

The proof of this theorem has four parts:

  1. Positive integers: x = m \in \mathbb{Z}^+
  2. Positive rational numbers: x = \frac{m}{n}, where m,n \in \mathbb{Z}^+
  3. Negative rational numbers: x \in \mathbb{Q}^-
  4. Real (possibly irrational) numbers: x \in \mathbb{R}

Beginning with tomorrow’s post, I’ll discuss how I walk students through the proof in lecture.

 

Different definitions of logarithm (Part 1)

There are two apparently different definitions of a logarithm that appear in the secondary mathematics curriculum:

  1. From Algebra II and Precalculus: If b > 0 and b \ne 1, then f(x) = \log_b x is the inverse function of g(x) = b^x.
  2. From Calculus: for x > 0, we define \ln x = \displaystyle \int_1^x \frac{1}{t} dt.

On the surface, these two ways of viewing logarithms are completely separate from each other, and so even advanced math majors are surprised that these two ways of viewing logarithms are logically interrelated. In the words of Tom Apostol (Calculus, Vol. 1, 2nd edition, 1967, page 227):

The logarithm is an example of a mathematical concept that can be defined in many different ways. When a mathematician tries to formulate a definition of a concept, such as the logarithm, he usually has in mind a number of properties he wants this concept to have. By examining these properties, he is often led to a simple formula or process that might serve as a definition from which all the desired properties spring forth as logical deductions.

In this series of posts, we examine the interrelationship between these two different approaches to logarithms. This is a standard topic in my class for future teachers of secondary mathematics as a way of deepening their understanding of a topic that they think they know quite well.

 

The one problem I missed, 30 years ago, on my final exam in calculus

It’s been said that we often remember our failures more than our successes. In this instance, the adage rings true, because I can still remember, clear as a bell, the one problem that I got wrong on my high school calculus final that I took 30 years ago. Here it is:

\displaystyle \int (x^2+1)^2 dx

I tried every u-substitution under the sun, with no luck. I tried u = x^2+1. However, du would be equal to 2x \, dx, and there was no extra x in the integrand.

I believe I tried every crazy, unorthodox u-substitution possible given the time constraints of the exam: u = \sqrt{x}, u = \sqrt{x^2+1}, u = 1/x. Nothing worked.

We had learned trigonometric substitutions in my class, and so I also tried those. I started with x = \tan u, so that x^2 + 1 = \tan^2 u + 1 = \sec^2 u. This looked promising. However, dx = \sec^2 u \, du, so the integral became \displaystyle \int \sec^4 u \, du. From there, I was stuck. (Now that I’m older, I know that the logical train actually goes in the reverse direction than what I attempted as a student.)

I wasn’t taught integration by parts in this first course in calculus, so I didn’t even know to try it. Had I known this technique, I probably would’ve broken through my conceptual barrier to finally get the right answer. (In other words, integration by parts will yield the correct answer, but it’s a lot of work!) But I didn’t know about it then, and so I get to tell the story now.

Exasperated, I turned in my exam when time was called, and I asked my teacher how this integral was supposed to be solved.

Easy, she told me: just square out the inside:

\displaystyle \int (x^2+1)^2 dx = \displaystyle \int (x^4 + 2x^2 + 1) \, dx = \displaystyle \frac{x^5}{5} + \frac{2x^3}{3} + x + C
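A quick way to double-check an antiderivative is to differentiate it numerically and compare against the integrand; the sketch below does exactly that for the teacher's answer (dropping the constant C).

```python
# Check: the derivative of F(x) = x^5/5 + 2x^3/3 + x should be (x^2 + 1)^2.
F = lambda x: x**5 / 5 + 2 * x**3 / 3 + x
integrand = lambda x: (x**2 + 1) ** 2

h = 1e-6
for x in [0.0, 0.5, 1.0, 2.0]:
    numeric = (F(x + h) - F(x - h)) / (2 * h)   # central difference
    print(x, numeric, integrand(x))             # the last two columns agree
```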

At the time, I was unbelievably annoyed at myself. Now, I love telling this anecdote to my students as I relate to their own frustrations as they practice the art of integration.

Functions that commute

At the bottom of this post is a one-liner that I use in my classes the first time I present a theorem where two functions are permitted to commute. At many layers of the mathematics curriculum, students learn that various functions can essentially commute with each other. In other words, the order in which the operations are performed doesn’t affect the final answer. Here’s a partial list off the top of my head:

  1. Arithmetic/Algebra: a \cdot (b + c) = a \cdot b + a \cdot c. This of course is commonly called the distributive property (and not the commutative property), but the essential idea is that the same answer is obtained whether the multiplications are performed first or if the addition is performed first.
  2. Algebra: If a,b > 0, then \sqrt{ab} = \sqrt{a} \sqrt{b}.
  3. Algebra: If a,b > 0 and x is any real number, then (ab)^x = a^x b^x.
  4. Precalculus: \displaystyle \sum_{i=1}^n (a_i+b_i) = \displaystyle \sum_{i=1}^n a_i + \sum_{i=1}^n b_i.
  5. Precalculus: \displaystyle \sum_{i=1}^n c a_i = c \displaystyle \sum_{i=1}^n a_i.
  6. Calculus: If f is continuous at an interior point c, then \displaystyle \lim_{x \to c} f(x) = f(c).
  7. Calculus: If f and g are differentiable, then (f+g)' = f' + g'.
  8. Calculus: If f is differentiable and c is a constant, then (cf)' = cf'.
  9. Calculus: If f and g are integrable, then \int (f+g) = \int f + \int g.
  10. Calculus: If f is integrable and c is a constant, then \int cf = c \int f.
  11. Calculus: If f: \mathbb{R}^2 \to \mathbb{R} is integrable, then \iint f(x,y) \, dx \, dy = \iint f(x,y) \, dy \, dx.
  12. Calculus: For most differentiable functions f: \mathbb{R}^2 \to \mathbb{R} that arise in practice, \displaystyle \frac{\partial^2 f}{\partial x \partial y} = \displaystyle \frac{\partial^2 f}{\partial y \partial x}.
  13. Probability: If X and Y are random variables, then E(X+Y) = E(X) + E(Y).
  14. Probability: If X is a random variable and c is a constant, then E(cX) = c E(X).
  15. Probability: If X and Y are independent random variables, then E(XY) = E(X) E(Y).
  16. Probability: If X and Y are independent random variables, then \hbox{Var}(X+Y) = \hbox{Var}(X) + \hbox{Var}(Y).
  17. Set theory: If A, B, and C are sets, then A \cup (B \cap C) = (A \cup B) \cap (A \cup C).
  18. Set theory: If A, B, and C are sets, then A \cap (B \cup C) = (A \cap B) \cup (A \cap C).

However, there are plenty of instances when two functions do not commute. Most of these, of course, are common mistakes that students make when they first encounter these concepts. Here’s a partial list off the top of my head. (For all of these, the inequality sign means that the two sides do not have to be equal… though there may be special cases when equality happens to hold.)

  1. Algebra: (a+b)^x \ne a^x + b^x if x \ne 1. Important special cases are x = 2, x = 1/2, and x = -1.
  2. Algebra/Precalculus: \log_b(x+y) \ne \log_b x + \log_b y. I call this the third classic blunder.
  3. Precalculus: (f \circ g)(x) \ne (g \circ f)(x).
  4. Precalculus: \sin(x+y) \ne \sin x + \sin y, \cos(x+y) \ne \cos x + \cos y, etc.
  5. Precalculus: \displaystyle \sum_{i=1}^n (a_i b_i) \ne \displaystyle \left(\sum_{i=1}^n a_i \right) \left( \sum_{i=1}^n b_i \right).
  6. Calculus: (fg)' \ne f' \cdot g'.
  7. Calculus: \left( \displaystyle \frac{f}{g} \right)' \ne \displaystyle \frac{f'}{g'}.
  8. Calculus: \int fg \ne \left( \int f \right) \left( \int g \right).
  9. Probability: If X and Y are dependent random variables, then E(XY) \ne E(X) E(Y).
  10. Probability: If X and Y are dependent random variables, then \hbox{Var}(X+Y) \ne \hbox{Var}(X) + \hbox{Var}(Y).

All this to say, it’s a big deal when two functions commute, because this doesn’t happen all the time.
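A three-line experiment makes the contrast concrete (the values 3 and 4 are chosen arbitrarily):

```python
import math

a, b = 3.0, 4.0
print((a + b) ** 2, a**2 + b**2)    # 49.0 vs 25.0: squaring does not distribute
print(math.log10(a + b), math.log10(a) + math.log10(b))   # not equal either

# a genuinely commuting pair from the first list, for comparison:
print(math.sqrt(a * b), math.sqrt(a) * math.sqrt(b))      # both about 3.4641
```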

I wish I could remember the speaker’s name, but I heard the following one-liner at a state mathematics conference many years ago, and I’ve used it to great effect in my classes ever since. Whenever I present a property where two functions commute, I’ll say, “In other words, the order of operations does not matter. This is a big deal, because, in real life, the order of operations usually is important. For example, this morning, you probably got dressed and then went outside. The order was important.”

 

How to check if a student really can perform the Chain Rule

In my experience, a problem like the following is the acid test for determining if a student really understands the Chain Rule:

Find f'(x) if f(x) = \left[6x^2 + \sin 5x \right]^3

The correct answer (unsimplified):

f'(x) = 3 \left[6x^2 + \sin 5x \right]^2 \left(12x + [\cos 5x] \cdot 5 \right)

However, even students who are quite proficient with the Chain Rule can often provide the following incorrect answer:

f'(x) = 3 \left[6x^2 + \sin 5x \right]^2 \left(12x + \cos 5x \right) \cdot 5

Notice the slightly incorrect placement of the 5 at the end of the derivative. Students can so easily get into the rhythm of just multiplying by the derivative of the inside that they can forget where the derivative of the inside should be placed.
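One way to convince a skeptical student is to compare both candidate answers against a central-difference approximation of the derivative; only the correctly placed 5 survives the test. (The test point x = 0.7 below is arbitrary.)

```python
import math

f = lambda x: (6 * x**2 + math.sin(5 * x)) ** 3
correct = lambda x: 3 * (6 * x**2 + math.sin(5 * x)) ** 2 * (12 * x + math.cos(5 * x) * 5)
mistaken = lambda x: 3 * (6 * x**2 + math.sin(5 * x)) ** 2 * (12 * x + math.cos(5 * x)) * 5

x, h = 0.7, 1e-6
numeric = (f(x + h) - f(x - h)) / (2 * h)   # central-difference derivative
print(numeric, correct(x), mistaken(x))     # only `correct` matches `numeric`
```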

Needless to say, a problem like this often appears on my exams as a way of separating the A students from the B students.

Teaching the Chain Rule inductively

I taught Calculus I every spring between 1996 and 2008. Perhaps the hardest topic to teach — at least for me — in the entire course was the Chain Rule. In the early years, I would show students the technique, but it seemed like my students accepted it on faith that their professor knew what he was talking about. Also, it took them quite a while to become proficient with the Chain Rule… as opposed to the Product and Quotient Rules, which they typically mastered quite quickly (except for algebraic simplifications).

It took me several years before I found a way of teaching the Chain Rule so that the method really sunk into my students by the end of the class period. Here’s the way that I now teach the Chain Rule.

On the day that I introduce the Chain Rule, I teach inductively (as opposed to deductively). At this point, my students are familiar with how to differentiate y = x^n for positive and negative integers n, the trigonometric functions, and y = \sqrt{x}. They also know the Product and Quotient Rules.

I begin class by listing a whole bunch of functions whose derivatives could be found with the Chain Rule, if only the students knew it. However, since my students don’t know the Chain Rule yet, they have to find the derivatives some other way. For example:

Let y = (3x - 5)^2. Then

y = (3x - 5) \cdot (3x -5)

y' = 3 \cdot (3x -5) + (3x -5) \cdot 3

y' = 6(3x-5).

Let y = (x^3 + 4)^2. Then

y = (x^3 + 4) \cdot (x^3 + 4)

y' = 3x^2 \cdot (x^3 + 4) + (x^3 + 4) \cdot 3x^2

y' = 6x^2 (x^3 + 4)

Let y = (\sqrt{x} + 5)^2. Then

y = x + 10 \sqrt{x} + 25

y' = 1 + \displaystyle \frac{5}{\sqrt{x}}

Let y = \sin^2 x. Then

y = \sin x \cdot \sin x

y' = \cos x \cdot \sin x + \sin x \cdot \cos x

y' = 2 \sin x \cos x

Let y = \sin 2x. Then

y = 2 \sin x \cos x

y' = 2 \cos x \cos x - 2 \sin x \sin x

y' = 2 (\cos^2 x - \sin^2 x)

y' = 2 \cos 2x
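The hand computations above are easy to confirm with a central-difference check (the point x = 0.8 is chosen arbitrarily, positive so that the square-root example is defined):

```python
import math

# (function, derivative computed by hand in the examples above)
examples = [
    (lambda x: (3 * x - 5) ** 2,         lambda x: 6 * (3 * x - 5)),
    (lambda x: (x**3 + 4) ** 2,          lambda x: 6 * x**2 * (x**3 + 4)),
    (lambda x: (math.sqrt(x) + 5) ** 2,  lambda x: 1 + 5 / math.sqrt(x)),
    (lambda x: math.sin(x) ** 2,         lambda x: 2 * math.sin(x) * math.cos(x)),
    (lambda x: math.sin(2 * x),          lambda x: 2 * math.cos(2 * x)),
]

x, h = 0.8, 1e-6
for fn, deriv in examples:
    numeric = (fn(x + h) - fn(x - h)) / (2 * h)
    print(numeric, deriv(x))   # each pair agrees to several decimal places
```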

The important thing is to list example after example after example, and have students compute the derivatives. All along, I keep muttering something like, “Boy, it would sure be nice if there was a short-cut that would save us from doing all this work.” Of course, there is a short-cut (the Chain Rule), but I don’t tell the students what it is. Instead, I make the students try to figure out the pattern for themselves. This is absolutely critical: I don’t spill the beans. I just wait and wait and wait until the students figure out the pattern for themselves… though I might give suggestive hints, like rewriting the 6 in the first example as 3 \times 2.

This can take 20-30 minutes, and perhaps over a dozen examples (like those above), as students are completely engaged and frustrated trying to figure out the short-cut. But my experience is that when it clicks, it really clicks. So this pedagogical technique requires a lot of patience on the part of the instructor to not “save time” by giving the answer but to allow the students the thrill of discovering the pattern for themselves.

Once the Chain Rule is discovered, then my experience is that students have been prepared for differentiating more complicated functions, like y = \sqrt{4 + \sin 2x} and y = \cos ( \sqrt{x} ). In other words, there’s a significant front-end investment of time as students discover the Chain Rule, but applying the Chain Rule generally moves along quite quickly once it’s been discovered.