Thoughts on Numerical Integration (Part 19): Trapezoid rule and global rate of convergence

Numerical integration is a standard topic in first-semester calculus. From time to time, I have received questions from students on various aspects of this topic, including:
  • Why is numerical integration necessary in the first place?
  • Where do these formulas come from (especially Simpson’s Rule)?
  • How can I do all of these formulas quickly?
  • Is there a reason why the Midpoint Rule is better than the Trapezoid Rule?
  • Is there a reason why both the Midpoint Rule and the Trapezoid Rule converge quadratically?
  • Is there a reason why Simpson’s Rule converges like the fourth power of the number of subintervals?
In this series, I hope to answer these questions. While these are standard questions in a introductory college course in numerical analysis, and full and rigorous proofs can be found on Wikipedia and Mathworld, I will approach these questions from the point of view of a bright student who is currently enrolled in calculus and hasn’t yet taken real analysis or numerical analysis.
In the previous post, we showed that the Trapezoid Rule approximation of \displaystyle \int_{x_i}^{x_i+h} x^k \, dx  has error

\displaystyle \frac{k(k-1)}{12} x_i^{k-2} h^3 + O(h^4)

In this post, we consider the global error when integrating on the interval [a,b] instead of a subinterval [x_i,x_i+h]. The logic is almost a perfect copy-and-paste from the analysis used for the Midpoint Rule. The total error when approximating \displaystyle \int_a^b x^k \, dx = \int_{x_0}^{x_n} x^k \, dx will be the sum of the errors for the integrals over [x_0,x_1], [x_1,x_2], through [x_{n-1},x_n]. Therefore, the total error will be

E \approx \displaystyle \frac{k(k-1)}{12} \left(x_0^{k-2} + x_1^{k-2} + \dots + x_{n-1}^{k-2} \right) h^3.

So that this formula doesn’t appear completely mystical, this actually matches the numerical observations that we made earlier. The figure below shows the left-endpoint approximations to \displaystyle \int_1^2 x^9 \, dx for different numbers of subintervals. If we take n = 100 and h = 0.01, then the error should be approximately equal to

\displaystyle \frac{9 \times 8}{12} \left(1^7 + 1.01^7 + \dots + 1.99^7 \right) (0.01)^3 \approx 0.0187462,

which, as expected, is close to the actual error of 102.3191246 - 102.3 \approx 0.0191246.
Let y_i = x_i^{k-2}, so that the error becomes

E \approx \displaystyle \frac{k(k-1)}{12} \left(y_0 + y_1 + \dots + y_{n-1} \right) h^3 + O(h^4) = \displaystyle \frac{k(k-1)}{12} \overline{y} n h^3,

where \overline{y} = (y_0 + y_1 + \dots + y_{n-1})/n is the average of the y_i. Clearly, this average is somewhere between the smallest and the largest of the y_i. Since y = x^{k-2} is a continuous function, that means that there must be some value of x_* between x_0 and x_{k-1} — and therefore between a and b — so that x_*^{k-2} = \overline{y} by the Intermediate Value Theorem. We conclude that the error can be written as

E \approx \displaystyle \frac{k(k-1)}{12} x_*^{k-2} nh^3,

Finally, since h is the length of one subinterval, we see that nh = b-a is the total length of the interval [a,b]. Therefore,

E \approx \displaystyle \frac{k(k-1)}{12} x_*^{k-2} (b-a)h^2 \equiv ch^2,

where the constant c is determined by a, b, and k. In other words, for the special case f(x) = x^k, we have established that the error from the Trapezoid Rule is approximately quadratic in h — without resorting to the generalized mean-value theorem and confirming the numerical observations we made earlier.

Thoughts on Numerical Integration (Part 18): Trapezoid rule and local rate of convergence

Numerical integration is a standard topic in first-semester calculus. From time to time, I have received questions from students on various aspects of this topic, including:
  • Why is numerical integration necessary in the first place?
  • Where do these formulas come from (especially Simpson’s Rule)?
  • How can I do all of these formulas quickly?
  • Is there a reason why the Midpoint Rule is better than the Trapezoid Rule?
  • Is there a reason why both the Midpoint Rule and the Trapezoid Rule converge quadratically?
  • Is there a reason why Simpson’s Rule converges like the fourth power of the number of subintervals?
In this series, I hope to answer these questions. While these are standard questions in a introductory college course in numerical analysis, and full and rigorous proofs can be found on Wikipedia and Mathworld, I will approach these questions from the point of view of a bright student who is currently enrolled in calculus and hasn’t yet taken real analysis or numerical analysis.
In this post, we will perform an error analysis for the Trapezoid Rule

\int_a^b f(x) \, dx \approx \frac{h}{2} \left[f(x_0) + 2f(x_1) + 2f(x_2) + \dots + 2f(x_{n-1}) +f(x_n) \right] \equiv T_n

where n is the number of subintervals and h = (b-a)/n is the width of each subinterval, so that x_k = x_0 + kh.
As noted above, a true exploration of error analysis requires the generalized mean-value theorem, which perhaps a bit much for a talented high school student learning about this technique for the first time. That said, the ideas behind the proof are accessible to high school students, using only ideas from the secondary curriculum (especially the Binomial Theorem), if we restrict our attention to the special case f(x) = x^k, where k \ge 5 is a positive integer.

For this special case, the true area under the curve f(x) = x^k on the subinterval [x_i, x_i +h] will be

\displaystyle \int_{x_i}^{x_i+h} x^k \, dx = \frac{1}{k+1} \left[ (x_i+h)^{k+1} - x_i^{k+1} \right]

= \displaystyle \frac{1}{k+1} \left[x_i^{k+1} + {k+1 \choose 1} x_i^k h + {k+1 \choose 2} x_i^{k-1} h^2 + {k+1 \choose 3} x_i^{k-2} h^3 + {k+1 \choose 4} x_i^{k-3} h^4+ {k+1 \choose 5} x_i^{k-4} h^5+ O(h^6) - x_i^{k+1} \right]

= \displaystyle \frac{1}{k+1} \bigg[ (k+1) x_i^k h + \frac{(k+1)k}{2} x_i^{k-1} h^2 + \frac{(k+1)k(k-1)}{6} x_i^{k-2} h^3+ \frac{(k+1)k(k-1)(k-2)}{24} x_i^{k-3} h^4

+ \displaystyle \frac{(k+1)k(k-1)(k-2)(k-3)}{120} x_i^{k-4} h^5 \bigg] + O(h^6)

= x_i^k h + \displaystyle \frac{k}{2} x_i^{k-1} h^2 + \frac{k(k-1)}{6} x_i^{k-2} h^3 + \frac{k(k-1)(k-2)}{24} x_i^{k-3} h^4 + \frac{k(k-1)(k-2)(k-3)}{120} x_i^{k-4} h^5 + O(h^6)

In the above, the shorthand O(h^6) can be formally defined, but here we’ll just take it to mean “terms that have a factor of h^6 or higher that we’re too lazy to write out.” Since h is supposed to be a small number, these terms will small in magnitude and thus can be safely ignored. I wrote the above formula to include terms up to and including h^5 because I’ll need this later in this series of posts. For now, looking only at the Trapezoid Rule, it will suffice to write this integral as

\displaystyle \int_{x_i}^{x_i+h} x^k \, dx =x_i^k h + \displaystyle \frac{k}{2} x_i^{k-1} h^2 + \frac{k(k-1)}{6} x_i^{k-2} h^3 + O(h^4).

Using the Trapezoid Rule, we approximate \displaystyle \int_{x_i}^{x_i+h} x^k \, dx as \displaystyle \frac{h}{2} \left[x_i^k + (x_i + h)^k \right], using the width h and the bases x_i^k and (x_i + h)^k of the trapezoid. Using the Binomial Theorem, this expands as

 x_i^k h + \displaystyle {k \choose 1} x_i^{k-1} \frac{h^2}{2}  + {k \choose 2} x_i^{k-2} \frac{h^3}{2} + {k \choose 3} x_i^{k-3} \frac{h^4}{2}  + {k \choose 4} x_i^{k-4} \frac{h^5}{2} + O(h^6)

 = x_i^k h + \displaystyle \frac{k}{2} x_i^{k-1} h^2  + \frac{k(k-1)}{4} x_i^{k-2} h^3 + \frac{k(k-1)(k-2)}{12} x_i^{k-3} h^4

\displaystyle + \frac{k(k-1)(k-2)(k-3)}{48} x_i^{k-4} h^5 + O(h^6)

Once again, this is a little bit overkill for the present purposes, but we’ll need this formula later in this series of posts. Truncating somewhat earlier, we find that the Trapezoid Rule for this subinterval gives

x_i^k h + \displaystyle \frac{k}{2} x_i^{k-1} h^2  + \displaystyle \frac{k(k-1)}{4} x_i^{k-2} h^3 + O(h^4)

Subtracting from the actual integral, the error in this approximation will be equal to

\displaystyle x_i^k h + \frac{k}{2} x_i^{k-1} h^2 + \frac{k(k-1)}{6} x_i^{k-2} h^3 - x_i^k h - \frac{k}{2} x_i^{k-1} h^2  - \frac{k(k-1)}{4} x_i^{k-2} h^3 + O(h^4)

= \displaystyle \frac{k(k-1)}{12} x_i^{k-2} h^3 + O(h^4)

In other words, like the Midpoint Rule, both of the first two terms x_i^k h and \displaystyle \frac{k}{2} x_i^{k-1} h^2 cancel perfectly, leaving us with a local error on the order of h^3. We also recall, from the previous post in this series that the local error from the Midpoint Rule was \displaystyle \frac{k(k-1)}{24} x_i^{k-2} h^3 + O(h^4). In other words, while both the Midpoint Rule and Trapezoid Rule have local errors on the order of O(h^3), we expect the error in the Midpoint Rule to be about half of the error from the Trapezoid Rule.

Thoughts on Numerical Integration (Part 17): Midpoint rule and global rate of convergence

Numerical integration is a standard topic in first-semester calculus. From time to time, I have received questions from students on various aspects of this topic, including:
  • Why is numerical integration necessary in the first place?
  • Where do these formulas come from (especially Simpson’s Rule)?
  • How can I do all of these formulas quickly?
  • Is there a reason why the Midpoint Rule is better than the Trapezoid Rule?
  • Is there a reason why both the Midpoint Rule and the Trapezoid Rule converge quadratically?
  • Is there a reason why Simpson’s Rule converges like the fourth power of the number of subintervals?
In this series, I hope to answer these questions. While these are standard questions in a introductory college course in numerical analysis, and full and rigorous proofs can be found on Wikipedia and Mathworld, I will approach these questions from the point of view of a bright student who is currently enrolled in calculus and hasn’t yet taken real analysis or numerical analysis.
In the previous post, we showed that the midpoint approximation of \displaystyle \int_{x_i}^{x_i+h} x^k \, dx  has error

= \displaystyle \frac{k(k-1)}{24} x_i^{k-2} h^3 + O(h^4)

In this post, we consider the global error when integrating on the interval [a,b] instead of a subinterval [x_i,x_i+h]. The logic for determining the global error is much the same as what we used earlier for the left-endpoint rule. The total error when approximating \displaystyle \int_a^b x^k \, dx = \int_{x_0}^{x_n} x^k \, dx will be the sum of the errors for the integrals over [x_0,x_1], [x_1,x_2], through [x_{n-1},x_n]. Therefore, the total error will be

E \approx \displaystyle \frac{k(k-1)}{24} \left(x_0^{k-2} + x_1^{k-2} + \dots + x_{n-1}^{k-2} \right) h^3.

So that this formula doesn’t appear completely mystical, this actually matches the numerical observations that we made earlier. The figure below shows the left-endpoint approximations to \displaystyle \int_1^2 x^9 \, dx for different numbers of subintervals. If we take n = 100 and h = 0.01, then the error should be approximately equal to

\displaystyle \frac{9 \times 8}{24} \left(1^7 + 1.01^7 + \dots + 1.99^7 \right) (0.01)^3 \approx 0.0093731,

which, as expected, is close to the actual error of 102.3 - 102.2904379 \approx 0.00956211.
Let y_i = x_i^{k-2}, so that the error becomes

E \approx \displaystyle \frac{k(k-1)}{24} \left(y_0 + y_1 + \dots + y_{n-1} \right) h^3 + O(h^4) = \displaystyle \frac{k(k-1)}{24} \overline{y} n h^3,

where \overline{y} = (y_0 + y_1 + \dots + y_{n-1})/n is the average of the y_i. Clearly, this average is somewhere between the smallest and the largest of the y_i. Since y = x^{k-2} is a continuous function, that means that there must be some value of x_* between x_0 and x_{k-1} — and therefore between a and b — so that x_*^{k-2} = \overline{y} by the Intermediate Value Theorem. We conclude that the error can be written as

E \approx \displaystyle \frac{k(k-1)}{24} x_*^{k-2} nh^3,

Finally, since h is the length of one subinterval, we see that nh = b-a is the total length of the interval [a,b]. Therefore,

E \approx \displaystyle \frac{k(k-1)}{24} x_*^{k-2} (b-a)h^2 \equiv ch^2,

where the constant c is determined by a, b, and k. In other words, for the special case f(x) = x^k, we have established that the error from the Midpoint Rule is approximately quadratic in h — without resorting to the generalized mean-value theorem and confirming the numerical observations we made earlier.

Thoughts on Numerical Integration (Part 16): Midpoint rule and local rate of convergence

Numerical integration is a standard topic in first-semester calculus. From time to time, I have received questions from students on various aspects of this topic, including:
  • Why is numerical integration necessary in the first place?
  • Where do these formulas come from (especially Simpson’s Rule)?
  • How can I do all of these formulas quickly?
  • Is there a reason why the Midpoint Rule is better than the Trapezoid Rule?
  • Is there a reason why both the Midpoint Rule and the Trapezoid Rule converge quadratically?
  • Is there a reason why Simpson’s Rule converges like the fourth power of the number of subintervals?
In this series, I hope to answer these questions. While these are standard questions in a introductory college course in numerical analysis, and full and rigorous proofs can be found on Wikipedia and Mathworld, I will approach these questions from the point of view of a bright student who is currently enrolled in calculus and hasn’t yet taken real analysis or numerical analysis.
In this post, we will perform an error analysis for the Midpoint Rule

\int_a^b f(x) \, dx \approx h \left[f(c_1) + f(c_2) + \dots + f(c_n) \right] \equiv M_n

where n is the number of subintervals and h = (b-a)/n is the width of each subinterval, so that x_k = x_0 + kh. Also, c_i = (x_{i-1} + x_i)/2 is the midpoint of the ith subinterval.
As noted above, a true exploration of error analysis requires the generalized mean-value theorem, which perhaps a bit much for a talented high school student learning about this technique for the first time. That said, the ideas behind the proof are accessible to high school students, using only ideas from the secondary curriculum (especially the Binomial Theorem), if we restrict our attention to the special case f(x) = x^k, where k \ge 5 is a positive integer.

For this special case, the true area under the curve f(x) = x^k on the subinterval [x_i, x_i +h] will be

\displaystyle \int_{x_i}^{x_i+h} x^k \, dx = \frac{1}{k+1} \left[ (x_i+h)^{k+1} - x_i^{k+1} \right]

= \displaystyle \frac{1}{k+1} \left[x_i^{k+1} + {k+1 \choose 1} x_i^k h + {k+1 \choose 2} x_i^{k-1} h^2 + {k+1 \choose 3} x_i^{k-2} h^3 + {k+1 \choose 4} x_i^{k-3} h^4+ {k+1 \choose 5} x_i^{k-4} h^5+ O(h^6) - x_i^{k+1} \right]

= \displaystyle \frac{1}{k+1} \bigg[ (k+1) x_i^k h + \frac{(k+1)k}{2} x_i^{k-1} h^2 + \frac{(k+1)k(k-1)}{6} x_i^{k-2} h^3+ \frac{(k+1)k(k-1)(k-2)}{24} x_i^{k-3} h^4

+ \displaystyle \frac{(k+1)k(k-1)(k-2)(k-3)}{120} x_i^{k-4} h^5 \bigg] + O(h^6)

= x_i^k h + \displaystyle \frac{k}{2} x_i^{k-1} h^2 + \frac{k(k-1)}{6} x_i^{k-2} h^3 + \frac{k(k-1)(k-2)}{24} x_i^{k-3} h^4 + \frac{k(k-1)(k-2)(k-3)}{120} x_i^{k-4} h^5 + O(h^6)

In the above, the shorthand O(h^6) can be formally defined, but here we’ll just take it to mean “terms that have a factor of h^6 or higher that we’re too lazy to write out.” Since h is supposed to be a small number, these terms will small in magnitude and thus can be safely ignored. I wrote the above formula to include terms up to and including h^5 because I’ll need this later in this series of posts. For now, looking only at the Midpoint Rule, it will suffice to write this integral as

\displaystyle \int_{x_i}^{x_i+h} x^k \, dx =x_i^k h + \displaystyle \frac{k}{2} x_i^{k-1} h^2 + \frac{k(k-1)}{6} x_i^{k-2} h^3 + O(h^4).

Using the midpoint of the subinterval, the left-endpoint approximation of \displaystyle \int_{x_i}^{x_i+h} x^k \, dx is \displaystyle \left(x_i+ \frac{h}{2} \right)^k h. Using the Binomial Theorem, this expands as

 x_i^k h + \displaystyle {k \choose 1} x_i^{k-1} \frac{h^2}{2}  + {k \choose 2} x_i^{k-2} \frac{h^3}{4} + {k \choose 3} x_i^{k-3} \frac{h^4}{8}  + {k \choose 4} x_i^{k-4} \frac{h^5}{16} + O(h^6)

 = x_i^k h + \displaystyle \frac{k}{2} x_i^{k-1} h^2  + \frac{k(k-1)}{8} x_i^{k-2} h^3 + \frac{k(k-1)(k-2)}{48} x_i^{k-3} h^4

\displaystyle + \frac{k(k-1)(k-2)(k-3)}{384} x_i^{k-4} h^5 + O(h^6)

Once again, this is a little bit overkill for the present purposes, but we’ll need this formula later in this series of posts. Truncating somewhat earlier, we find that the Midpoint Rule for this subinterval gives

x_i^k h + \displaystyle \frac{k}{2} x_i^{k-1} h^2  + \displaystyle \frac{k(k-1)}{8} x_i^{k-2} h^3 + O(h^4)

Subtracting from the actual integral, the error in this approximation will be equal to

\displaystyle x_i^k h + \frac{k}{2} x_i^{k-1} h^2 + \frac{k(k-1)}{6} x_i^{k-2} h^3 - x_i^k h - \frac{k}{2} x_i^{k-1} h^2  - \frac{k(k-1)}{8} x_i^{k-2} h^3 + O(h^4)

= \displaystyle \frac{k(k-1)}{24} x_i^{k-2} h^3 + O(h^4)

In other words, unlike the left-endpoint and right-endpoint approximations, both of the first two terms x_i^k h and \displaystyle \frac{k}{2} x_i^{k-1} h^2 cancel perfectly, leaving us with a local error on the order of h^3.
The logic for determining the global error is much the same as what we used earlier for the left-endpoint rule. The total error when approximating \displaystyle \int_a^b x^k \, dx = \int_{x_0}^{x_n} x^k \, dx will be the sum of the errors for the integrals over [x_0,x_1], [x_1,x_2], through [x_{n-1},x_n]. Therefore, the total error will be

E \approx \displaystyle \frac{k(k-1)}{24} \left(x_0^{k-2} + x_1^{k-2} + \dots + x_{n-1}^{k-2} \right) h^3.

So that this formula doesn’t appear completely mystical, this actually matches the numerical observations that we made earlier. The figure below shows the left-endpoint approximations to \displaystyle \int_1^2 x^9 \, dx for different numbers of subintervals. If we take n = 100 and h = 0.01, then the error should be approximately equal to

\displaystyle \frac{9 \times 8}{24} \left(1^7 + 1.01^7 + \dots + 1.99^7 \right) (0.01)^3 \approx 0.0093731,

which, as expected, is close to the actual error of 102.3 - 102.2904379 \approx 0.00956211.
Let y_i = x_i^{k-2}, so that the error becomes

E \approx \displaystyle \frac{k(k-1)}{24} \left(y_0 + y_1 + \dots + y_{n-1} \right) h^3 + O(h^4) = \displaystyle \frac{k(k-1)}{24} \overline{y} n h^3,

where \overline{y} = (y_0 + y_1 + \dots + y_{n-1})/n is the average of the y_i. Clearly, this average is somewhere between the smallest and the largest of the y_i. Since y = x^{k-2} is a continuous function, that means that there must be some value of x_* between x_0 and x_{k-1} — and therefore between a and b — so that x_*^{k-2} = \overline{y} by the Intermediate Value Theorem. We conclude that the error can be written as

E \approx \displaystyle \frac{k(k-1)}{24} x_*^{k-2} nh^3,

Finally, since h is the length of one subinterval, we see that nh = b-a is the total length of the interval [a,b]. Therefore,

E \approx \displaystyle \frac{k(k-1)}{24} x_*^{k-2} (b-a)h^2 \equiv ch^2,

where the constant c is determined by a, b, and k. In other words, for the special case f(x) = x^k, we have established that the error from the Midpoint Rule is approximately quadratic in h — without resorting to the generalized mean-value theorem and confirming the numerical observations we made earlier.

Thoughts on Numerical Integration (Part 15): Right endpoint rule and global rate of convergence

Numerical integration is a standard topic in first-semester calculus. From time to time, I have received questions from students on various aspects of this topic, including:
  • Why is numerical integration necessary in the first place?
  • Where do these formulas come from (especially Simpson’s Rule)?
  • How can I do all of these formulas quickly?
  • Is there a reason why the Midpoint Rule is better than the Trapezoid Rule?
  • Is there a reason why both the Midpoint Rule and the Trapezoid Rule converge quadratically?
  • Is there a reason why Simpson’s Rule converges like the fourth power of the number of subintervals?
In this series, I hope to answer these questions. While these are standard questions in a introductory college course in numerical analysis, and full and rigorous proofs can be found on Wikipedia and Mathworld, I will approach these questions from the point of view of a bright student who is currently enrolled in calculus and hasn’t yet taken real analysis or numerical analysis.
In the previous post in this series, we found that the local error of the right endpoint approximation to \displaystyle \int_{x_i}^{x_i+h} x^k \, dx was equal to 

\displaystyle \frac{k}{2} x_i^{k-1} h^2 + O(h^3).

We now consider the global error when integrating over the interval [a,b] and not just a particular subinterval.
The total error when approximating \displaystyle \int_a^b x^k \, dx = \int_{x_0}^{x_n} x^k , dx will be the sum of the errors for the integrals over [x_0,x_1], [x_1,x_2], through [x_{n-1},x_n]. Therefore, the total error will be

E \approx \displaystyle \frac{k}{2} \left(x_1^{k-1} + x_2^{k-1} + \dots + x_{n}^{k-1} \right) h^2.

So that this formula doesn’t appear completely mystical, this actually matches the numerical observations that we made earlier. The figure below shows the left-endpoint approximations to \displaystyle \int_1^2 x^9 \, dx for different numbers of subintervals. If we take n = 100 and h = 0.01, then the error should be approximately equal to

\displaystyle \frac{9}{2} \left(1.01^8 + 1.02^8 + \dots + 2^8 \right) (0.01)^2 \approx 2.61276,

which, as expected, is close to the actual error of 104.8741246 - 102.3 \approx 2.57412.
We now perform a more detailed analysis of the global error, which is almost a perfect copy-and-paste from the previous analysis. Let y_i = x_i^{k-1}, so that the error becomes

E \approx \displaystyle \frac{k}{2} \left(y_1 + y_2 + \dots + y_n \right) h^2 + O(h^3) = \displaystyle \frac{k}{2} \overline{y} n h^2,

where \overline{y} = (y_1 + y_2 + \dots + y_{n})/n is the average of the y_i. Clearly, this average is somewhere between the smallest and the largest of the y_i. Since y = x^{k-1} is a continuous function, that means that there must be some value of x_* between x_1 and x_{n} — and therefore between a and b — so that x_*^{k-1} = \overline{y} by the Intermediate Value Theorem. We conclude that the error can be written as

E \approx \displaystyle \frac{k}{2} x_*^{k-1} nh^2,

Finally, since h is the length of one subinterval, we see that nh = b-a is the total length of the interval [a,b]. Therefore,

E \approx \displaystyle \frac{k}{2} x_*^{k-1} (b-a)h \equiv ch,

where the constant c is determined by a, b, and k. In other words, for the special case f(x) = x^k, we have established that the error from the left-endpoint rule is approximately linear in h — without resorting to the generalized mean-value theorem.

Thoughts on Numerical Integration (Part 14): Right endpoint rule and local rate of convergence

Numerical integration is a standard topic in first-semester calculus. From time to time, I have received questions from students on various aspects of this topic, including:

  • Why is numerical integration necessary in the first place?
  • Where do these formulas come from (especially Simpson’s Rule)?
  • How can I do all of these formulas quickly?
  • Is there a reason why the Midpoint Rule is better than the Trapezoid Rule?
  • Is there a reason why both the Midpoint Rule and the Trapezoid Rule converge quadratically?
  • Is there a reason why Simpson’s Rule converges like the fourth power of the number of subintervals?

In this series, I hope to answer these questions. While these are standard questions in a introductory college course in numerical analysis, and full and rigorous proofs can be found on Wikipedia and Mathworld, I will approach these questions from the point of view of a bright student who is currently enrolled in calculus and hasn’t yet taken real analysis or numerical analysis.

In this post, we will perform an error analysis for the right-endpoint rule

\int_a^b f(x) \, dx \approx h \left[f(x_1) + f(x_2) + \dots + f(x_n) \right] \equiv R_n

where n is the number of subintervals and h = (b-a)/n is the width of each subinterval, so that x_k = x_0 + kh.

As noted above, a true exploration of error analysis requires the generalized mean-value theorem, which perhaps a bit much for a talented high school student learning about this technique for the first time. That said, the ideas behind the proof are accessible to high school students, using only ideas from the secondary curriculum, if we restrict our attention to the special case f(x) = x^k, where k \ge 5 is a positive integer.

For this special case, the true area under the curve $f(x) = x^k$ on the subinterval [x_i, x_i +h] will be

\displaystyle \int_{x_i}^{x_i+h} x^k \, dx = \frac{1}{k+1} \left[ (x_i+h)^{k+1} - x_i^{k+1} \right]

= \displaystyle \frac{1}{k+1} \left[x_i^{k+1} + {k+1 \choose 1} x_i^k h + {k+1 \choose 2} x_i^{k-1} h^2 + O(h^3) - x_i^{k+1} \right]

= \displaystyle \frac{1}{k+1} \left[ (k+1) x_i^k h + \frac{(k+1)k}{2} x_i^{k-1} h^2 + O(h^3) \right]

= x_i^k h + \displaystyle \frac{k}{2} x_i^{k-1} h^2 + O(h^3)

In the above, the shorthand O(h^3) can be formally defined, but here we’ll just take it to mean “terms that have a factor of h^3 or higher that we’re too lazy to write out.” Since h is supposed to be a small number, these terms will be much smaller in magnitude that the terms that have h or h^2 and thus can be safely ignored.

Using only the right-endpoint of the subinterval, the left-endpoint approximation of \displaystyle \int_{x_i}^{x_i+h} x^k \, dx is

(x_i+h)^k h = x_i^k h + k x_i^{k-1} h^2 + O(h^3).

Subtracting, the error in this approximation will be equal to

\displaystyle x_i^k h + k x_i^{k-1} h^2 - x_i^k h - \frac{k}{2} x_i^{k-1} h^2 + O(h^3) = \displaystyle \frac{k}{2} x_i^{k-1} h^2 + O(h^3)

Repeating the logic from the previous post in this series, this local error on [x_i, x_i+h], which is proportional to O(h^2), generates a total error on [a,b] that is proportional to h. That is, the right-endpoint rule has an error that is approximately linear in h, confirming the numerical observation that we made earlier in this series.

Thoughts on Numerical Integration (Part 13): Left endpoint rule and global rate of convergence

Numerical integration is a standard topic in first-semester calculus. From time to time, I have received questions from students on various aspects of this topic, including:
  • Why is numerical integration necessary in the first place?
  • Where do these formulas come from (especially Simpson’s Rule)?
  • How can I do all of these formulas quickly?
  • Is there a reason why the Midpoint Rule is better than the Trapezoid Rule?
  • Is there a reason why both the Midpoint Rule and the Trapezoid Rule converge quadratically?
  • Is there a reason why Simpson’s Rule converges like the fourth power of the number of subintervals?
In this series, I hope to answer these questions. While these are standard questions in a introductory college course in numerical analysis, and full and rigorous proofs can be found on Wikipedia and Mathworld, I will approach these questions from the point of view of a bright student who is currently enrolled in calculus and hasn’t yet taken real analysis or numerical analysis.
In the previous post in this series, we found that the local error of the left endpoint approximation to \displaystyle \int_{x_i}^{x_i+h} x^k \, dx was equal to 

\displaystyle \frac{k}{2} x_i^{k-1} h^2 + O(h^3).

We now consider the global error when integrating over the interval [a,b] and not just a particular subinterval.
The total error when approximating \displaystyle \int_a^b x^k \, dx = \int_{x_0}^{x_n} x^k , dx will be the sum of the errors for the integrals over [x_0,x_1], [x_1,x_2], through [x_{n-1},x_n]. Therefore, the total error will be

E \approx \displaystyle \frac{k}{2} \left(x_0^{k-1} + x_1^{k-1} + \dots + x_{n-1}^{k-1} \right) h^2.

So that this formula doesn’t appear completely mystical, this actually matches the numerical observations that we made earlier. The figure below shows the left-endpoint approximations to \displaystyle \int_1^2 x^9 \, dx for different numbers of subintervals. If we take n = 100 and h = 0.01, then the error should be approximately equal to

\displaystyle \frac{9}{2} \left(1^8 + 1.01^8 + \dots + 1.99^8 \right) (0.01)^2 \approx 2.49801,

which, as expected, is close to the actual error of 102.3 - 99.76412456 \approx 2.53588.
We now perform a more detailed analysis of the global error. Let y_i = x_i^{k-1}, so that the error becomes

E \approx \displaystyle \frac{k}{2} \left(y_0 + y_1 + \dots + y_{n-1} \right) h^2 + O(h^3) = \displaystyle \frac{k}{2} \overline{y} n h^2,

where \overline{y} = (y_0 + y_1 + \dots + y_{n-1})/n is the average of the y_i. Clearly, this average is somewhere between the smallest and the largest of the y_i. Since y = x^{k-1} is a continuous function, that means that there must be some value of x_* between x_0 and x_{n-1} — and therefore between a and b — so that x_*^{k-1} = \overline{y} by the Intermediate Value Theorem. We conclude that the error can be written as

E \approx \displaystyle \frac{k}{2} x_*^{k-1} nh^2,

Finally, since h is the length of one subinterval, we see that nh = b-a is the total length of the interval [a,b]. Therefore,

E \approx \displaystyle \frac{k}{2} x_*^{k-1} (b-a)h \equiv ch,

where the constant c is determined by a, b, and k. In other words, for the special case f(x) = x^k, we have established that the error from the left-endpoint rule is approximately linear in h — without resorting to the generalized mean-value theorem.

Thoughts on Numerical Integration (Part 12): Left endpoint rule and local rate of convergence

Numerical integration is a standard topic in first-semester calculus. From time to time, I have received questions from students on various aspects of this topic, including:

  • Why is numerical integration necessary in the first place?
  • Where do these formulas come from (especially Simpson’s Rule)?
  • How can I do all of these formulas quickly?
  • Is there a reason why the Midpoint Rule is better than the Trapezoid Rule?
  • Is there a reason why both the Midpoint Rule and the Trapezoid Rule converge quadratically?
  • Is there a reason why Simpson’s Rule converges like the fourth power of the number of subintervals?

In this series, I hope to answer these questions. While these are standard questions in a introductory college course in numerical analysis, and full and rigorous proofs can be found on Wikipedia and Mathworld, I will approach these questions from the point of view of a bright student who is currently enrolled in calculus and hasn’t yet taken real analysis or numerical analysis.

In this post, we will perform an error analysis for the left-endpoint rule

\int_a^b f(x) \, dx \approx h \left[f(x_0) + f(x_1) + \dots + f(x_{n-1}) \right] \equiv L_n

where n is the number of subintervals and h = (b-a)/n is the width of each subinterval, so that x_k = x_0 + kh.

As noted above, a true exploration of error analysis requires the generalized mean-value theorem, which perhaps a bit much for a talented high school student learning about this technique for the first time. That said, the ideas behind the proof are accessible to high school students, using only ideas from the secondary curriculum, if we restrict our attention to the special case f(x) = x^k, where k \ge 5 is a positive integer.

For this special case, the true area under the curve $f(x) = x^k$ on the subinterval [x_i, x_i +h] will be

\displaystyle \int_{x_i}^{x_i+h} x^k \, dx = \frac{1}{k+1} \left[ (x_i+h)^{k+1} - x_i^{k+1} \right]

= \displaystyle \frac{1}{k+1} \left[x_i^{k+1} + {k+1 \choose 1} x_i^k h + {k+1 \choose 2} x_i^{k-1} h^2 + O(h^3) - x_i^{k+1} \right]

= \displaystyle \frac{1}{k+1} \left[ (k+1) x_i^k h + \frac{(k+1)k}{2} x_i^{k-1} h^2 + O(h^3) \right]

= x_i^k h + \displaystyle \frac{k}{2} x_i^{k-1} h^2 + O(h^3)

In the above, the shorthand O(h^3) can be formally defined, but here we’ll just take it to mean “terms that have a factor of h^3 or higher that we’re too lazy to write out.” Since h is supposed to be a small number, these terms will be much smaller in magnitude that the terms that have h or h^2 and thus can be safely ignored.

Using only the left-endpoint of the subinterval, the left-endpoint approximation of \displaystyle \int_{x_i}^{x_i+h} x^k \, dx is x_i^k h. Therefore, the error in this approximation will be equal to

\displaystyle \frac{k}{2} x_i^{k-1} h^2 + O(h^3).

In the next post of this series, we’ll show that the global error when integrating between a and b — as opposed to between x_i and x_i + h — is approximately linear in h.

Thoughts on Numerical Integration (Part 11): Simpson’s Rule and exploration of error analysis

Numerical integration is a standard topic in first-semester calculus. From time to time, I have received questions from students on various aspects of this topic, including:

  • Why is numerical integration necessary in the first place?
  • Where do these formulas come from (especially Simpson’s Rule)?
  • How can I do all of these formulas quickly?
  • Is there a reason why the Midpoint Rule is better than the Trapezoid Rule?
  • Is there a reason why both the Midpoint Rule and the Trapezoid Rule converge quadratically?
  • Is there a reason why Simpson’s Rule converges like the fourth power of the number of subintervals?

In this series, I hope to answer these questions. While these are standard questions in a introductory college course in numerical analysis, and full and rigorous proofs can be found on Wikipedia and Mathworld, I will approach these questions from the point of view of a bright student who is currently enrolled in calculus and hasn’t yet taken real analysis or numerical analysis.

In the previous post in this series, I discussed three different ways of numerically approximating the definite integral \displaystyle \int_a^b f(x) \, dx, the area under a curve f(x) between x=a and x=b.

In this series, we’ll choose equal-sized subintervals of the interval [a,b]. If h = (b-a)/n is the width of each subinterval so that x_k = x_0 + kh, then the integral may be approximated as

\int_a^b f(x) \, dx \approx h \left[f(x_0) + f(x_1) + \dots + f(x_{n-1}) \right] \equiv L_n

using left endpoints,

\int_a^b f(x) \, dx \approx h \left[f(x_1) + f(x_2) + \dots + f(x_n) \right] \equiv R_n

using right endpoints, and

\int_a^b f(x) \, dx \approx h \left[f(c_1) + f(c_2) + \dots + f(c_n) \right] \equiv M_n

using the midpoints of the subintervals. We have also derived the Trapezoid Rule

\int_a^b f(x) \, dx \approx \displaystyle \frac{h}{2} [f(x_0) + 2f(x_1) + \dots + 2f(x_{n-1}) + f(x_n)] \equiv T_n

and Simpson’s Rule (if n is even)

\int_a^b f(x) \, dx \approx \displaystyle \frac{h}{3} \left[y_0 + 4 y_1 + 2 y_2 + 4 y_3 + \dots + 2y_{n-2} + 4 y_{n-1} +  y_{n} \right] \equiv S_n.

In the previous post in this series, we saw that both the left-endpoint and right-endpoint rules have a linear rate of convergence: if twice as many subintervals are taken, then the error appears to go down by a factor of 2. If ten times as many subintervals are used, then the error should go down by a factor of 10. However, both the Midpoint Rule and the Trapezoid Rule have a quadratic rate of convergence: if twice as many subintervals are taken, then the error appears to go down by a factor of 4. If ten times as many subintervals are used, then the error should go down by a factor of 100. Moreover, it appears that the error from the Midpoint Rule is about half that of the Trapezoid Rule if the same number of subintervals are used.

Let’s now explore the results of Simpson’s Rule applied to \displaystyle \int_1^2 x^9 \, dx = 102.3 using different numbers of subintervals. The results are summarized in the table below.

 

The first immediate observation is that these approximations are far better than even the Midpoint and Trapezoid Rules! Indeed, we see that S_{20} \approx 102.301, using only 20 subintervals, is a better approximation than (from previous posts) either M_{100} = 102.290 or T_{100} \approx 102.319 using 100 subintervals! The lesson to learn again: Simpson’s Rule is a bit more difficult to compute than the Trapezoid Rule or the Midpoint Rule because of the different weights. Nevertheless, choosing a good algorithm is often far better than simply doing lots of computations.

There’s a second observation: the rate of convergence appears to be much, much faster. Indeed, the data points appear to fit a quartic polynomial very well. Said another way, if twice as many subintervals are taken, then the error appears to go down by a factor of 16. We can actually see this in the figure, looking at the lines with 10, 20, 40, and 80 subintervals.

Error with 10 subintervals = |102.3174904 - 102.3| = 0.0174904.

Error with 20 subintervals = |102.3011002 - 102.3| = 0.0011002.

Error with 40 subintervals = |102.3000689 - 102.3| = 0.0000689.

Error with 80 subintervals = |102.3000043 - 102.3| = 0.0000043.

In all cases, the error on the next line is about the error on the previous line divided by 16.

This illustrates quartic convergence, as opposed to the mere linear convergence of the left- and right-endpoint rules or the quadratic convergence of the Midpoint and Trapezoid Rules.

Thoughts on Numerical Integration (Part 10): Trapezoid Rule and exploration of error analysis

Numerical integration is a standard topic in first-semester calculus. From time to time, I have received questions from students on various aspects of this topic, including:

  • Why is numerical integration necessary in the first place?
  • Where do these formulas come from (especially Simpson’s Rule)?
  • How can I do all of these formulas quickly?
  • Is there a reason why the Midpoint Rule is better than the Trapezoid Rule?
  • Is there a reason why both the Midpoint Rule and the Trapezoid Rule converge quadratically?
  • Is there a reason why Simpson’s Rule converges like the fourth power of the number of subintervals?

In this series, I hope to answer these questions. While these are standard questions in a introductory college course in numerical analysis, and full and rigorous proofs can be found on Wikipedia and Mathworld, I will approach these questions from the point of view of a bright student who is currently enrolled in calculus and hasn’t yet taken real analysis or numerical analysis.

In the previous post in this series, I discussed three different ways of numerically approximating the definite integral \displaystyle \int_a^b f(x) \, dx, the area under a curve f(x) between x=a and x=b.

In this series, we’ll choose equal-sized subintervals of the interval [a,b]. If h = (b-a)/n is the width of each subinterval so that x_k = x_0 + kh, then the integral may be approximated as

\int_a^b f(x) \, dx \approx h \left[f(x_0) + f(x_1) + \dots + f(x_{n-1}) \right] \equiv L_n

using left endpoints,

\int_a^b f(x) \, dx \approx h \left[f(x_1) + f(x_2) + \dots + f(x_n) \right] \equiv R_n

using right endpoints, and

\int_a^b f(x) \, dx \approx h \left[f(c_1) + f(c_2) + \dots + f(c_n) \right] \equiv M_n

using the midpoints of the subintervals. We have also derived the Trapezoid Rule

\int_a^b f(x) \, dx \approx \displaystyle \frac{h}{2} [f(x_0) + 2f(x_1) + \dots + 2f(x_{n-1}) + f(x_n)] \equiv T_n

and Simpson’s Rule (if n is even)

\int_a^b f(x) \, dx \approx \displaystyle \frac{h}{3} \left[y_0 + 4 y_1 + 2 y_2 + 4 y_3 + \dots + 2y_{n-2} + 4 y_{n-1} +  y_{n} \right] \equiv S_n.

In the previous post in this series, we saw that both the left-endpoint and right-endpoint rules have a linear rate of convergence: if twice as many subintervals are taken, then the error appears to go down by a factor of 2. If ten times as many subintervals are used, then the error should go down by a factor of 10. However, the Midpoint Rule has a quadratic rate of convergence: if twice as many subintervals are taken, then the error appears to go down by a factor of 4. If ten times as many subintervals are used, then the error should go down by a factor of 100.

Let’s now explore the results of the Trapezoid Rule applied to \displaystyle \int_1^2 x^9 \, dx = 102.3 using different numbers of subintervals. The results are summarized in the table below.

Once again, the data points fit a quadratic polynomial well, indicating quadratic convergence.

More subtly, it appears that the Trapezoid Rule isn’t quite as good as the Midpoint Rule. Here are the results from the Midpoint Rule (which also appeared in the previous post in this series):

For n = 100 subintervals, the error of the Trapezoid Rule is |102.3191-102.3| = 0.0191, which the error from the Midpoint Rule is |102.2904-102.3| = 0.0096. In other words, while both of these methods are superior to the left- and right-endpoint rules, it appears that the error from the Midpoint Rule is about half of the error from the Trapezoid Rule. The Midpoint Rule appears to be better.

To me, this is far from an obvious conclusion. Geometrically, it’s far from clear that the rectangles from the Midpoint Rule…

… provide a better approximation than using trapezoids …

… yet it appears that’s exactly what happened. This can be rigorously proven, as we’ll explore later in this series.