My Favorite One-Liners: Part 84

In this series, I’m compiling some of the quips and one-liners that I’ll use with my students to hopefully make my lessons more memorable for them.

Every once in a while, I’ll show my students that there’s a difficult way to do a problem that I don’t want them to do for homework. For example, here’s the direct derivation of the mean of the binomial distribution using only Precalculus; this would make an excellent homework problem for the Precalculus teacher who wants to torture his/her students:

E(X) = \displaystyle \sum_{k=0}^n k {n \choose k} p^k q^{n-k}

= \displaystyle \sum_{k=1}^n k  {n \choose k} p^k q^{n-k}

= \displaystyle \sum_{k=1}^n k  \frac{n!}{k!(n-k)!} p^k q^{n-k}

= \displaystyle \sum_{k=1}^n \frac{n!}{(k-1)!(n-k)!} p^k q^{n-k}

= \displaystyle \sum_{k=1}^n \frac{n (n-1)!}{(k-1)!(n-k)!} p^k q^{n-k}

= \displaystyle \sum_{i=0}^{n-1} \frac{n (n-1)!}{i!(n-1-i)!} p^{i+1} q^{n-1-i}

= \displaystyle np \sum_{i=0}^{n-1} \frac{(n-1)!}{i!(n-1-i)!} p^i q^{n-1-i}

= \displaystyle np(p+q)^{n-1}

= np \cdot 1^{n-1}

= np,

since p + q = 1.


However, that’s a lot of work, and the way that I really want my students to do this, which is a lot easier (and which will be used throughout the semester), is by writing the binomial random variable as the sum of indicator random variables:

E(X) = E(I_1 + \dots + I_n) = E(I_1) + \dots + E(I_n) = p + \dots + p = np.
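For anyone who wants to check both routes numerically, here's a short sketch (mine, not part of the original lesson) comparing the term-by-term Precalculus sum with the indicator-variable shortcut np:

```python
from math import comb

def binomial_mean_direct(n, p):
    """E(X) computed term by term from the definition of expected value."""
    q = 1 - p
    return sum(k * comb(n, k) * p**k * q**(n - k) for k in range(n + 1))

# The long sum and the shortcut np agree (up to floating-point rounding)
n, p = 10, 0.3
print(round(binomial_mean_direct(n, p), 6), n * p)  # both 3.0
```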

So, to reassure my students that they’re not going to be asked to reproduce the above lengthy calculation, I’ll tell them that I wrote all that down out of my own machismo, just to prove to them that I really could do it.

Since my physical presence exudes next to no machismo, this almost always gets a laugh.

My Favorite One-Liners: Part 43

In this series, I’m compiling some of the quips and one-liners that I’ll use with my students to hopefully make my lessons more memorable for them.

Years ago, my first class of students decided to call me “Dr. Q” instead of “Dr. Quintanilla,” and the name has stuck ever since. And I’ll occasionally use this to my advantage when choosing names of variables. For example, here’s a typical proof by induction involving divisibility.

Theorem: If n \ge 1 is an integer, then 5^n - 1 is a multiple of 4.

Proof. By induction on n.

n = 1: 5^1 - 1 = 4, which is clearly a multiple of 4.

n: Assume that 5^n - 1 is a multiple of 4.

At this point in the calculation, I ask how I can write this statement as an equation. Eventually, somebody will volunteer that if 5^n-1 is a multiple of 4, then 5^n-1 is equal to 4 times something. At which point, I’ll volunteer:

Yes, so let’s name that something with a variable. Naturally, we should choose something important, something regal, something majestic… so let’s choose the letter q. (Groans and laughter.) It’s good to be the king.

So the proof continues:

n: Assume that 5^n - 1 = 4q, where q is an integer.

n+1: We wish to show that 5^{n+1} - 1 is also a multiple of 4.

At this point, I’ll ask my class how we should write this. Naturally, I give them no choice in the matter:

We wish to show that 5^{n+1} - 1 = 4Q, where Q is some (possibly different) integer.

Then we continue the proof:

5^{n+1} - 1 = 5^n 5^1 - 1

= 5 \times 5^n - 1

= 5 \times (4q + 1) - 1 by the induction hypothesis

= 20q + 5 - 1

= 20q + 4

= 4(5q + 1).

So if we let Q = 5q +1, then 5^{n+1} - 1 = 4Q, where Q is an integer because q is also an integer.
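For students who like to see the pattern before proving it, a small loop (my sketch, not part of the original proof) checks the first several cases and recovers the quotient q each time:

```python
# Verify that 5^n - 1 = 4q for the first several positive integers n
for n in range(1, 11):
    value = 5**n - 1
    q, remainder = divmod(value, 4)
    assert remainder == 0  # 5^n - 1 really is a multiple of 4
    print(n, value, q)
```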



On the flip side of braggadocio, the formula for the binomial distribution is

P(X = k) = \displaystyle {n \choose k} p^k q^{n-k},

where X is the number of successes in n independent and identically distributed trials, where p represents the probability of success on any one trial, and (to my shame) q is the probability of failure.



My Favorite One-Liners: Part 32

In this series, I’m compiling some of the quips and one-liners that I’ll use with my students to hopefully make my lessons more memorable for them. Today’s story is a continuation of yesterday’s post. I call today’s one-liner “Method #1… Method #2.”

Every once in a while, I want my students to figure out that there’s a clever way to do a problem that will save them a lot of time, and they need to think of it.

For example, in Algebra II, Precalculus, or Probability, I might introduce the binomial coefficients to my students, show them the formula for computing them and how they’re related to combinatorics and to Pascal’s triangle, and then ask them to compute \displaystyle {100 \choose 3}. We write down

\displaystyle {100 \choose 3} = \displaystyle \frac{100!}{3!(100-3)!} = \displaystyle \frac{100!}{3! \times 97!}

So this fraction needs to be simplified. So I’ll dramatically announce:

Method #1: Multiply out the top and the bottom.

This produces the desired groans from my students. If possible, I then list other available but undesirable ways of solving the problem.

Method #2: Figure out the 100th row of Pascal’s triangle.

Method #3: List out all of the ways of getting 3 successes in 100 trials.

All of this gets the point across: there’s got to be an easier way to do this. So, finally, I’ll get to what I really want my students to do:

Method #4: Write 100! = 100 \times 99 \times 98 \times  97!, and cancel.
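Method #4 reduces the problem to the arithmetic (100 \times 99 \times 98)/3!; a quick check (my sketch) confirms it against Python's built-in binomial coefficient:

```python
from math import comb

# Method #4: cancel the 97! to leave (100 * 99 * 98) / 3!
shortcut = (100 * 99 * 98) // (3 * 2 * 1)
print(shortcut)  # 161700
assert shortcut == comb(100, 3)
```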

The point of this bit of showman’s patter is to get my students to think about what they should do next as opposed to blindly embarking in a laborious calculation.


As another example, consider the following problem from Algebra II/Precalculus: “Show that x-1 is a factor of f(x)=x^{78} - 4 x^{37} + 2 x^{15} + 1.”

As I’m writing down the problem on the board, someone will usually call out nervously, “Are you sure you mean x^{78}?” Yes, I’m sure.

“So,” I announce, “how are we going to solve the problem?”

Method #1: Use synthetic division.

Then I’ll make a point of what it would take to write down the procedure of synthetic division for this polynomial of degree 78.

Method #2: (As my students anticipate the real way of doing the problem) Use long division.

Understanding laughter ensues. Eventually, I tell my students — or, sometimes, my students will tell me:

Method #3: Calculate f(1).
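Method #3 is a single substitution: f(1) = 1 - 4 + 2 + 1 = 0, so x - 1 is a factor by the Factor Theorem. A quick check in code (my sketch):

```python
def f(x):
    return x**78 - 4 * x**37 + 2 * x**15 + 1

# f(1) = 1 - 4 + 2 + 1 = 0, so x - 1 is a factor by the Factor Theorem
print(f(1))  # 0
```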


My Favorite One-Liners: Part 31

In this series, I’m compiling some of the quips and one-liners that I’ll use with my students to hopefully make my lessons more memorable for them.

Here’s the closing example that I’ll use when presenting the binomial and hypergeometric distributions to my probability/statistics students.

A lonely bachelor decides to play the field, deciding that a lifetime of watching “Leave It To Beaver” reruns doesn’t sound all that pleasant. On 250 consecutive days, he calls a different woman for a date. Unfortunately, through the school of hard knocks, he knows that the probability that a given woman will accept his gracious invitation is only 1%. What is the chance that he will land at least three dates?

You can probably imagine the stretch I was enduring when I first developed this example many years ago. Nevertheless, I make a point to add the following disclaimer before we start finding the solution, which always gets a laugh:

The events of this exercise are purely fictitious. Any resemblance to any actual persons — living, or dead, or currently speaking — is purely coincidental.
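For the record, the answer to the exercise can be computed directly from the binomial distribution with n = 250 and p = 0.01; here is a sketch (mine, not part of the original post):

```python
from math import comb

n, p = 250, 0.01
# P(at least 3 dates) = 1 - P(0) - P(1) - P(2)
p_at_most_2 = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(3))
answer = 1 - p_at_most_2
print(round(answer, 4))  # roughly 0.46
```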

My Favorite One-Liners: Part 30

In this series, I’m compiling some of the quips and one-liners that I’ll use with my students to hopefully make my lessons more memorable for them. Today’s quip is a follow-up to yesterday’s post and is one that I’ll use when I need my students to remember something that I taught them earlier in the semester — perhaps even the previous day.

For example, in my applied statistics class, one day I’ll show students how to compute the expected value and the standard deviation of a random variable:

E(X) = \sum x \cdot P(X=x)

E(X^2) = \sum x^2 \cdot P(X=x)

\hbox{SD}(X) = \sqrt{ E(X^2) - [E(X)]^2 }

Then, the next time I meet them, I start working on a seemingly new topic, the derivation of the binomial distribution:

P(X = k) = \displaystyle {n \choose k} p^k q^{n-k}.

This derivation takes some time because I want my students to understand not only how to use the formula but also where the formula comes from. Eventually, I’ll work out that if n = 3 and p = 0.2,

P(X = 0) = 0.512

P(X = 1) = 0.384

P(X = 2) = 0.096

P(X = 3) = 0.008

Then, I announce to my class, I next want to compute E(X) and \hbox{SD}(X). We had just done this the previous class period; however, I know full well that they haven’t yet committed those formulas to memory. So here’s the one-liner that I use: “If you had a good professor, you’d remember how to do this.”

Eventually, when the awkward silence has lasted long enough because no one can remember the formula (without looking back at the previous day’s notes), I plunge an imaginary knife into my heart and turn the imaginary dagger, getting the point across: You really need to remember this stuff.
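Applying the previous day's formulas to the table above (n = 3, p = 0.2) gives E(X) = 0.6 and SD(X) = \sqrt{0.48} \approx 0.693, matching np and \sqrt{np(1-p)}; a sketch of the computation:

```python
from math import sqrt

# The distribution derived in class for n = 3, p = 0.2
dist = {0: 0.512, 1: 0.384, 2: 0.096, 3: 0.008}

ex = sum(x * prob for x, prob in dist.items())      # E(X)
ex2 = sum(x**2 * prob for x, prob in dist.items())  # E(X^2)
sd = sqrt(ex2 - ex**2)                              # SD(X)
print(round(ex, 4), round(sd, 4))  # 0.6 0.6928
```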

Tennis and best 2-out-of-3 vs. best 3-out-of-5

I recently read a very interesting article on FiveThirtyEight regarding men’s and women’s tennis that reminded me of the following standard problem in probability.

Player X and Player Y play a series of at most n games, and a winner is declared when either Player X or Player Y wins at least n/2 games. Suppose that the chance that Player X wins is p, and suppose that the outcomes of the games are independent. Find the probability that Player Y wins if (a) n = 3, (b) n = 5.

The easiest way to solve this is to assume that all n games are played, even if that doesn’t actually happen in real life. Then, for part (a), we can use the binomial distribution to find

  • P(X = 0) = P(Y = 3) = (1-p)^3
  • P(X = 1) = P(Y = 2) = 3p(1-p)^2
  • P(X = 2) = P(Y = 1) = 3p^2(1-p)
  • P(X = 3) = P(Y = 0) = p^3

Adding the first two probabilities, the chance that Player Y wins is (1-p)^3 + 3p(1-p)^2 = (1-p)^2 (1+2p).

Similarly, for part (b),

  • P(X = 0) = P(Y = 5) = (1-p)^5
  • P(X = 1) = P(Y = 4) = 5 p (1-p)^4
  • P(X = 2) = P(Y = 3) = 10p^2 (1-p)^3
  • P(X = 3) = P(Y = 2) = 10 p^3 (1-p)^2
  • P(X = 4) = P(Y = 1) = 5 p^4 (1-p)
  • P(X = 5) = P(Y = 0) = p^5

Adding the first three probabilities, the chance that Player Y wins is (1-p)^5 + 5p(1-p)^4 + 10p^2(1-p)^3 = (1-p)^3 (1+3p+6p^2).

The graphs of (1-p)^2 (1+2p) and (1-p)^3 (1+3p+6p^2) on the interval 0.7 \le p \le 0.9 are shown below in blue and orange, respectively. The lesson is clear: if p > 0.5, then clearly the chance that Player Y wins is less than 50%. However, Player Y’s chances of upsetting Player X are greater if they play a best 2-out-of-3 series instead of a best 3-out-of-5 series.
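The two expressions for Player Y's chances can be evaluated directly; here is a sketch (mine) using p = 0.8 as an example:

```python
def upset_best_of_3(p):
    # Player Y wins a best 2-of-3 when X wins at most 1 of the 3 games
    return (1 - p)**2 * (1 + 2 * p)

def upset_best_of_5(p):
    # Player Y wins a best 3-of-5 when X wins at most 2 of the 5 games
    return (1 - p)**3 * (1 + 3 * p + 6 * p**2)

p = 0.8
print(round(upset_best_of_3(p), 4))  # 0.104
print(round(upset_best_of_5(p), 4))  # 0.0579
```

As a sanity check, both formulas give exactly 0.5 when p = 0.5, as they must by symmetry.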

Remarkably, the curve above has been observed in real-life sports: namely, women’s tennis (which plays best 2 sets out of 3 — marked WTA below) and men’s tennis (which plays best 3 sets out of 5 in Grand Slams — marked ATP below). The chart indicates that when two men’s players ranked 20 places apart play each other in Grand Slams, an upset occurs about 13% of the time. However, the upset percentage is only 5% in women’s tennis. (This approximately matches the above curve near p = 0.8.)

However, in tennis tournaments that are not Grand Slams, men’s players also play matches with a maximum of 3 sets. In those tournaments, the chances of an upset are approximately equal in men’s and women’s tennis.

However, since the casual tennis fan (like me) only tunes into the Grand Slams but not other tennis matches, this fact is not widely known — which gives the misleading impression that top women’s tennis players are not as tough, skilled, etc. as men’s tennis players.

The FiveThirtyEight article argues that top women’s players don’t reach the latter stages of Grand Slam tournaments as often as top men’s players because the two tournaments are held under these different rules, and that women’s tennis would be better served if its matches were also played in a best-3-out-of-5 format.



New England Patriots Cheat At the Pre-Game Coin Flip? Not Really.

Last November, CBS Sports caused a tempest in a teapot with an article with the sensational headline “Patriots have no need for probability, win coin flip at impossible rate.” From the opening paragraphs:

Bill Belichick is never unprepared. Or at least that’s the perception. When other coaches struggle with when to use timeouts or how to manage the clock, the Patriots coach, almost effortlessly, always seems to make the right decision.

Belichick has also been extremely lucky. The Pats have won the coin toss 19 of the last 25 times, according to the Boston Globe‘s Jim McBride.

For some perspective: Assuming the coin toss is a 50/50 proposition, the probability of winning it at least 19 times in 25 tries is 0.0073. That’s less than three-quarters of one percent.

As far as the math goes, the calculation is correct. Using the binomial distribution,

\displaystyle \sum_{n=19}^{25} {25 \choose n} (0.5)^n (0.5)^{25-n} \approx 0.0073.

Unfortunately, this is far too simplistic an analysis to accuse someone of “winning the coin flip at an impossible rate.” Rather than re-do the calculations myself, I’ll just quote from the following article from the Harvard Sports Analysis Collective. The article begins by noting that while the Patriots may have been lucky the last 25 games, it’s not surprising that some team in the NFL was lucky (and the lucky team just happened to be the Patriots).

But how impossible is it? Really, we are interested in not only the probability of getting 19 or more heads but also a result as extreme in the other direction – i.e. 6 or fewer. That probability is just 2 \times 0.0073, or 0.0146.

That is still very low; however, given that there are 32 teams in the NFL, the probability of any one team doing this is much higher. To do an easy calculation we can assume that all tosses are independent, which isn’t entirely true, as when one team wins the coin flip the other team loses. The proper way to do this would be via simulation, but assuming independence is much easier and should yield pretty similar results. The probability of any one team having a result that extreme, as shown before, is 0.0146. The probability of a team NOT having a result that extreme is 1 - 0.0146 = 0.9854. The probability that, with 32 teams, not one of them has a result this extreme is 0.9854^{32} \approx 0.6246. Therefore, with 32 teams, we would expect at least one team to have a result as extreme as the Patriots have had over the past 25 games 1 - 0.6246 = 0.3754, or 37.5% of the time. That is hardly significant. Even if you restricted it to not all results as extreme in either direction but just results of 19 or greater, the probability of one or more teams achieving that is still nearly 20%.

The article goes on to note the obvious cherry-picking used in selecting the data… in other words, picking the 25 consecutive games that would make the Patriots look like they were somehow cheating on the coin flip.

In addition, the selection of looking at only the last 25 games is surely a selection made on purpose to make Belichick look bad. Why not look throughout his career? Did he suddenly discover a talent for predicting the future? Furthermore, given the length of Belichick’s career, we would almost expect him to go through a period where he wins 19 of 25 coin flips by random chance alone. We can actually simulate this probability. Given that he has coached 247 games with the Patriots, we can randomly generate a string of zeroes and ones corresponding to lost and won coin flips respectively. We can then check the string for a sequence of 25 games where there were 19 or more heads. I did this 10,000 times – in 38.71% of these simulations there was at least one sequence with 19 or more heads out of 25.

The author makes the following pithy conclusion:

To be fair, the author of this article did not seem to insinuate that the Patriots were cheating, rather he was just remarking that it was a rare event (although, in reality, it shouldn’t be as unexpected as he makes it out to be). The fault seems to rather lie with who made the headline and pubbed it, although their job is probably just to get pageviews in which case I guess they succeeded.

At any rate, the Patriots lost the coin flip in the 26th game.

In-class demo: The binomial distribution and the bell curve

Many years ago, the only available in-class technology at my university was the Microsoft Office suite — probably Office 95 or 98. This placed severe restrictions on what I could demonstrate in my statistics class, especially when I wanted to have an interactive demonstration of how the binomial distribution gets closer and closer to the bell curve as the number of trials increases (as long as both np and n(1-p) are also decently large).

The spreadsheet in the link below is what I developed. It shows

  • The probability histogram of the binomial distribution for n \le 150
  • The bell curve with mean \mu = np and standard deviation \sigma = \sqrt{np(1-p)}
  • Also, the minimum and maximum values on the x-axis can be adjusted. For example, if n = 100 and p = 0.01, it doesn’t make much sense to show the full histogram; it suffices to have a maximum value around 5 or so.

In class, I take about 3-5 minutes to demonstrate the following ideas with the spreadsheet:

  • If n is large and both np and n(1-p) are greater than 10, then the normal curve provides a decent approximation to the binomial distribution.
  • The probability distribution provides exact answers to probability questions, while the normal curve provides approximate answers.
  • If n is small, then the normal approximation isn’t very good.
  • If n is large but p is small, then the normal approximation isn’t very good. I’ll say in words that there is a decent approximation under this limit, namely the Poisson distribution, but (for a class in statistics) I won’t say much more than that.
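The same comparison can be made without a spreadsheet; here is a sketch (mine, not the Excel file) that evaluates the exact binomial probabilities against the bell-curve heights in a good case and a bad one:

```python
from math import comb, exp, pi, sqrt

def binom_pmf(n, p, k):
    """Exact binomial probability P(X = k)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def normal_pdf(x, mu, sigma):
    """Height of the bell curve with mean mu and standard deviation sigma."""
    return exp(-((x - mu)**2) / (2 * sigma**2)) / (sigma * sqrt(2 * pi))

# Good case: n = 100, p = 0.5, so np = n(1-p) = 50, both well above 10
n, p = 100, 0.5
mu, sigma = n * p, sqrt(n * p * (1 - p))
for k in (45, 50, 55):
    print(k, round(binom_pmf(n, p, k), 4), round(normal_pdf(k, mu, sigma), 4))

# Bad case: n = 100, p = 0.01, so np = 1; the two values disagree badly at k = 0
print(round(binom_pmf(100, 0.01, 0), 4), round(normal_pdf(0, 1.0, sqrt(0.99)), 4))
```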

Doubtlessly, there are equally good pedagogical tools for this purpose. However, at the time I was limited to Microsoft products, and it took me untold hours to figure out how to get Excel to draw the probability histogram. So I continue to use this spreadsheet in my classes to demonstrate to students this application of the Central Limit Theorem.

Excel spreadsheet: binomial.xlsx

A surprising appearance of e

Here’s a simple probability problem that should be accessible to high school students who have learned the Multiplication Rule:

Suppose that you play the lottery every day for about three years. Each time you play, the chance that you win is 1 chance in 1000. What is the probability that, after playing 1000 times, you never win?

This is a straightforward application of the Multiplication Rule from probability. The chance of not winning on any one play is 0.999. Therefore, the chance of not winning 1000 consecutive times is (0.999)^{1000}, which a calculator shows is approximately 0.3677.


Well, that was easy enough. Now, just for the fun of it, let’s find the reciprocal of this answer: 1/0.3677 \approx 2.7196.


Hmmm. Two point seven one. Where have I seen that before? Hmmm… Nah, it couldn’t be that.

What if we changed the number 1000 in the above problem to 1,000,000? Then the probability would be (0.999999)^{1000000}.


There’s no denying it now… it looks like the reciprocal is approximately e, so that the probability of never winning for both problems is approximately 1/e.
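Both computations take only a couple of lines to verify; a sketch:

```python
from math import e

# Chance of never winning in n plays, each with probability 1/n of winning
for n in (1_000, 1_000_000):
    prob = (1 - 1 / n)**n
    print(n, round(prob, 6), round(1 / prob, 6))

print(round(1 / e, 6))  # the limiting value, 1/e = 0.367879...
```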

Why is this happening? I offer a thought bubble if you’d like to think about this before proceeding to the answer.

The above calculations are numerical examples that demonstrate the limit

\displaystyle \lim_{n \to \infty} \left(1 + \frac{x}{n}\right)^n = e^x

In particular, for the special case x = -1, we find

\displaystyle \lim_{n \to \infty} \left(1 - \frac{1}{n}\right)^n = e^{-1} = \displaystyle \frac{1}{e}

The first limit can be proved using L’Hopital’s Rule. By continuity of the function f(x) = \ln x, we have

\ln \left[ \displaystyle \lim_{n \to \infty} \left(1 + \frac{x}{n}\right)^n \right] = \displaystyle \lim_{n \to \infty} \ln \left[ \left(1 + \frac{x}{n}\right)^n \right]

\ln \left[ \displaystyle \lim_{n \to \infty} \left(1 + \frac{x}{n}\right)^n \right] = \displaystyle \lim_{n \to \infty} n \ln \left(1 + \frac{x}{n}\right)

\ln \left[ \displaystyle \lim_{n \to \infty} \left(1 + \frac{x}{n}\right)^n \right] = \displaystyle \lim_{n \to \infty} \frac{ \displaystyle \ln \left(1 + \frac{x}{n}\right)}{\displaystyle \frac{1}{n}}

The right-hand side has the indeterminate form 0/0 as n \to \infty, since both the numerator and the denominator tend to zero, and so we may use L’Hopital’s Rule, differentiating both the numerator and the denominator with respect to n.

\ln \left[ \displaystyle \lim_{n \to \infty} \left(1 + \frac{x}{n}\right)^n \right] = \displaystyle \lim_{n \to \infty} \frac{ \displaystyle \frac{1}{1 + \frac{x}{n}} \cdot \frac{-x}{n^2} }{\displaystyle \frac{-1}{n^2}}

\ln \left[ \displaystyle \lim_{n \to \infty} \left(1 + \frac{x}{n}\right)^n \right] = \displaystyle \lim_{n \to \infty} \displaystyle \frac{x}{1 + \frac{x}{n}}

\ln \left[ \displaystyle \lim_{n \to \infty} \left(1 + \frac{x}{n}\right)^n \right] = \displaystyle \frac{x}{1 + 0}

\ln \left[ \displaystyle \lim_{n \to \infty} \left(1 + \frac{x}{n}\right)^n \right] = x

Applying the exponential function to both sides, we conclude that

\displaystyle \lim_{n \to \infty} \left(1 + \frac{x}{n}\right)^n= e^x

In an undergraduate probability class, the problem can be viewed as a special case of a Poisson distribution approximating a binomial distribution if there’s a large number of trials and a small probability of success.

The above calculation also justifies (in Algebra II and Precalculus) how the formula for continuous compound interest A = Pe^{rt} can be derived from the formula for discrete compound interest A = P \displaystyle \left( 1 + \frac{r}{n} \right)^{nt} by letting the number of compounding periods n tend to infinity.
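A numerical sketch (with illustrative numbers of my own choosing) shows the discrete formula marching toward the continuous one as n grows:

```python
from math import exp

# Illustrative values: $1000 principal, 5% annual rate, 10 years
P, r, t = 1000.0, 0.05, 10.0

def discrete(n):
    """Compound interest with n compounding periods per year."""
    return P * (1 + r / n)**(n * t)

continuous = P * exp(r * t)
for n in (1, 12, 365, 1_000_000):
    print(n, round(discrete(n), 2))
print(round(continuous, 2))  # the discrete values approach this
```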

All this to say, Euler knew what he was doing when he decided that e was so important that it deserved to be named.