2014 MacArthur “Genius” grant winner and the twin prime conjecture

From http://www.macfound.org/fellows/927/:

Prime numbers have inspired great intrigue over the last centuries, and one of the most basic unanswered questions has been the spacing between two consecutive prime numbers, or the twin prime conjecture, which states that there are infinitely many pairs of primes that differ by two. Despite many efforts at proving this conjecture, mathematicians were not able to rule out the possibility that the gaps between primes continue to expand, eventually exceeding any particular bound. Zhang’s work shows that there are infinitely many consecutive primes, or pairs of primes, closer than 70 million. In other words, as you go to larger and larger numbers, the primes will not become further and further apart—you will keep finding prime pairs that differ by less than 70 million.

His work has generated significant collaborations across the community to expand on his effort, and within months of his discovery that number was reduced from 70 million to less than 5,000.

Nuts and Bolts of Political Polls

A standard topic in my statistics class is political polling, which is the canonical example of constructing a confidence interval with a relatively small sample to (hopefully) project the opinions of a large population. Of course, polling is only valid if the sample represents the population at large. This is a natural engagement activity in the fall semester preceding a presidential election.
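As a rough illustration of the confidence-interval calculation mentioned above, here is a minimal sketch of the usual normal-approximation interval for a proportion. The poll numbers (520 of 1,000 respondents) are invented for illustration, and the function name `proportion_ci` is hypothetical. The sketch also assumes a simple random sample, which is exactly the assumption that real polls struggle to meet.

```python
import math

def proportion_ci(successes, n, z=1.96):
    """Normal-approximation confidence interval for a population proportion.

    z = 1.96 corresponds to roughly 95% confidence.
    Assumes a simple random sample (an idealization for real polls).
    """
    p_hat = successes / n
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Hypothetical poll: 520 of 1,000 respondents favor a candidate.
low, high = proportion_ci(520, 1000)
print(f"95% CI: ({low:.3f}, {high:.3f})")
```

With these made-up numbers the margin of error is about three percentage points, which is why a sample of roughly 1,000 is so common in published polls.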

A recent article on FiveThirtyEight.com, “Are Bad Pollsters Copying Good Pollsters,” does a nice job of explaining some of the nuts and bolts of political polling in an age when selected participants are increasingly unlikely to participate… and also raises the specter of how pollsters using nontraditional methods might be consciously or subconsciously cheating. A sample (pun intended) from the article:

What’s a nontraditional poll? One that doesn’t abide by the industry’s best practices. So, a survey is nontraditional if it:

doesn’t follow probability sampling;

doesn’t use live interviewers;

is released by a campaign or campaign groups (because these only selectively release data);

doesn’t disclose (i.e. doesn’t release raw data to the Roper Archives, isn’t a member of the National Council on Public Polls, or hasn’t signed onto the American Association for Public Opinion Research transparency initiative).

Everything else is a gold-standard poll…

Princeton University graduate student Steven Rogers and Vanderbilt University professor of political science Joshua Clinton [studied] interactive voice response (IVR) surveys in the 2012 Republican presidential primary. (IVR pollsters are in our nontraditional group.) Rogers and Clinton found that IVR pollsters were about as accurate as live-interview pollsters in races where live-interview pollsters surveyed the electorate. IVR pollsters were considerably less accurate when no live-interview poll was conducted. This effect held true even when controlling for a slew of different variables. Rogers and Clinton suggested that the IVR pollsters were taking a “cue” from the live pollsters in order to appear more accurate.

My own analysis hints at the same possibility. The nontraditional pollsters did worse in races without a live pollster.

A probability problem involving two cards: Index

I’m doing something that I should have done a long time ago: collect past series of posts into a single, easy-to-reference post. The following posts formed my series on different ways (correct and incorrect) to solve a two-part probability problem.

Part 1: Two different and correct ways of solving the following problem: “Two cards are dealt from a well-shuffled deck. Find the probability that the first is an ace or the second is an ace.”

Part 2: Two different ways — one correct, one incorrect — of solving the following problem: “Two cards are dealt from a well-shuffled deck. Find the probability that the first is an ace or the second is a jack.”

Part 3: Explaining the incorrect solution, and salvaging the solution to obtain the correct answer.

A probability problem involving two cards (Part 3)

In yesterday’s post, I gave two solutions that a student gave to the following probability problem. One of the solutions was correct, and one of the solutions was incorrect.

Two cards are dealt from a well-shuffled deck. Find the probability that the first is an ace or the second is a jack.

What follows is the incorrect (but, as we’ll see later, salvageable) solution.

Method #2 (incorrect): There are three ways that the first card could be an ace or the second card could be a jack:

The first card is an ace and the second card is a jack.
The first card is an ace and the second card is not a jack.
The first card is not an ace and the second card is a jack.

Each of these can be computed using the rule $P(A \cap B) = P(A) P(B \mid A)$ in much the same way as above:

P(first an ace) P(second a jack, given first an ace) $= \displaystyle \frac{4}{52} \cdot \frac{4}{51}$
P(first an ace) P(second not a jack, given first an ace) $= \displaystyle \frac{4}{52} \cdot \frac{47}{51}$
P(first not an ace) P(second a jack, given first not an ace) $= \displaystyle \frac{48}{52} \cdot \frac{4}{51}$

Adding these together, we obtain the answer:

$\displaystyle \frac{4}{52} \cdot \frac{4}{51} + \frac{4}{52} \cdot \frac{47}{51} + \frac{48}{52} \cdot \frac{4}{51}$

$= \displaystyle \frac{4 \times 4 + 4 \times 47 + 48 \times 4}{52 \times 51}$

$= \displaystyle \frac{4 + 47 + 48}{13 \times 51}$

$= \displaystyle \frac{99}{663}$

When my student presented this to me, I must admit that it took me a couple of minutes before I found the hole in the student’s logic.

This answer is wrong because the second probability in Case 3 above was not calculated correctly. If we only know that the first card is not an ace, then we don’t have enough information to know how many of the remaining 51 cards are jacks: four jacks remain if the first card was not a jack, but only three remain if it was. So the conditional probability $\displaystyle \frac{4}{51}$ is incorrect in Case 3.
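One way to see the size of the error is to count directly. The following sketch (the variable names are illustrative, not part of the student’s work) lists all $52 \times 51$ equally likely ordered deals and computes P(second a jack, given first not an ace) by counting.

```python
from fractions import Fraction
from itertools import permutations

# Build a 52-card deck as (rank, suit) pairs; only the ranks matter here.
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = [(r, s) for r in ranks for s in suits]

# All 52 * 51 equally likely ordered two-card deals.
deals = list(permutations(deck, 2))

first_not_ace = [d for d in deals if d[0][0] != "A"]
second_jack_too = [d for d in first_not_ace if d[1][0] == "J"]

# P(second a jack, given first not an ace), by counting.
cond = Fraction(len(second_jack_too), len(first_not_ace))
print(cond)             # 47/612
print(Fraction(4, 51))  # 4/51, the value used in the incorrect solution
```

Since $\frac{4}{51} = \frac{48}{612}$, the incorrect value overshoots the true conditional probability by exactly $\frac{1}{612}$.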

Even though the above logic is slightly incorrect, it can be salvaged by splitting Case 3 above into two subcases.

Method #2 (corrected): There are four ways that the first card could be an ace or the second card could be a jack:

The first card is an ace and the second card is a jack.
The first card is an ace and the second card is not a jack.
The first card is a jack, and the second card is a jack.
The first card is neither an ace nor a jack, and the second card is a jack.

Each of these can be computed using the rule $P(A \cap B) = P(A) P(B \mid A)$ :

P(first an ace) P(second a jack, given first an ace) $= \displaystyle \frac{4}{52} \cdot \frac{4}{51}$
P(first an ace) P(second not a jack, given first an ace) $= \displaystyle \frac{4}{52} \cdot \frac{47}{51}$
P(first a jack) P(second a jack, given first a jack) $= \displaystyle \frac{4}{52} \cdot \frac{3}{51}$
P(first neither an ace nor a jack) P(second a jack, given first neither an ace nor a jack) $= \displaystyle \frac{44}{52} \cdot \frac{4}{51}$

By splitting the original Case 3 into the new Cases 3 and 4, there is no longer any ambiguity about how many jacks remain when the second card is chosen. Adding these together, we obtain the answer:

$\displaystyle \frac{4}{52} \cdot \frac{4}{51} + \frac{4}{52} \cdot \frac{47}{51} + \frac{4}{52} \cdot \frac{3}{51} + \frac{44}{52} \cdot \frac{4}{51}$

$= \displaystyle \frac{4 \times 4 + 4 \times 47 + 4 \times 3 + 44 \times 4}{52 \times 51}$

$= \displaystyle \frac{4 + 47 + 3 + 44}{13 \times 51}$

$= \displaystyle \frac{98}{663}$

Not surprisingly, this matches the answer obtained when the formula $P(A \cup B) = P(A) + P(B) - P(A \cap B)$ was employed in yesterday’s post.
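As a sanity check on the four-case split, here is a short sketch that sorts every ordered two-card deal into one of the four cases of the corrected Method #2 and counts them. The helper name `classify` and the deck encoding are illustrative choices only.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
deck = [(r, s) for r in ranks for s in range(4)]  # suits encoded as 0-3

def classify(first, second):
    """Return the case number (1-4) of the corrected Method #2,
    or None if the deal has neither an ace first nor a jack second."""
    r1, r2 = first[0], second[0]
    if r1 == "A":
        return 1 if r2 == "J" else 2
    if r2 == "J":
        return 3 if r1 == "J" else 4
    return None

# Every deal lands in exactly one case, so the cases cannot overlap.
counts = Counter(classify(a, b) for a, b in permutations(deck, 2))
favorable = sum(n for case, n in counts.items() if case is not None)

print({case: counts[case] for case in (1, 2, 3, 4)})  # {1: 16, 2: 188, 3: 12, 4: 176}
print(Fraction(favorable, 52 * 51))                   # 98/663
```

The four counts are the numerators $4 \times 4$, $4 \times 47$, $4 \times 3$, and $44 \times 4$ that appear in the sum above.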

A probability problem involving two cards (Part 2)

Here is a standard problem that could appear in an elementary probability class.

Two cards are dealt from a well-shuffled deck. Find the probability that the first is an ace or the second is a jack.

In yesterday’s post, I considered two different ways of solving a similar-looking problem, except the final word jack was replaced by ace. Yesterday, I showed that there were two legitimate ways of solving that problem, resulting (of course) in the same answer.

About a year ago, a student came into my office using these two different techniques to solve the ace/jack problem. Except she arrived at two different answers!

Method #1: One law for probability states that

$P(A \cup B) = P(A) + P(B) - P(A \cap B)$

Another law of probability states that

$P(A \cap B) = P(A) P(B \mid A)$

Combining these, we find that

$P(A \cup B) = P(A) + P(B) - P(A) P(B \mid A)$

Written more colloquially,

P(first an ace or second a jack)

= P(first an ace) + P(second a jack) – P(first an ace AND second a jack)

= P(first an ace) + P(second a jack) – P(first an ace) P(second a jack, given first an ace)

Let’s look at these three probabilities on the last line separately.

P(first an ace) is $\displaystyle \frac{4}{52}$ .
P(second a jack) is also $\displaystyle \frac{4}{52}$ . No information about the first card appears between the two parentheses, and so this is similar to pulling a card out of the middle of a deck. Since no information is given about the preceding card(s), the answer is still $\displaystyle \frac{4}{52}$ .
P(second a jack, given first an ace) is different than the answer to #2 above. For this problem, the first card is known to be an ace, and the question is, given this knowledge, what is the probability that the second card is a jack? Since the first card is known to be an ace, there are still 4 jacks left out of 51 possible cards. Therefore, the answer is $\displaystyle \frac{4}{51}$ .
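The claim in the second item, that P(second a jack) is still $\displaystyle \frac{4}{52}$ when nothing is known about the first card, can be checked by brute force. This is only a sketch with an illustrative deck encoding: it counts the ordered deals whose second card is a jack, whatever the first card happens to be.

```python
from fractions import Fraction
from itertools import permutations

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
deck = [(r, s) for r in ranks for s in range(4)]  # suits encoded as 0-3

# All ordered two-card deals; each is equally likely for a well-shuffled deck.
deals = list(permutations(deck, 2))

# Count deals whose second card is a jack, ignoring the first card entirely.
second_is_jack = sum(1 for first, second in deals if second[0] == "J")

p_second_jack = Fraction(second_is_jack, len(deals))
print(p_second_jack)  # 1/13
```

The count is $4 \times 51 = 204$ out of $52 \times 51 = 2652$ deals, which reduces to $\frac{1}{13} = \frac{4}{52}$.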

Putting these together, we find the final solution of

$\displaystyle \frac{4}{52} + \frac{4}{52} - \frac{4}{52} \cdot \frac{4}{51}$

$= \displaystyle \frac{1}{13} + \frac{1}{13} - \frac{1}{13} \cdot \frac{4}{51}$

$= \displaystyle \frac{51+51-4}{13 \times 51}$

$= \displaystyle \frac{98}{663}$

Here was the student’s second solution.

Method #2: There are three ways that the first card could be an ace or the second card could be a jack:

The first card is an ace and the second card is a jack.
The first card is an ace and the second card is not a jack.
The first card is not an ace and the second card is a jack.

Each of these can be computed using the rule $P(A \cap B) = P(A) P(B \mid A)$ in much the same way as above:

P(first an ace) P(second a jack, given first an ace) $= \displaystyle \frac{4}{52} \cdot \frac{4}{51}$
P(first an ace) P(second not a jack, given first an ace) $= \displaystyle \frac{4}{52} \cdot \frac{47}{51}$
P(first not an ace) P(second a jack, given first not an ace) $= \displaystyle \frac{48}{52} \cdot \frac{4}{51}$

Adding these together, we obtain the answer:

$\displaystyle \frac{4}{52} \cdot \frac{4}{51} + \frac{4}{52} \cdot \frac{47}{51} + \frac{48}{52} \cdot \frac{4}{51}$

$= \displaystyle \frac{4 \times 4 + 4 \times 47 + 48 \times 4}{52 \times 51}$

$= \displaystyle \frac{4 + 47 + 48}{13 \times 51}$

$= \displaystyle \frac{99}{663}$

So, my student asked me, “Which one is the right answer? And why is the wrong answer wrong?” I must admit that it took me a couple of minutes before I found the student’s mistake. After all, the student’s logic perfectly paralleled the correct logic given in yesterday’s post.

I’ll discuss the mistake in tomorrow’s post. Until then, here’s a green thought cloud so that you also can think about what the student did wrong.

A probability problem involving two cards (Part 1)

Here is a standard problem that could appear in an elementary probability class.

Two cards are dealt from a well-shuffled deck. Find the probability that the first is an ace or the second is an ace.

There are at least two legitimate ways to solve this problem:

Method #1: One law for probability states that

$P(A \cup B) = P(A) + P(B) - P(A \cap B)$

Another law of probability states that

$P(A \cap B) = P(A) P(B \mid A)$

Combining these, we find that

$P(A \cup B) = P(A) + P(B) - P(A) P(B \mid A)$

Written more colloquially,

P(first an ace or second an ace)

= P(first an ace) + P(second an ace) – P(first an ace AND second an ace)

= P(first an ace) + P(second an ace) – P(first an ace) P(second an ace, given first an ace)

Let’s look at these three probabilities on the last line separately.

P(first an ace) is $\displaystyle \frac{4}{52}$ .
P(second an ace) is also $\displaystyle \frac{4}{52}$ . No information about the first card appears between the two parentheses, and so this is similar to pulling a card out of the middle of a deck. Since no information is given about the preceding card(s), the answer is still $\displaystyle \frac{4}{52}$ .
P(second an ace, given first an ace) is different than the answer to #2 above. For this problem, the first card is known to be an ace, and the question is, given this knowledge, what is the probability that the second card is an ace? Since the first card is known to be an ace, there are only 3 aces left out of 51 possible cards. Therefore, the answer is $\displaystyle \frac{3}{51}$ .

Putting these together, we find the final solution of

$\displaystyle \frac{4}{52} + \frac{4}{52} - \frac{4}{52} \cdot \frac{3}{51}$

$= \displaystyle \frac{1}{13} + \frac{1}{13} - \frac{1}{13} \cdot \frac{1}{17}$

$= \displaystyle \frac{17+17-1}{13 \times 17}$

$= \displaystyle \frac{33}{221}$

Here’s a second legitimate solution, though it does take a little more work.

Method #2: There are three ways that either the first or second card could be an ace:

The first card is an ace and the second card is an ace.
The first card is an ace and the second card is not an ace.
The first card is not an ace and the second card is an ace.

Each of these can be computed using the rule $P(A \cap B) = P(A) P(B \mid A)$ in much the same way as above:

P(first an ace) P(second an ace, given first an ace) $= \displaystyle \frac{4}{52} \cdot \frac{3}{51}$
P(first an ace) P(second not an ace, given first an ace) $= \displaystyle \frac{4}{52} \cdot \frac{48}{51}$
P(first not an ace) P(second an ace, given first not an ace) $= \displaystyle \frac{48}{52} \cdot \frac{4}{51}$

Adding these together, we obtain the answer:

$\displaystyle \frac{4}{52} \cdot \frac{3}{51} + \frac{4}{52} \cdot \frac{48}{51} + \frac{48}{52} \cdot \frac{4}{51}$

$= \displaystyle \frac{4 \times 3 + 4 \times 48 + 48 \times 4}{52 \times 51}$

$= \displaystyle \frac{3 + 48 + 48}{13 \times 51}$

$= \displaystyle \frac{1 + 16 + 16}{13 \times 17}$

$= \displaystyle \frac{33}{221}$

Not surprisingly, the two answers are the same.
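For students who like to see an answer confirmed experimentally, a simulation is one more check. The sketch below is only illustrative: the function name `simulate`, the number of trials, and the seed are arbitrary choices, and ranks are encoded as the integers 0 through 12 with 0 standing for the ace.

```python
import random
from fractions import Fraction

def simulate(trials=100_000, seed=0):
    """Estimate P(first is an ace or second is an ace) by repeated dealing."""
    rng = random.Random(seed)
    # Ranks 0-12, four copies of each; rank 0 stands for the ace.
    deck = [rank for rank in range(13) for _ in range(4)]
    hits = 0
    for _ in range(trials):
        # sample() draws two distinct positions, i.e. deals without replacement.
        first, second = rng.sample(deck, 2)
        if first == 0 or second == 0:
            hits += 1
    return hits / trials

estimate = simulate()
exact = Fraction(33, 221)
print(f"simulated: {estimate:.4f}, exact: {float(exact):.4f}")
```

The simulated value should land close to $\frac{33}{221} \approx 0.1493$, though it will not match exactly because of sampling variability.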

In tomorrow’s post, I’ll describe the time that a student came to me with a similar-looking probability problem, but she obtained two different answers using these two different methods.

Drought in California

Source: http://www.xkcd.com/1410/

The truth about a really misleading graphic

Last month, Vox published an article that was quite critical of the ALS Ice Bucket challenge, pointing out that donations for curing prevalent diseases don’t always match the actual deaths caused by those diseases. The author included the following graphic to make her point:

The point of this post is not to debate personal or utilitarian motivations for charitable giving or to contest the main point of the author’s article. Instead, I just want to take a focused, hard look at the above picture, which I argue is utterly misleading but has been circulated widely in social media and by reputable news organizations.

In this post, I’ll accept without argument the validity of the given numbers. For example, on the right hand side, there are about a quarter as many deaths in the United States due to Chronic Obstructive Pulmonary Disease (142,942) as due to Heart Disease (596,577). However, the light blue circle on the right looks microscopic compared to the purple circle. It should appear to be about one-fourth the size, but it doesn’t.

In any statistics class, we teach that in a properly drawn histogram, areas should represent relative frequencies. However, in the above picture, the numbers appear to be represented by the radii of the circles, not the areas. So the light blue circle has a radius about one-fourth of the big purple circle, and so the ratio of the areas is about one-sixteenth, not one-fourth.
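The arithmetic behind this distortion is short enough to check directly. The following sketch uses only the two death counts quoted above; the variable names are illustrative, and the assumption that the graphic scales radii linearly with the data is my reading of the picture, not something the graphic states.

```python
import math

copd_deaths = 142_942
heart_deaths = 596_577

# The ratio the picture ought to convey.
value_ratio = copd_deaths / heart_deaths

# If radii are proportional to the data, areas shrink by the square of the ratio.
area_ratio_if_radius_scaled = value_ratio ** 2

# Radius ratio that would make the areas proportional to the data.
fair_radius_ratio = math.sqrt(value_ratio)

print(f"ratio of deaths:            {value_ratio:.3f}")
print(f"area ratio, radius-scaled:  {area_ratio_if_radius_scaled:.3f}")
print(f"radius ratio for fair areas: {fair_radius_ratio:.3f}")
```

In other words, for the areas to be in the right proportion, the smaller circle’s radius should be almost half of the larger one’s rather than about one-fourth.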

Second, the area of the biggest circle on the left is not the same as the area of the biggest circle on the right, even though the units of the two sets of circles (dollars and deaths) are not comparable. A much fairer comparison would draw the biggest circles to be the same size.

So, in my opinion, here’s a much fairer rendering of the same numbers. Notice that the difference in the areas of the purple circles (for heart disease) and the pink circles (for breast cancer) is not nearly as dramatic as in the picture below.

Classroom Voting Patterns in Differential Calculus

Every so often, I’ll publicize through this blog an interesting article that I’ve found in the mathematics or mathematics education literature that can be freely distributed to the general public. Today, I’d like to highlight Kelly Cline, Holly Zullo & Lahna VonEpps (2012) Classroom Voting Patterns in Differential Calculus, PRIMUS: Problems, Resources, and Issues in Mathematics Undergraduate Studies, 22:1, 43-59, DOI: 10.1080/10511970.2010.491521

Here’s the abstract:

We study how different sections voted on the same set of classroom voting questions in differential calculus, finding that voting patterns can be used to identify some of the questions that have the most pedagogic value. We use statistics to identify three types of especially useful questions: 1. To identify good discussion questions, we look for those that produce the greatest diversity of responses, indicating that several answers are regularly plausible to students. 2. We identify questions that consistently provoke a common misconception, causing a majority of students to vote for one particular incorrect answer. When this is revealed to the students, they are usually quite surprised that the majority is wrong, and they are very curious to learn what they missed, resulting in a powerfully teachable moment. 3. By looking for questions where the percentage of correct votes varies the most between classes, we can find checkpoint questions that provide effective formative assessment as to whether a class has mastered a particular concept.

The full article can be found here: http://dx.doi.org/10.1080/10511970.2010.491521

Exponential growth and decay: Index

I’m doing something that I should have done a long time ago: collect past series of posts into a single, easy-to-reference post. The following posts formed my recently completed series on various applications of exponential growth and decay.

Part 1: Introduction: continuous compound interest and the phrasing of homework questions

Paying off credit-card debt

Part 2: Solution using a differential equation.

Part 3: Teaching basic principles of financial literacy.

Part 4: More on financial literacy.

Part 5: Solution using a difference equation.

Part 6: Comparison of the two solutions (difference equation vs. differential equation).

Part 7: An alternative solution of the difference equation that can be derived by Precalculus students.

Part 8: Verifying the solution of the difference equation using a spreadsheet.

Part 9: Amortization tables.

Half-life

Part 10: Derivation of the formula for exponential decay using a differential equation.

Part 11: Rewriting the solution of the differential equation into the half-life formula.

Newton’s Law of Cooling

Part 12: Derivation of the formula using a differential equation.

Part 13: Classroom demonstrations of Newton’s Law of Cooling.

Logistic Growth Model

Part 14: A simple classroom demonstration of the logistic growth model.

Part 15: The governing differential equation for the logistic growth model.

Part 16: Tips on graphing the logistic growth function.