# Facebook Birthday Problem: Part 5

Recently, I devised the following problem:

Suppose that you have n friends, and you always say “Happy Birthday” to each friend on his/her birthday. On how many days of the year will you not say “Happy Birthday” to one of your friends?

Until somebody tells me otherwise, I’m calling this the Facebook birthday problem in honor of Facebook’s daily alerts to say “Happy Birthday” to friends.

Here’s how I solved this problem. Let $I_k$ be an indicator random variable for “no friend has a birthday on day $k$, where $k = 366$ stands for February 29 and $k = 1, \dots, 365$ stand for the “usual” 365 days of the year. Therefore, the quantity $N$, representing the number of days of the year on which no friend has a birthday, can be written as

$N = I_1 + \dots + I_{365} + I_{366}$

In yesterday’s post, I began the calculation of the standard deviation of $N$ by first computing its variance. This calculation is complicated by the fact that $I_1, \dots, I_{366}$ are dependent. Yesterday, I showed that

$\hbox{Var}(N) = 365\displaystyle \left( \frac{1457}{1461} \right)^n \left[ 1 - \left( \frac{1457}{1461} \right)^n \right] + \displaystyle \left( \frac{1460}{1461} \right)^n \left[ 1 - \left( \frac{1460}{1461} \right)^n \right]$

$+ \displaystyle (365)(364) \left[ \displaystyle \left( \frac{1453}{1461} \right)^n - \displaystyle \left( \frac{1457}{1461} \right)^{2n} \right] + 2 \sum_{k=1}^{365}\hbox{Cov}(I_k,I_{366})$.

To complete this calculation, I’ll now find $\hbox{Cov}(I_k,I_{366})$, where $1 \le k \le 365$. I’ll use the usual computation formula for a covariance,

$\hbox{Cov}(I_k,I_{366}) = E(I_k I_{366}) - E(I_k) E(I_{366})$.

We have calculated $E(I_k)$ earlier in this series. In any four-year span, there are $4 \times 365 + 1 = 1461$ days, of which only one is February 29. Assuming the birthday’s are evenly distributed (which actually doesn’t happen in real life), the chance that someone’s birthday is not on day $k$ is

$\displaystyle 1 - \frac{4}{1461} = \displaystyle \frac{1457}{1461}$,

so that the probability that no friend has a birthday on day $k$ is

$\displaystyle \left( \frac{1457}{1461} \right)^n$.

Therefore, since the expected value of an indicator random variable is the probability that the event happens, we have

$E(I_k) = \displaystyle \left( \frac{1457}{1461} \right)^n$

for $k = 1, \dots, 365$. Similarly,

$E(I_{366}) = \displaystyle \left( \frac{1460}{1461} \right)^n$,

so that

$\hbox{Cov}(I_k,I_{366}) = E(I_k I_{366}) - \displaystyle \left( \frac{1457}{1461} \right)^n \left( \frac{1460}{1461} \right)^n$.

To find $E(I_k I_{366})$, we note that since $I_k$ is equal to either 0 or 1 and $I_{366}$ is equal to either 0 or 1, the product $I_k I_{366}$ can only equal 0 and 1 as well. Therefore, $I_k I_{366}$ is itself an indicator random variable. Furthermore, $I_k I_{366} = 1$ if and only if $I_k = 1$ and $I_{366} = 1$, which means that no friends has a birthday on either day $k$ or day $366$ (that is, February 29). The chance that someone doesn’t have a birthday on day $k$ or February 29 is

$\displaystyle 1 - \frac{4}{1461} - \frac{1}{1461} = \displaystyle \frac{1456}{1461}$,

so that the probability that no friend has a birthday on day $k$ or February 29 is

$\displaystyle \left( \frac{1456}{1461} \right)^n$.

Therefore, as before,

$E(I_k I_{366}) = \displaystyle \left( \frac{1456}{1461} \right)^n$,

so that

$\hbox{Cov}(I_k,I_{366}) = \displaystyle \left( \frac{1456}{1461} \right)^n - \displaystyle \left( \frac{1457}{1461} \right)^n \left( \frac{1460}{1461} \right)^n$.

Therefore,

$\hbox{Var}(N) = 365\displaystyle \left( \frac{1457}{1461} \right)^n \left[ 1 - \left( \frac{1457}{1461} \right)^n \right] + \displaystyle \left( \frac{1460}{1461} \right)^n \left[ 1 - \left( \frac{1460}{1461} \right)^n \right]$

$+ \displaystyle (365)(364) \left[ \displaystyle \left( \frac{1453}{1461} \right)^n - \displaystyle \left( \frac{1457}{1461} \right)^{2n} \right] + 2(365) \left[ \left( \frac{1456}{1461} \right)^n - \displaystyle \left( \frac{1457}{1461} \right)^n \left( \frac{1460}{1461} \right)^n \right]$,

and we find the standard deviation of $N$ using

$\hbox{SD}(N) = \sqrt{\hbox{Var}(N)}$.

The graph below shows the expected value of $N$, which was shown earlier to be

$E(N) = 365 \displaystyle \left( \frac{1457}{1461} \right)^n + \left( \frac{1460}{1461} \right)^n$,

along with error bars representing two standard deviations.

Interestingly, the standard deviation of $N$ changes for different values of $n$; a direct calculation shows that the $\hbox{SD}(N)$ is maximized at $n = 459$ with maximum value of approximately $6.1$. Accordingly, for $n = 450$ and $n = 500$, the error bars in the above figure have a total width of approximately 24 days (two standard deviations both above and below the expected value).

# Facebook Birthday Problem: Part 4

Recently, I devised the following problem:

Suppose that you have n friends, and you always say “Happy Birthday” to each friend on his/her birthday. On how many days of the year will you not say “Happy Birthday” to one of your friends?

Until somebody tells me otherwise, I’m calling this the Facebook birthday problem in honor of Facebook’s daily alerts to say “Happy Birthday” to friends.

Here’s how I solved this problem. Let $I_k$ be an indicator random variable for “no friend has a birthday on day $k$, where $k = 366$ stands for February 29 and $k = 1, \dots, 365$ stand for the “usual” 365 days of the year. Therefore, the quantity $N$, representing the number of days of the year on which no friend has a birthday, can be written as

$N = I_1 + \dots + I_{365} + I_{366}$

In yesterday’s post, I began the calculation of the standard deviation of $N$ by first computing its variance. This calculation is complicated by the fact that $I_1, \dots, I_{366}$ are dependent. Yesterday, I showed that

$\hbox{Var}(N) = 365\displaystyle \left( \frac{1457}{1461} \right)^n \left[ 1 - \left( \frac{1457}{1461} \right)^n \right] + \displaystyle \left( \frac{1460}{1461} \right)^n \left[ 1 - \left( \frac{1460}{1461} \right)^n \right]$

$+ \displaystyle 2 \!\!\!\!\! \sum_{1 \le j < k \le 365} \!\!\!\!\! \hbox{Cov}(I_j,I_k) + 2 \sum_{k=1}^{365} \hbox{Cov}(I_k,I_{366})$

To complete this calculation, I’ll now find the covariances. I’ll begin with $\hbox{Cov}(I_j,I_k)$ if $1 \le j < k \le 365$; that is, if $j$ and $k$ are days other than February 29. I’ll use the usual computation formula for a covariance,

$\hbox{Cov}(I_j,I_k) = E(I_j I_k) - E(I_j) E(I_k)$.

We have calculated $E(I_k)$ earlier in this series. In any four-year span, there are $4 \times 365 + 1 = 1461$ days, of which only one is February 29. Assuming the birthday’s are evenly distributed (which actually doesn’t happen in real life), the chance that someone’s birthday is not on day $k$ is

$\displaystyle 1 - \frac{4}{1461} = \displaystyle \frac{1457}{1461}$,

so that the probability that no friend has a birthday on day $k$ is

$\displaystyle \left( \frac{1457}{1461} \right)^n$.

Therefore, since the expected value of an indicator random variable is the probability that the event happens, we have

$E(I_k) = \displaystyle \left( \frac{1457}{1461} \right)^n$

for $k = 1, \dots, 365$. Therefore,

$\hbox{Cov}(I_j,I_k) = E(I_j I_k) - \displaystyle \left( \frac{1457}{1461} \right)^n \left( \frac{1457}{1461} \right)^n = E(I_j I_k) - \displaystyle \left( \frac{1457}{1461} \right)^{2n}$.

To find $E(I_j I_k)$, we note that since $I_j$ is equal to either 0 or 1 and $I_k$ is equal to either 0 or 1, the product $I_j I_k$ can only equal 0 and 1 as well. Therefore, $I_j I_k$ is itself an indicator random variable, which I’ll call $I_{jk}$. Furthermore, $I_{jk}$ if and only if $I_j = 1$ and $I_k = 1$, which means that no friends has a birthday on either day $j$ or day $k$. The chance that someone doesn’t have a birthday on day $j$ or day $k$ is

$\displaystyle 1 - \frac{4}{1461} - \frac{4}{1461} = \displaystyle \frac{1453}{1461}$,

so that the probability that no friend has a birthday on day $j$ or $k$ is

$\displaystyle \left( \frac{1453}{1461} \right)^n$.

Therefore, as before,

$E(I_j I_k) = \displaystyle \left( \frac{1453}{1461} \right)^n$,

so that

$\hbox{Cov}(I_j,I_k) = \displaystyle \left( \frac{1453}{1461} \right)^n - \displaystyle \left( \frac{1457}{1461} \right)^{2n}$.

Since there are $\displaystyle {365 \choose 2} = \displaystyle \frac{365\times 364}{2}$ pairs $(j,k)$ so that $1 \le j < k \le 365$, we have

$\hbox{Var}(N) = 365\displaystyle \left( \frac{1457}{1461} \right)^n \left[ 1 - \left( \frac{1457}{1461} \right)^n \right] + \displaystyle \left( \frac{1460}{1461} \right)^n \left[ 1 - \left( \frac{1460}{1461} \right)^n \right]$

$+ \displaystyle 2 \times \displaystyle \frac{365\times 364}{2} \left[ \displaystyle \left( \frac{1453}{1461} \right)^n - \displaystyle \left( \frac{1457}{1461} \right)^{2n} \right] + 2 \sum_{k=1}^{365}\hbox{Cov}(I_k,I_{366})$,

or

$\hbox{Var}(N) = 365\displaystyle \left( \frac{1457}{1461} \right)^n \left[ 1 - \left( \frac{1457}{1461} \right)^n \right] + \displaystyle \left( \frac{1460}{1461} \right)^n \left[ 1 - \left( \frac{1460}{1461} \right)^n \right]$

$+ \displaystyle (365)(364) \left[ \displaystyle \left( \frac{1453}{1461} \right)^n - \displaystyle \left( \frac{1457}{1461} \right)^{2n} \right] + 2 \sum_{k=1}^{365}\hbox{Cov}(I_k,I_{366})$.

The calculation of $\hbox{Cov}(I_k,I_{366})$ is similar to the above calculation; I’ll write this up in tomorrow’s post.

# Facebook Birthday Problem: Part 3

Recently, I devised the following problem:

Suppose that you have n friends, and you always say “Happy Birthday” to each friend on his/her birthday. On how many days of the year will you not say “Happy Birthday” to one of your friends?

Until somebody tells me otherwise, I’m calling this the Facebook birthday problem in honor of Facebook’s daily alerts to say “Happy Birthday” to friends.

Here’s how I solved this problem. Let $I_k$ be an indicator random variable for “no friend has a birthday on day $k$, where $k = 366$ stands for February 29 and $k = 1, \dots, 365$ stand for the “usual” 365 days of the year. Therefore, the quantity $N$, representing the number of days of the year on which no friend has a birthday, can be written as

$N = I_1 + \dots + I_{365} + I_{366}$

In yesterday’s post, I showed that

$E(N) = E(I_1) + \dots + E(I_{365}) + E(I_{366}) = 365 \displaystyle \left( \frac{1457}{1461} \right)^n + \left( \frac{1460}{1461} \right)^n$.

The calculation of the standard deviation of $N$ is considerably more complicated, however, since the $I_1, \dots, I_{366}$ are dependent. So we will begin by computing the variance of $N$:

$\hbox{Var}(N) = \displaystyle \sum_{k=1}^{366} \hbox{Var}(I_k) + 2 \!\!\!\!\! \sum_{1 \le j < k \le 366} \!\!\!\!\! \hbox{Cov}(I_j,I_k)$,

or

$\hbox{Var}(N) = \displaystyle \sum_{k=1}^{365} \hbox{Var}(I_k) + \hbox{Var}(I_{366}) + 2 \!\!\!\!\! \sum_{1 \le j < k \le 365} \!\!\!\!\! \hbox{Cov}(I_j,I_k) + 2 \sum_{k=1}^{365} \hbox{Cov}(I_k,I_{366})$

For the first term, we recognize that, in any four-year span, there are $4 \times 365 + 1 = 1461$ days, of which only one is February 29. Assuming the birthday’s are evenly distributed (which actually doesn’t happen in real life), the chance that someone’s birthday is not on day $k$ is

$\displaystyle 1 - \frac{4}{1461} = \displaystyle \frac{1457}{1461}$.

Therefore, the chance that all $n$ friends don’t have a birthday on day $k$ is

$\displaystyle \left( \frac{1457}{1461} \right)^n$.

Using the formula $\hbox{Var}(I) = p(1-p)$ for the variance of an indicator random variable, we see that

$\hbox{Var}(I_k) = \displaystyle \left( \frac{1457}{1461} \right)^n \left[ 1 - \left( \frac{1457}{1461} \right)^n \right]$

for $k = 1, \dots, 365$. Similarly, for the second term,

$\hbox{Var}(I_{366}) = \displaystyle \left( \frac{1460}{1461} \right)^n \left[ 1 - \left( \frac{1460}{1461} \right)^n \right]$

Therefore, so far we have shown that

$\hbox{Var}(N) = 365\displaystyle \left( \frac{1457}{1461} \right)^n \left[ 1 - \left( \frac{1457}{1461} \right)^n \right] + \displaystyle \left( \frac{1460}{1461} \right)^n \left[ 1 - \left( \frac{1460}{1461} \right)^n \right]$

$+ \displaystyle 2 \!\!\!\!\! \sum_{1 \le j < k \le 365} \!\!\!\!\! \hbox{Cov}(I_j,I_k) + 2 \sum_{k=1}^{365} \hbox{Cov}(I_k,I_{366})$

In tomorrow’s post, I’ll complete this calculation by finding the covariances.

# Facebook Birthday Problem: Part 2

Recently, I devised the following problem:

Suppose that you have n friends, and you always say “Happy Birthday” to each friend on his/her birthday. On how many days of the year will you not say “Happy Birthday” to one of your friends?

Until somebody tells me otherwise, I’m calling this the Facebook birthday problem in honor of Facebook’s daily alerts to say “Happy Birthday” to friends.

Here’s how I solved this problem. Let $I_k$ be an indicator random variable for “no friend has a birthday on day $k$, where $k = 366$ stands for February 29 and $k = 1, \dots, 365$ stand for the “usual” 365 days of the year. Therefore, the quantity $N$, representing the number of days of the year on which no friend has a birthday, can be written as

$N = I_1 + \dots + I_{365} + I_{366}$

Let’s start with any of the “usual” days. In any four-year span, there are $4 \times 365 + 1 = 1461$ days, of which only one is February 29. Assuming the birthday’s are evenly distributed (which actually doesn’t happen in real life), the chance that someone’s birthday is not on day $k$ is

$\displaystyle 1 - \frac{4}{1461} = \displaystyle \frac{1457}{1461}$.

Therefore, the chance that all $n$ friends don’t have a birthday on day $k$ is

$\displaystyle \left( \frac{1457}{1461} \right)^n$.

Since the expected value of an indicator random variable is the probability of the event, we see that

$E(I_k) = \displaystyle \left( \frac{1457}{1461} \right)^n$

for $k = 1, \dots, 365$. Similarly, the expected value for the indicator for February 29 is

$E(I_{366}) = \displaystyle \left( \frac{1460}{1461} \right)^n$.

Since $E(X+Y) = E(X) + E(Y)$ even if $X$ and $Y$ are dependent, we therefore conclude that

$E(N) = E(I_1) + \dots + E(I_{365}) + E(I_{366}) = 365 \displaystyle \left( \frac{1457}{1461} \right)^n + \left( \frac{1460}{1461} \right)^n$.

This function is represented by the red dots on the graph below.

In tomorrow’s post, I’ll calculate of the standard deviation of $N$.

# Facebook Birthday Problem: Part 1

The “birthday problem” is one of the classic problems in elementary probability because of its counter-intuitive solution. From Wikipedia:

In probability theory, the birthday problem or birthday paradox concerns the probability that, in a set of n randomly chosen people, some pair of them will have the same birthday. By the pigeonhole principle, the probability reaches 100% when the number of people reaches 367 (since there are only 366 possible birthdays, including February 29). However, 99.9% probability is reached with just 70 people, and 50% probability with 23 people. These conclusions are based on the assumption that each day of the year (except February 29) is equally probable for a birthday.

Recently, I devised the following different birthday problem:

Suppose that you have n friends, and you always say “Happy Birthday” to each friend on his/her birthday. On how many days of the year will you not say “Happy Birthday” to one of your friends?

Until somebody tells me otherwise, I’m calling this the Facebook birthday problem in honor of Facebook’s daily alerts to say “Happy Birthday” to friends.

In this series, I will solve this problem. While this may ruin the suspense, here’s a graph of the solution for $100 \le n \le 1000$ along with error bars indicating two standard deviations.

Before deriving this solution, I’ll start with a thought bubble if you’d like to take some time to think about how to do this.

# What is a Random Walk?

From PBS Infinite Series:

# A Long-Sought Proof, Found and Almost Lost

I enjoyed this article from Quanta Magazine, both for its mathematical content as well as the human interest story.

# A Long-Sought Proof, Found and Almost Lost

From the opening paragraphs:

Known as the Gaussian correlation inequality (GCI), the conjecture originated in the 1950s, was posed in its most elegant form in 1972 and has held mathematicians in its thrall ever since. “I know of people who worked on it for 40 years,” said Donald Richards, a statistician at Pennsylvania State University. “I myself worked on it for 30 years.”

[Thomas] Royen hadn’t given the Gaussian correlation inequality much thought before the “raw idea” for how to prove it came to him over the bathroom sink… In July 2014, still at work on his formulas as a 67-year-old retiree, Royen found that the GCI could be extended into a statement about statistical distributions he had long specialized in. On the morning of the 17th, he saw how to calculate a key derivative for this extended GCI that unlocked the proof. “The evening of this day, my first draft of the proof was written,” he said.

Not knowing LaTeX, the word processer of choice in mathematics, he typed up his calculations in Microsoft Word, and the following month he posted his paper to the academic preprint site arxiv.org. He also sent it to Richards, who had briefly circulated his own failed attempt at a proof of the GCI a year and a half earlier. “I got this article by email from him,” Richards said. “And when I looked at it I knew instantly that it was solved” …

Proofs of obscure provenance are sometimes overlooked at first, but usually not for long: A major paper like Royen’s would normally get submitted and published somewhere like the Annals of Statistics, experts said, and then everybody would hear about it. But Royen, not having a career to advance, chose to skip the slow and often demanding peer-review process typical of top journals. He opted instead for quick publication in the Far East Journal of Theoretical Statistics, a periodical based in Allahabad, India, that was largely unknown to experts and which, on its website, rather suspiciously listed Royen as an editor. (He had agreed to join the editorial board the year before.)

With this red flag emblazoned on it, the proof continued to be ignored… No one is quite sure how, in the 21st century, news of Royen’s proof managed to travel so slowly. “It was clearly a lack of communication in an age where it’s very easy to communicate,” Klartag said.

# Happy Independence Day

From last year’s Math With Bad Drawings (I recommend reading the whole article):