# Facebook Birthday Problem: Part 5

Recently, I devised the following problem:

Suppose that you have n friends, and you always say “Happy Birthday” to each friend on his/her birthday. On how many days of the year will you not say “Happy Birthday” to one of your friends?

Until somebody tells me otherwise, I’m calling this the Facebook birthday problem in honor of Facebook’s daily alerts to say “Happy Birthday” to friends.

Here’s how I solved this problem. Let $I_k$ be an indicator random variable for “no friend has a birthday on day $k$, where $k = 366$ stands for February 29 and $k = 1, \dots, 365$ stand for the “usual” 365 days of the year. Therefore, the quantity $N$, representing the number of days of the year on which no friend has a birthday, can be written as

$N = I_1 + \dots + I_{365} + I_{366}$

In yesterday’s post, I began the calculation of the standard deviation of $N$ by first computing its variance. This calculation is complicated by the fact that $I_1, \dots, I_{366}$ are dependent. Yesterday, I showed that

$\hbox{Var}(N) = 365\displaystyle \left( \frac{1457}{1461} \right)^n \left[ 1 - \left( \frac{1457}{1461} \right)^n \right] + \displaystyle \left( \frac{1460}{1461} \right)^n \left[ 1 - \left( \frac{1460}{1461} \right)^n \right]$

$+ \displaystyle (365)(364) \left[ \displaystyle \left( \frac{1453}{1461} \right)^n - \displaystyle \left( \frac{1457}{1461} \right)^{2n} \right] + 2 \sum_{k=1}^{365}\hbox{Cov}(I_k,I_{366})$.

To complete this calculation, I’ll now find $\hbox{Cov}(I_k,I_{366})$, where $1 \le k \le 365$. I’ll use the usual computation formula for a covariance,

$\hbox{Cov}(I_k,I_{366}) = E(I_k I_{366}) - E(I_k) E(I_{366})$.

We have calculated $E(I_k)$ earlier in this series. In any four-year span, there are $4 \times 365 + 1 = 1461$ days, of which only one is February 29. Assuming the birthday’s are evenly distributed (which actually doesn’t happen in real life), the chance that someone’s birthday is not on day $k$ is

$\displaystyle 1 - \frac{4}{1461} = \displaystyle \frac{1457}{1461}$,

so that the probability that no friend has a birthday on day $k$ is

$\displaystyle \left( \frac{1457}{1461} \right)^n$.

Therefore, since the expected value of an indicator random variable is the probability that the event happens, we have

$E(I_k) = \displaystyle \left( \frac{1457}{1461} \right)^n$

for $k = 1, \dots, 365$. Similarly,

$E(I_{366}) = \displaystyle \left( \frac{1460}{1461} \right)^n$,

so that

$\hbox{Cov}(I_k,I_{366}) = E(I_k I_{366}) - \displaystyle \left( \frac{1457}{1461} \right)^n \left( \frac{1460}{1461} \right)^n$.

To find $E(I_k I_{366})$, we note that since $I_k$ is equal to either 0 or 1 and $I_{366}$ is equal to either 0 or 1, the product $I_k I_{366}$ can only equal 0 and 1 as well. Therefore, $I_k I_{366}$ is itself an indicator random variable. Furthermore, $I_k I_{366} = 1$ if and only if $I_k = 1$ and $I_{366} = 1$, which means that no friends has a birthday on either day $k$ or day $366$ (that is, February 29). The chance that someone doesn’t have a birthday on day $k$ or February 29 is

$\displaystyle 1 - \frac{4}{1461} - \frac{1}{1461} = \displaystyle \frac{1456}{1461}$,

so that the probability that no friend has a birthday on day $k$ or February 29 is

$\displaystyle \left( \frac{1456}{1461} \right)^n$.

Therefore, as before,

$E(I_k I_{366}) = \displaystyle \left( \frac{1456}{1461} \right)^n$,

so that

$\hbox{Cov}(I_k,I_{366}) = \displaystyle \left( \frac{1456}{1461} \right)^n - \displaystyle \left( \frac{1457}{1461} \right)^n \left( \frac{1460}{1461} \right)^n$.

Therefore,

$\hbox{Var}(N) = 365\displaystyle \left( \frac{1457}{1461} \right)^n \left[ 1 - \left( \frac{1457}{1461} \right)^n \right] + \displaystyle \left( \frac{1460}{1461} \right)^n \left[ 1 - \left( \frac{1460}{1461} \right)^n \right]$

$+ \displaystyle (365)(364) \left[ \displaystyle \left( \frac{1453}{1461} \right)^n - \displaystyle \left( \frac{1457}{1461} \right)^{2n} \right] + 2(365) \left[ \left( \frac{1456}{1461} \right)^n - \displaystyle \left( \frac{1457}{1461} \right)^n \left( \frac{1460}{1461} \right)^n \right]$,

and we find the standard deviation of $N$ using

$\hbox{SD}(N) = \sqrt{\hbox{Var}(N)}$.

The graph below shows the expected value of $N$, which was shown earlier to be

$E(N) = 365 \displaystyle \left( \frac{1457}{1461} \right)^n + \left( \frac{1460}{1461} \right)^n$,

along with error bars representing two standard deviations.

Interestingly, the standard deviation of $N$ changes for different values of $n$; a direct calculation shows that the $\hbox{SD}(N)$ is maximized at $n = 459$ with maximum value of approximately $6.1$. Accordingly, for $n = 450$ and $n = 500$, the error bars in the above figure have a total width of approximately 24 days (two standard deviations both above and below the expected value).