Facebook Birthday Problem: Part 5

Recently, I devised the following problem:

Suppose that you have n friends, and you always say “Happy Birthday” to each friend on his/her birthday. On how many days of the year will you not say “Happy Birthday” to one of your friends?

Until somebody tells me otherwise, I’m calling this the Facebook birthday problem in honor of Facebook’s daily alerts to say “Happy Birthday” to friends.

Here’s how I solved this problem. Let I_k be an indicator random variable for “no friend has a birthday on day k, where k = 366 stands for February 29 and k = 1, \dots, 365 stand for the “usual” 365 days of the year. Therefore, the quantity N, representing the number of days of the year on which no friend has a birthday, can be written as

N = I_1 + \dots + I_{365} + I_{366}

In yesterday’s post, I began the calculation of the standard deviation of N by first computing its variance. This calculation is complicated by the fact that I_1, \dots, I_{366} are dependent. Yesterday, I showed that

\hbox{Var}(N) = 365\displaystyle \left( \frac{1457}{1461} \right)^n \left[ 1 - \left( \frac{1457}{1461} \right)^n \right] + \displaystyle \left( \frac{1460}{1461} \right)^n \left[ 1 - \left( \frac{1460}{1461} \right)^n \right]

+ \displaystyle (365)(364) \left[ \displaystyle \left( \frac{1453}{1461} \right)^n - \displaystyle \left( \frac{1457}{1461} \right)^{2n} \right] + 2 \sum_{k=1}^{365}\hbox{Cov}(I_k,I_{366}).

To complete this calculation, I’ll now find \hbox{Cov}(I_k,I_{366}), where 1 \le k \le 365. I’ll use the usual computation formula for a covariance,

\hbox{Cov}(I_k,I_{366}) = E(I_k I_{366}) - E(I_k) E(I_{366}).

We have calculated E(I_k) earlier in this series. In any four-year span, there are 4 \times 365 + 1 = 1461 days, of which only one is February 29. Assuming the birthday’s are evenly distributed (which actually doesn’t happen in real life), the chance that someone’s birthday is not on day k is

\displaystyle 1 - \frac{4}{1461} = \displaystyle \frac{1457}{1461},

so that the probability that no friend has a birthday on day k is

\displaystyle \left( \frac{1457}{1461} \right)^n.

Therefore, since the expected value of an indicator random variable is the probability that the event happens, we have

E(I_k) = \displaystyle \left( \frac{1457}{1461} \right)^n

for k = 1, \dots, 365. Similarly,

E(I_{366}) = \displaystyle \left( \frac{1460}{1461} \right)^n,

so that

\hbox{Cov}(I_k,I_{366}) = E(I_k I_{366}) - \displaystyle \left( \frac{1457}{1461} \right)^n \left( \frac{1460}{1461} \right)^n.

To find E(I_k I_{366}), we note that since I_k is equal to either 0 or 1 and I_{366} is equal to either 0 or 1, the product I_k I_{366} can only equal 0 and 1 as well. Therefore, I_k I_{366} is itself an indicator random variable. Furthermore, I_k I_{366} = 1 if and only if I_k = 1 and I_{366} = 1, which means that no friends has a birthday on either day k or day 366 (that is, February 29). The chance that someone doesn’t have a birthday on day k or February 29 is

\displaystyle 1 - \frac{4}{1461} - \frac{1}{1461} = \displaystyle \frac{1456}{1461},

so that the probability that no friend has a birthday on day k or February 29 is

\displaystyle \left( \frac{1456}{1461} \right)^n.

Therefore, as before,

E(I_k I_{366}) = \displaystyle \left( \frac{1456}{1461} \right)^n,

so that

\hbox{Cov}(I_k,I_{366}) = \displaystyle \left( \frac{1456}{1461} \right)^n - \displaystyle \left( \frac{1457}{1461} \right)^n \left( \frac{1460}{1461} \right)^n.

Therefore,

\hbox{Var}(N) = 365\displaystyle \left( \frac{1457}{1461} \right)^n \left[ 1 - \left( \frac{1457}{1461} \right)^n \right] + \displaystyle \left( \frac{1460}{1461} \right)^n \left[ 1 - \left( \frac{1460}{1461} \right)^n \right]

+ \displaystyle (365)(364) \left[ \displaystyle \left( \frac{1453}{1461} \right)^n - \displaystyle \left( \frac{1457}{1461} \right)^{2n} \right] + 2(365) \left[ \left( \frac{1456}{1461} \right)^n - \displaystyle \left( \frac{1457}{1461} \right)^n \left( \frac{1460}{1461} \right)^n \right],

and we find the standard deviation of N using

\hbox{SD}(N) = \sqrt{\hbox{Var}(N)}.

The graph below shows the expected value of N, which was shown earlier to be

E(N) = 365 \displaystyle \left( \frac{1457}{1461} \right)^n + \left( \frac{1460}{1461} \right)^n,

along with error bars representing two standard deviations.

Interestingly, the standard deviation of N changes for different values of n; a direct calculation shows that the \hbox{SD}(N) is maximized at n = 459 with maximum value of approximately 6.1. Accordingly, for n = 450 and n = 500, the error bars in the above figure have a total width of approximately 24 days (two standard deviations both above and below the expected value).

Leave a comment

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: