# A T-shirt describing hypothesis testing

Now that Christmas is over, I can safely share the Christmas gifts that I gave to my family this year thanks to Nausicaa Distribution (https://www.etsy.com/shop/NausicaaDistribution):

Euler’s equation pencil pouch:

Box-and-whisker snowflakes to hang on our Christmas tree:

And, for me, a wonderfully and subtly punny “Confidence and Power” T-shirt.

Thanks to FiveThirtyEight (see http://fivethirtyeight.com/features/the-fivethirtyeight-2014-holiday-gift-guide/) for pointing me in this direction.

For the sake of completeness, here are the math-oriented gifts that I received for Christmas:

http://www.amazon.com/What-If-Scientific-Hypothetical-Questions/dp/0544272994/ref=sr_1_1?ie=UTF8&qid=1419546608&sr=8-1&keywords=what+if

# Null hypothesis

Source: http://xkcd.com/892/

# Rejection regions

Sage words of wisdom that I gave one day in my statistics class:

If the alternative hypothesis has the form $p > p_0$, then the rejection region lies to the right of $p_0$. On the other hand, if the alternative hypothesis has the form $p < p_0$, then the rejection region lies to the left of $p_0$.

On the other hand, if the alternative hypothesis has the form $p \ne p_0$, then the rejection region has two parts: one part to the left of $p_0$, and another part to the right. So it’s kind of like my single days. Back then, my rejection region had two parts: Friday night and Saturday night.
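On the standardized $z$ scale, these one- and two-sided rejection regions correspond to tail cutoffs of the standard normal distribution. Here is a minimal Python sketch (the function `rejection_region` and its interface are my own invention for illustration, using only the standard library's `statistics.NormalDist`):

```python
from statistics import NormalDist

def rejection_region(alpha: float, alternative: str):
    """Critical z-value for a one-proportion z-test at significance level alpha."""
    z = NormalDist()  # standard normal distribution
    if alternative == "greater":   # H_a: p > p_0 -- reject in the right tail
        return ("z >", z.inv_cdf(1 - alpha))
    if alternative == "less":      # H_a: p < p_0 -- reject in the left tail
        return ("z <", z.inv_cdf(alpha))
    # H_a: p != p_0 -- split alpha between the two tails
    return ("|z| >", z.inv_cdf(1 - alpha / 2))
```

At the usual $\alpha = 0.05$, this recovers the familiar cutoffs: about $1.645$ for a one-sided test and about $1.96$ for a two-sided test.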

# Welch’s formula

When conducting an hypothesis test or computing a confidence interval for the difference $\overline{X}_1 - \overline{X}_2$ of two means, where at least one mean arises from a small sample, the Student $t$ distribution must be employed. In particular, the number of degrees of freedom for the Student $t$ distribution must be computed. Many textbooks suggest using Welch’s formula:

$df = \frac{\displaystyle (SE_1^2 + SE_2^2)^2}{\displaystyle \frac{SE_1^4}{n_1-1} + \frac{SE_2^4}{n_2-1}},$

rounded down to the nearest integer. In this formula, $SE_1 = \displaystyle \frac{\sigma_1}{\sqrt{n_1}}$ is the standard error associated with the first sample mean $\overline{X}_1$, where $\sigma_1$ (if known) is the standard deviation of the first population and $n_1$ is the size of the sample that is averaged to find $\overline{X}_1$. In practice, $\sigma_1$ is not known, and so the plug-in estimate $\sigma_1 \approx s_1$, using the sample standard deviation, is employed.

The terms $SE_2$ and $n_2$ are similarly defined for the average $\overline{X}_2$.
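Welch's formula is straightforward to compute directly. Here is a minimal Python sketch (the helper name `welch_df` is my own), using the sample standard deviations $s_1, s_2$ in place of the unknown $\sigma_1, \sigma_2$:

```python
import math

def welch_df(s1: float, n1: int, s2: float, n2: int) -> int:
    """Welch's degrees of freedom, rounded down to the nearest integer."""
    se1_sq = s1 ** 2 / n1  # SE_1^2 = s1^2 / n1
    se2_sq = s2 ** 2 / n2  # SE_2^2 = s2^2 / n2
    df = (se1_sq + se2_sq) ** 2 / (se1_sq ** 2 / (n1 - 1) + se2_sq ** 2 / (n2 - 1))
    return math.floor(df)
```

For example, `welch_df(1.0, 10, 1.0, 10)` gives $18$, matching the pooled answer $n_1 + n_2 - 2$ when the two samples are perfectly balanced.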

In Welch’s formula, the term $SE_1^2 + SE_2^2$ in the numerator is equal to $\displaystyle \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$. This is the square of the standard error $SE_D$ associated with the difference $\overline{X}_1 - \overline{X}_2$, since

$SE_D = \displaystyle \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$.

This leads to the “Pythagorean” relationship

$SE_1^2 + SE_2^2 = SE_D^2$,

which (in my experience) is a reasonable aid to help students remember the formula for $SE_D$.
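The identity is easy to verify numerically; the sample standard deviations and sizes below are hypothetical:

```python
import math

# Hypothetical sample standard deviations and sample sizes
s1, n1 = 2.0, 12
s2, n2 = 3.0, 20

se1 = s1 / math.sqrt(n1)                   # SE_1
se2 = s2 / math.sqrt(n2)                   # SE_2
se_d = math.sqrt(s1**2 / n1 + s2**2 / n2)  # SE_D for the difference of means

print(se1**2 + se2**2, se_d**2)  # the two sides of the "Pythagorean" identity
```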

Naturally, a big problem that students encounter when using Welch’s formula is that the formula is really, really complicated, and it’s easy to make a mistake when entering information into their calculators. (Indeed, it might be that the pre-programmed calculator function simply gives the wrong answer.) Also, since the formula is complicated, students don’t have a lot of psychological reassurance that, when they come out the other end, their answer is actually correct. So, when teaching this topic, I tell my students the following rule of thumb so that they can at least check if their final answer is plausible:

$\min(n_1,n_2)-1 \le df \le n_1 + n_2 -2$.

To my surprise, I have never seen this formula in a statistics textbook, even though it’s quite simple to state and not too difficult to prove using techniques from first-semester calculus.
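The bound is also easy to check empirically. The sketch below sweeps many random (hypothetical) sample sizes and standard errors and confirms that the unrounded degrees of freedom never escape the interval:

```python
import random

def welch_df_exact(se1_sq, se2_sq, n1, n2):
    """Welch's df before rounding, in terms of the squared standard errors."""
    return (se1_sq + se2_sq) ** 2 / (se1_sq ** 2 / (n1 - 1) + se2_sq ** 2 / (n2 - 1))

random.seed(0)  # reproducible sweep
cases = []
for _ in range(10_000):
    n1, n2 = random.randint(2, 50), random.randint(2, 50)
    se1_sq, se2_sq = random.uniform(0.01, 10.0), random.uniform(0.01, 10.0)
    cases.append((welch_df_exact(se1_sq, se2_sq, n1, n2), n1, n2))

# Every df lands in [min(n1, n2) - 1, n1 + n2 - 2]:
print(all(min(n1, n2) - 1 - 1e-9 <= df <= n1 + n2 - 2 + 1e-9
          for df, n1, n2 in cases))
```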

Let’s rewrite Welch’s formula as

$df = \left( \displaystyle \frac{1}{n_1-1} \left[ \frac{SE_1^2}{SE_1^2 + SE_2^2}\right]^2 + \frac{1}{n_2-1} \left[ \frac{SE_2^2}{SE_1^2 + SE_2^2} \right]^2 \right)^{-1}$

For the sake of simplicity, let $m_1 = n_1 - 1$ and $m_2 = n_2 -1$, so that

$df = \left( \displaystyle \frac{1}{m_1} \left[ \frac{SE_1^2}{SE_1^2 + SE_2^2}\right]^2 + \frac{1}{m_2} \left[ \frac{SE_2^2}{SE_1^2 + SE_2^2} \right]^2 \right)^{-1}$

Now let $x = \displaystyle \frac{SE_1^2}{SE_1^2 + SE_2^2}$. All of these terms are nonnegative (and, in practice, they’re all positive), so that $x \ge 0$. Also, the numerator is no larger than the denominator, so that $x \le 1$. Finally, we notice that

$1-x = 1 - \displaystyle \frac{SE_1^2}{SE_1^2 + SE_2^2} = \frac{SE_2^2}{SE_1^2 + SE_2^2}$.

Using these observations, Welch’s formula reduces to the function

$f(x) = \left( \displaystyle \frac{x^2}{m_1} + \frac{(1-x)^2}{m_2} \right)^{-1}$,

and the central problem is to find the maximum and minimum values of $f(x)$ on the interval $0 \le x \le 1$. Since $f$ is continuous on $[0,1]$ and differentiable on $(0,1)$, the absolute extrema can be found by checking the endpoints and the critical point(s).
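Before working through the calculus, a quick grid search (with hypothetical values $m_1 = 9$ and $m_2 = 14$, i.e. $n_1 = 10$ and $n_2 = 15$) previews the answer:

```python
m1, m2 = 9, 14  # hypothetical values of n1 - 1 and n2 - 1

def f(x):
    return 1.0 / (x ** 2 / m1 + (1 - x) ** 2 / m2)

xs = [i / 100_000 for i in range(100_001)]  # fine grid on [0, 1]
vals = [f(x) for x in xs]

print(min(vals))  # approximately 9  = min(m1, m2)
print(max(vals))  # approximately 23 = m1 + m2
```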

First, the endpoints. If $x=0$, then $f(0) = \left( \displaystyle \frac{1}{m_2} \right)^{-1} = m_2$. On the other hand, if $x=1$, then $f(1) = \left( \displaystyle \frac{1}{m_1} \right)^{-1} = m_1$.

Next, the critical point(s). These are found by solving the equation $f'(x) = 0$:

$f'(x) = -\left( \displaystyle \frac{x^2}{m_1} + \frac{(1-x)^2}{m_2} \right)^{-2} \left[ \displaystyle \frac{2x}{m_1} - \frac{2(1-x)}{m_2} \right] = 0$

$\displaystyle \frac{2x}{m_1} - \frac{2(1-x)}{m_2} = 0$

$\displaystyle \frac{2x}{m_1} = \frac{2(1-x)}{m_2}$

$xm_2= (1-x)m_1$

$xm_2 = m_1 - xm_1$

$x(m_1 + m_2) = m_1$

$x = \displaystyle \frac{m_1}{m_1 + m_2}$

Plugging back into the original equation, we find the local extremum

$f \left( \displaystyle \frac{m_1}{m_1+m_2} \right) = \left( \displaystyle \frac{1}{m_1} \frac{m_1^2}{(m_1+m_2)^2} + \frac{1}{m_2} \left[1-\frac{m_1}{m_1+m_2}\right]^2 \right)^{-1}$

$f \left( \displaystyle \frac{m_1}{m_1+m_2} \right) = \left( \displaystyle \frac{1}{m_1} \frac{m_1^2}{(m_1+m_2)^2} + \frac{1}{m_2} \left[\frac{m_2}{m_1+m_2}\right]^2 \right)^{-1}$

$f \left( \displaystyle \frac{m_1}{m_1+m_2} \right) = \left( \displaystyle \frac{m_1}{(m_1+m_2)^2} + \frac{m_2}{(m_1+m_2)^2} \right)^{-1}$

$f \left( \displaystyle \frac{m_1}{m_1+m_2} \right) = \left( \displaystyle \frac{m_1+m_2}{(m_1+m_2)^2} \right)^{-1}$

$f \left( \displaystyle \frac{m_1}{m_1+m_2} \right) = \left( \displaystyle \frac{1}{m_1+m_2} \right)^{-1}$

$f \left( \displaystyle \frac{m_1}{m_1+m_2} \right) = m_1+m_2$

Comparing the three candidate values that we’ve found ($m_1$, $m_2$, and $m_1 + m_2$), the absolute minimum of $f(x)$ on $[0,1]$ is the smaller of $m_1$ and $m_2$, while the absolute maximum is $m_1 + m_2$.

$\hbox{QED}$

In conclusion, I suggest offering the following guidelines to students to encourage their intuition about the plausibility of their answers:

• If $SE_1$ is much smaller than $SE_2$ (i.e., $x \approx 0$), then $df$ will be close to $m_2 = n_2 - 1$.
• If $SE_1$ is much larger than $SE_2$ (i.e., $x \approx 1$), then $df$ will be close to $m_1 = n_1 - 1$.
• Otherwise, $df$ could be as large as $m_1 + m_2 = n_1 + n_2 - 2$, but no larger.
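A quick numerical illustration of these three cases, using hypothetical sample sizes $n_1 = 10$ and $n_2 = 15$:

```python
def welch_df(se1_sq, se2_sq, n1, n2):
    """Welch's df (unrounded), in terms of the squared standard errors."""
    return (se1_sq + se2_sq) ** 2 / (se1_sq ** 2 / (n1 - 1) + se2_sq ** 2 / (n2 - 1))

n1, n2 = 10, 15  # hypothetical sample sizes

lopsided_low  = welch_df(1e-6, 1.0, n1, n2)  # SE_1 << SE_2: close to n2 - 1 = 14
lopsided_high = welch_df(1.0, 1e-6, n1, n2)  # SE_1 >> SE_2: close to n1 - 1 = 9
balanced      = welch_df(9.0, 14.0, n1, n2)  # x = m1/(m1+m2): exactly n1 + n2 - 2 = 23

print(lopsided_low, lopsided_high, balanced)
```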

# Statistical significance

When teaching my Applied Statistics class, I’ll often use the following xkcd comic to reinforce the meaning of statistical significance.

The idea that’s being communicated is that, when performing an hypothesis test, the observed significance level $P$ is the probability of obtaining a result at least as extreme as the one observed if the null hypothesis is true; in other words, it measures how easily the result could have arisen from dumb luck rather than a real effect (the alternative hypothesis). So if each test is conducted at the $0.05$ significance level and the experiment is repeated about 20 times, it wouldn’t be surprising for one of those experiments to falsely reject the null hypothesis.

In practice, statisticians use the Bonferroni correction when performing multiple simultaneous tests to avoid the erroneous conclusion displayed in the comic.
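A short computation illustrates both the problem and the correction (assuming 20 independent tests at the $0.05$ level, as in the comic):

```python
alpha, k = 0.05, 20  # significance level and number of independent tests

# Chance of at least one false rejection across all 20 tests
# (the family-wise error rate)
fwer = 1 - (1 - alpha) ** k

# Bonferroni correction: run each individual test at alpha / k,
# which keeps the family-wise error rate at or below alpha
alpha_bonf = alpha / k
fwer_bonf = 1 - (1 - alpha_bonf) ** k

print(round(fwer, 3), round(fwer_bonf, 3))
```

Without the correction, the chance of at least one spurious "significant" result is about $64\%$; with it, the family-wise error rate drops back below $5\%$.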

Source: http://www.xkcd.com/882/