# Welch’s formula

When conducting a hypothesis test or computing a confidence interval for the difference $\overline{X}_1 - \overline{X}_2$ of two means, where at least one mean arises from a small sample, the Student t distribution must be employed. In particular, the number of degrees of freedom for the Student t distribution must be computed. Many textbooks suggest using Welch’s formula:

$df = \frac{\displaystyle (SE_1^2 + SE_2^2)^2}{\displaystyle \frac{SE_1^4}{n_1-1} + \frac{SE_2^4}{n_2-1}},$

rounded down to the nearest integer. In this formula, $SE_1 = \displaystyle \frac{\sigma_1}{\sqrt{n_1}}$ is the standard error associated with the first average $\overline{X}_1$, where $\sigma_1$ (if known) is the standard deviation of the first population and $n_1$ is the number of observations that are averaged to find $\overline{X}_1$. In practice, $\sigma_1$ is not known, and so the plug-in estimate $\sigma_1 \approx s_1$ (the sample standard deviation) is employed.

The terms $SE_2$ and $n_2$ are similarly defined for the average $\overline{X}_2$.
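As a concrete illustration, here is a minimal Python sketch of Welch’s formula. The function name `welch_df` and the sample values ($s_1 = 2$, $n_1 = 10$, $s_2 = 3$, $n_2 = 15$) are made up for this example; the sample standard deviations stand in for the unknown $\sigma_1$ and $\sigma_2$.

```python
import math


def welch_df(s1, n1, s2, n2):
    """Welch's degrees of freedom (before rounding down).

    s1, s2: sample standard deviations, used in place of sigma_1, sigma_2.
    n1, n2: sample sizes (each must be at least 2).
    """
    se1_sq = s1**2 / n1  # SE_1^2
    se2_sq = s2**2 / n2  # SE_2^2
    numerator = (se1_sq + se2_sq) ** 2
    denominator = se1_sq**2 / (n1 - 1) + se2_sq**2 / (n2 - 1)
    return numerator / denominator


# Hypothetical data, chosen only for illustration.
df = welch_df(s1=2.0, n1=10, s2=3.0, n2=15)
df_rounded = math.floor(df)  # round down to the nearest integer
print(df, df_rounded)
```

For these made-up numbers the formula gives roughly 22.99, which rounds down to 22.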

In Welch’s formula, the term $SE_1^2 + SE_2^2$ in the numerator is equal to $\displaystyle \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$. This is the square of the standard error $SE_D$ associated with the difference $\overline{X}_1 - \overline{X}_2$, since

$SE_D = \displaystyle \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$.

This leads to the “Pythagorean” relationship

$SE_1^2 + SE_2^2 = SE_D^2$,

which (in my experience) is a reasonable aid to help students remember the formula for $SE_D$.

Naturally, a big problem that students encounter when using Welch’s formula is that the formula is really, really complicated, and it’s easy to make a mistake when entering information into their calculators. (Indeed, it might be that the pre-programmed calculator function simply gives the wrong answer.) Also, since the formula is complicated, students don’t have a lot of psychological reassurance that, when they come out the other end, their answer is actually correct. So, when teaching this topic, I tell my students the following rule of thumb so that they can at least check if their final answer is plausible:

$\min(n_1,n_2)-1 \le df \le n_1 + n_2 -2$.
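A quick numerical spot check can make this rule of thumb feel more believable. The sketch below draws random standard errors and sample sizes (the ranges and the helper names `welch_df` and `is_plausible` are arbitrary choices for this illustration) and confirms that the computed $df$ lands between the two bounds every time. It is a sanity check, not a proof.

```python
import random


def welch_df(se1, se2, n1, n2):
    """Welch's degrees of freedom, written in terms of the standard errors."""
    return (se1**2 + se2**2) ** 2 / (se1**4 / (n1 - 1) + se2**4 / (n2 - 1))


def is_plausible(df, n1, n2, tol=1e-9):
    """Check min(n1, n2) - 1 <= df <= n1 + n2 - 2, allowing for rounding error."""
    return min(n1, n2) - 1 - tol <= df <= n1 + n2 - 2 + tol


random.seed(0)  # fixed seed so that the check is reproducible
trials = []
for _ in range(10_000):
    n1, n2 = random.randint(2, 50), random.randint(2, 50)
    se1, se2 = random.uniform(0.01, 10.0), random.uniform(0.01, 10.0)
    trials.append(is_plausible(welch_df(se1, se2, n1, n2), n1, n2))

all_ok = all(trials)
print(all_ok)
```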

To my surprise, I have never seen this inequality in a statistics textbook, even though it’s quite simple to state and not too difficult to prove using techniques from first-semester calculus.

Let’s rewrite Welch’s formula as

$df = \left( \displaystyle \frac{1}{n_1-1} \left[ \frac{SE_1^2}{SE_1^2 + SE_2^2}\right]^2 + \frac{1}{n_2-1} \left[ \frac{SE_2^2}{SE_1^2 + SE_2^2} \right]^2 \right)^{-1}$

For the sake of simplicity, let $m_1 = n_1 - 1$ and $m_2 = n_2 -1$, so that

$df = \left( \displaystyle \frac{1}{m_1} \left[ \frac{SE_1^2}{SE_1^2 + SE_2^2}\right]^2 + \frac{1}{m_2} \left[ \frac{SE_2^2}{SE_1^2 + SE_2^2} \right]^2 \right)^{-1}$

Now let $x = \displaystyle \frac{SE_1^2}{SE_1^2 + SE_2^2}$. All of these terms are nonnegative (and, in practice, they’re all positive), so that $x \ge 0$. Also, the numerator is no larger than the denominator, so that $x \le 1$. Finally, we notice that

$1-x = 1 - \displaystyle \frac{SE_1^2}{SE_1^2 + SE_2^2} = \frac{SE_2^2}{SE_1^2 + SE_2^2}$.

Using these observations, Welch’s formula reduces to the function

$f(x) = \left( \displaystyle \frac{x^2}{m_1} + \frac{(1-x)^2}{m_2} \right)^{-1}$,

and the central problem is to find the maximum and minimum values of $f(x)$ on the interval $0 \le x \le 1$. Since $f(x)$ is differentiable on $[0,1]$, the absolute extrema can be found by checking the endpoints and the critical point(s).
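To see that nothing was lost in this reduction, the following sketch evaluates Welch’s formula directly and via $f(x)$ for one set of made-up inputs; the two results should agree up to rounding error. The function names and the numbers are illustrative assumptions only.

```python
def welch_df(se1, se2, n1, n2):
    """Welch's formula in its original form."""
    return (se1**2 + se2**2) ** 2 / (se1**4 / (n1 - 1) + se2**4 / (n2 - 1))


def f(x, m1, m2):
    """The reduced one-variable form, with x = SE_1^2 / (SE_1^2 + SE_2^2)."""
    return 1.0 / (x**2 / m1 + (1 - x) ** 2 / m2)


# Hypothetical standard errors and sample sizes.
se1, se2, n1, n2 = 0.8, 1.5, 12, 20
x = se1**2 / (se1**2 + se2**2)

direct = welch_df(se1, se2, n1, n2)
reduced = f(x, n1 - 1, n2 - 1)
print(x, direct, reduced)
```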

First, the endpoints. If $x=0$, then $f(0) = \left( \displaystyle \frac{1}{m_2} \right)^{-1} = m_2$. On the other hand, if $x=1$, then $f(1) = \left( \displaystyle \frac{1}{m_1} \right)^{-1} = m_1$.

Next, the critical point(s). These are found by solving the equation $f'(x) = 0$:

$f'(x) = -\left( \displaystyle \frac{x^2}{m_1} + \frac{(1-x)^2}{m_2} \right)^{-2} \left[ \displaystyle \frac{2x}{m_1} - \frac{2(1-x)}{m_2} \right] = 0$

$\displaystyle \frac{2x}{m_1} - \frac{2(1-x)}{m_2} = 0$

$\displaystyle \frac{2x}{m_1} = \frac{2(1-x)}{m_2}$

$xm_2= (1-x)m_1$

$xm_2 = m_1 - xm_1$

$x(m_1 + m_2) = m_1$

$x = \displaystyle \frac{m_1}{m_1 + m_2}$

Plugging back into the original equation, we find the local extremum

$f \left( \displaystyle \frac{m_1}{m_1+m_2} \right) = \left( \displaystyle \frac{1}{m_1} \frac{m_1^2}{(m_1+m_2)^2} + \frac{1}{m_2} \left[1-\frac{m_1}{m_1+m_2}\right]^2 \right)^{-1}$

$f \left( \displaystyle \frac{m_1}{m_1+m_2} \right) = \left( \displaystyle \frac{1}{m_1} \frac{m_1^2}{(m_1+m_2)^2} + \frac{1}{m_2} \left[\frac{m_2}{m_1+m_2}\right]^2 \right)^{-1}$

$f \left( \displaystyle \frac{m_1}{m_1+m_2} \right) = \left( \displaystyle \frac{m_1}{(m_1+m_2)^2} + \frac{m_2}{(m_1+m_2)^2} \right)^{-1}$

$f \left( \displaystyle \frac{m_1}{m_1+m_2} \right) = \left( \displaystyle \frac{m_1+m_2}{(m_1+m_2)^2} \right)^{-1}$

$f \left( \displaystyle \frac{m_1}{m_1+m_2} \right) = \left( \displaystyle \frac{1}{m_1+m_2} \right)^{-1}$

$f \left( \displaystyle \frac{m_1}{m_1+m_2} \right) = m_1+m_2$

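We now have three candidate values: $m_1$ and $m_2$ at the endpoints, and $m_1 + m_2$ at the critical point. Since $m_1$ and $m_2$ are both positive, $m_1 + m_2$ is larger than either one. Therefore, the absolute minimum of $f(x)$ on $[0,1]$ is the smaller of $m_1$ and $m_2$, while the absolute maximum is equal to $m_1 + m_2$.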

$\hbox{QED}$
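For readers who like a numerical confirmation, this sketch evaluates $f(x)$ on a fine grid for the hypothetical values $m_1 = 9$ and $m_2 = 14$ and compares the smallest and largest grid values with the calculus results. The grid size is an arbitrary choice, and the grid maximum can only approximate $m_1 + m_2$ because the critical point $x = 9/23$ is not a grid point.

```python
def f(x, m1, m2):
    """Welch's formula as a function of x = SE_1^2 / (SE_1^2 + SE_2^2)."""
    return 1.0 / (x**2 / m1 + (1 - x) ** 2 / m2)


m1, m2 = 9, 14  # hypothetical values of n1 - 1 and n2 - 1

grid = [i / 1000 for i in range(1001)]  # 0, 0.001, ..., 1
values = [f(x, m1, m2) for x in grid]

grid_min = min(values)  # should match min(m1, m2), attained at an endpoint
grid_max = max(values)  # should be just below m1 + m2

x_star = m1 / (m1 + m2)      # critical point from the derivation
f_star = f(x_star, m1, m2)   # should equal m1 + m2 up to rounding

print(grid_min, grid_max, f_star)
```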

In conclusion, I suggest offering the following guidelines to students to encourage their intuition about the plausibility of their answers:

• If $SE_1$ is much smaller than $SE_2$ (i.e., $x \approx 0$), then $df$ will be close to $m_2 = n_2 - 1$.
• If $SE_1$ is much larger than $SE_2$ (i.e., $x \approx 1$), then $df$ will be close to $m_1 = n_1 - 1$.
• Otherwise, $df$ could be as large as $m_1 + m_2 = n_1 + n_2 - 2$ (which happens when $x = \displaystyle \frac{m_1}{m_1+m_2}$), but no larger.
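The three guidelines can be illustrated with a short sketch. The sample sizes $n_1 = 10$ and $n_2 = 15$ and the standard errors below are invented for this example; the third case picks $SE_1^2 : SE_2^2 = m_1 : m_2$ so that $x = m_1/(m_1+m_2)$.

```python
import math


def welch_df(se1, se2, n1, n2):
    """Welch's degrees of freedom in terms of the two standard errors."""
    return (se1**2 + se2**2) ** 2 / (se1**4 / (n1 - 1) + se2**4 / (n2 - 1))


n1, n2 = 10, 15            # hypothetical sample sizes
m1, m2 = n1 - 1, n2 - 1    # 9 and 14

# SE_1 much smaller than SE_2: df should be close to m2 = 14.
small_first = welch_df(0.01, 1.0, n1, n2)

# SE_1 much larger than SE_2: df should be close to m1 = 9.
large_first = welch_df(1.0, 0.01, n1, n2)

# SE_1^2 : SE_2^2 = m1 : m2, so x = m1 / (m1 + m2): df should be m1 + m2 = 23.
balanced = welch_df(math.sqrt(m1), math.sqrt(m2), n1, n2)

print(small_first, large_first, balanced)
```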