Welch’s formula

When conducting a hypothesis test or computing a confidence interval for the difference \overline{X}_1 - \overline{X}_2 of two means, where at least one mean arises from a small sample, the Student t distribution must be employed. In particular, the number of degrees of freedom for the Student t distribution must be computed. Many textbooks suggest using Welch’s formula:

df = \frac{\displaystyle (SE_1^2 + SE_2^2)^2}{\displaystyle \frac{SE_1^4}{n_1-1} + \frac{SE_2^4}{n_2-1}},

rounded down to the nearest integer. In this formula, SE_1 = \displaystyle \frac{\sigma_1}{\sqrt{n_1}} is the standard error associated with the first average \overline{X}_1, where \sigma_1 (if known) is the standard deviation of the first population and n_1 is the size of the first sample, i.e., the number of observations averaged to find \overline{X}_1. In practice, \sigma_1 is not known, and so the sample standard deviation s_1 is used in its place, \sigma_1 \approx s_1.

The terms SE_2 and n_2 are similarly defined for the average \overline{X}_2.
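To check a calculator’s output, Welch’s formula can be evaluated directly from the summary statistics. The following is a minimal Python sketch, assuming the sample standard deviations s_1 and s_2 stand in for \sigma_1 and \sigma_2 as described above; the function name welch_df, its round_down parameter, and the sample numbers are hypothetical choices for illustration only.

```python
import math

def welch_df(s1, n1, s2, n2, round_down=True):
    """Welch's degrees of freedom, using sample SDs s1, s2 in place of sigma_1, sigma_2."""
    se1_sq = s1**2 / n1  # SE_1^2
    se2_sq = s2**2 / n2  # SE_2^2
    numerator = (se1_sq + se2_sq) ** 2
    # se1_sq**2 is SE_1^4, and likewise for the second sample
    denominator = se1_sq**2 / (n1 - 1) + se2_sq**2 / (n2 - 1)
    df = numerator / denominator
    # Welch's formula is conventionally rounded down to an integer
    return math.floor(df) if round_down else df

# Hypothetical summary statistics: s1 = 4, n1 = 10, s2 = 6, n2 = 15
print(welch_df(4, 10, 6, 15, round_down=False))  # unrounded, about 22.99
print(welch_df(4, 10, 6, 15))                    # 22
```

Here SE_1^2 = 1.6 and SE_2^2 = 2.4, so the computation can also be followed by hand.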

In Welch’s formula, the term SE_1^2 + SE_2^2 in the numerator is equal to \displaystyle \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}. This is the square of the standard error SE_D associated with the difference \overline{X}_1 - \overline{X}_2, since

SE_D = \displaystyle \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}.

This leads to the “Pythagorean” relationship

SE_1^2 + SE_2^2 = SE_D^2,

which (in my experience) is a reasonable aid to help students remember the formula for SE_D.
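The “Pythagorean” relationship is easy to confirm numerically. This Python sketch, with a hypothetical helper standard_errors and made-up sample statistics, computes SE_1, SE_2 and SE_D (again using s_1, s_2 in place of \sigma_1, \sigma_2) and checks that SE_1^2 + SE_2^2 = SE_D^2 up to floating-point rounding.

```python
import math

def standard_errors(s1, n1, s2, n2):
    """Return SE_1, SE_2 and SE_D, with sample SDs standing in for sigma_1, sigma_2."""
    se1 = s1 / math.sqrt(n1)
    se2 = s2 / math.sqrt(n2)
    se_d = math.sqrt(s1**2 / n1 + s2**2 / n2)
    return se1, se2, se_d

se1, se2, se_d = standard_errors(4, 10, 6, 15)
# "Pythagorean" check: SE_1^2 + SE_2^2 equals SE_D^2 (up to rounding)
print(math.isclose(se1**2 + se2**2, se_d**2))  # True
print(round(se_d, 4))                          # 2.0
```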


Naturally, a big problem that students encounter when using Welch’s formula is that the formula is really, really complicated, and it’s easy to make a mistake when entering information into their calculators. (Indeed, the calculator’s pre-programmed function might simply give the wrong answer.) Also, since the formula is complicated, students don’t have much psychological reassurance that the answer they get at the other end is actually correct. So, when teaching this topic, I give my students the following rule of thumb so that they can at least check whether their final answer is plausible:

\min(n_1,n_2)-1 \le df \le n_1 + n_2 -2.

To my surprise, I have never seen this formula in a statistics textbook, even though it’s quite simple to state and not too difficult to prove using techniques from first-semester calculus.
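As a quick illustration of how the rule of thumb works as a sanity check, here is a one-line Python test; the function name df_is_plausible and the example numbers are hypothetical.

```python
def df_is_plausible(df, n1, n2):
    """Rule of thumb: min(n1, n2) - 1 <= df <= n1 + n2 - 2."""
    return min(n1, n2) - 1 <= df <= n1 + n2 - 2

print(df_is_plausible(22, 10, 15))  # True
print(df_is_plausible(30, 10, 15))  # False: above n1 + n2 - 2 = 23
print(df_is_plausible(5, 10, 15))   # False: below min(n1, n2) - 1 = 9
```

An answer that fails this check signals a data-entry mistake; an answer that passes is not guaranteed to be correct, only plausible.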

Dividing the numerator and denominator by (SE_1^2 + SE_2^2)^2, we can rewrite Welch’s formula as

df = \left( \displaystyle \frac{1}{n_1-1} \left[ \frac{SE_1^2}{SE_1^2 + SE_2^2}\right]^2 + \frac{1}{n_2-1} \left[ \frac{SE_2^2}{SE_1^2 + SE_2^2} \right]^2 \right)^{-1}

For the sake of simplicity, let m_1 = n_1 - 1 and m_2 = n_2 -1, so that

df = \left( \displaystyle \frac{1}{m_1} \left[ \frac{SE_1^2}{SE_1^2 + SE_2^2}\right]^2 + \frac{1}{m_2} \left[ \frac{SE_2^2}{SE_1^2 + SE_2^2} \right]^2 \right)^{-1}

Now let x = \displaystyle \frac{SE_1^2}{SE_1^2 + SE_2^2}. Both SE_1^2 and SE_2^2 are nonnegative (and, in practice, they’re both positive), so x \ge 0. Also, the numerator is no larger than the denominator, so x \le 1. Finally, we notice that

1-x = 1 - \displaystyle \frac{SE_1^2}{SE_1^2 + SE_2^2} = \frac{SE_2^2}{SE_1^2 + SE_2^2}.

Using these observations, Welch’s formula reduces to the function

f(x) = \left( \displaystyle \frac{x^2}{m_1} + \frac{(1-x)^2}{m_2} \right)^{-1},

and the central problem is to find the maximum and minimum values of f(x) on the interval 0 \le x \le 1. The expression inside the parentheses is always positive (x and 1-x cannot both be zero), so f(x) is differentiable on [0,1], and its absolute extrema can be found by checking the endpoints and the critical point(s).
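To see that this substitution loses nothing, the sketch below evaluates Welch’s formula directly and through f(x) with m_1 = n_1 - 1 and m_2 = n_2 - 1, and compares the two. The helper names welch_df_raw and f, and the values SE_1^2 = 1.6, SE_2^2 = 2.4, n_1 = 10, n_2 = 15, are hypothetical.

```python
def welch_df_raw(se1_sq, se2_sq, n1, n2):
    """Unrounded Welch df computed directly from SE_1^2 and SE_2^2."""
    return (se1_sq + se2_sq) ** 2 / (se1_sq**2 / (n1 - 1) + se2_sq**2 / (n2 - 1))

def f(x, m1, m2):
    """Welch's formula as a function of x = SE_1^2 / (SE_1^2 + SE_2^2)."""
    return 1.0 / (x**2 / m1 + (1 - x) ** 2 / m2)

se1_sq, se2_sq, n1, n2 = 1.6, 2.4, 10, 15  # hypothetical values
x = se1_sq / (se1_sq + se2_sq)
print(welch_df_raw(se1_sq, se2_sq, n1, n2))
print(f(x, n1 - 1, n2 - 1))  # same value, up to floating-point rounding
```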

First, the endpoints. If x=0, then f(0) = \left( \displaystyle \frac{1}{m_2} \right)^{-1} = m_2. On the other hand, if x=1, then f(1) = \left( \displaystyle \frac{1}{m_1} \right)^{-1} = m_1.

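Next, the critical point(s). These are found by solving the equation f'(x) = 0. The first factor of f'(x), a negative power of a positive quantity, is never zero, so only the bracketed factor needs to vanish: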

f'(x) = -\left( \displaystyle \frac{x^2}{m_1} + \frac{(1-x)^2}{m_2} \right)^{-2} \left[ \displaystyle \frac{2x}{m_1} - \frac{2(1-x)}{m_2} \right] = 0

\displaystyle \frac{2x}{m_1} - \frac{2(1-x)}{m_2} = 0

\displaystyle \frac{2x}{m_1} = \frac{2(1-x)}{m_2}

xm_2= (1-x)m_1

xm_2 = m_1 - xm_1

x(m_1 + m_2) = m_1

x = \displaystyle \frac{m_1}{m_1 + m_2}
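The derivative and the critical point can be double-checked numerically. This Python sketch, for the hypothetical case m_1 = 9, m_2 = 14, confirms that f'(x) vanishes at x = m_1/(m_1+m_2), compares the formula for f'(x) against a central-difference approximation, and uses a fine grid on [0,1] to locate where f is largest. A numerical check like this only illustrates the calculus argument; it is not a proof.

```python
def f(x, m1, m2):
    return 1.0 / (x**2 / m1 + (1 - x) ** 2 / m2)

def f_prime(x, m1, m2):
    # Chain rule: derivative of (inner)^(-1) is -(inner)^(-2) * inner'
    inner = x**2 / m1 + (1 - x) ** 2 / m2
    return -(inner ** -2) * (2 * x / m1 - 2 * (1 - x) / m2)

m1, m2 = 9, 14  # hypothetical values, e.g. n1 = 10, n2 = 15
x_star = m1 / (m1 + m2)

# The derivative vanishes at the critical point (up to rounding error).
print(abs(f_prime(x_star, m1, m2)) < 1e-12)  # True

# Central-difference check that f_prime matches f at an arbitrary point.
h, x0 = 1e-6, 0.3
numeric = (f(x0 + h, m1, m2) - f(x0 - h, m1, m2)) / (2 * h)
print(abs(numeric - f_prime(x0, m1, m2)) < 1e-4)  # True

# Grid search: the largest value of f on [0, 1] occurs next to x_star.
grid = [i / 10000 for i in range(10001)]
x_best = max(grid, key=lambda x: f(x, m1, m2))
print(abs(x_best - x_star) < 1e-3)  # True
```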

Plugging this value back into f(x), we find the value at the critical point:

f \left( \displaystyle \frac{m_1}{m_1+m_2} \right) = \left( \displaystyle \frac{1}{m_1} \frac{m_1^2}{(m_1+m_2)^2} + \frac{1}{m_2} \left[1-\frac{m_1}{m_1+m_2}\right]^2 \right)^{-1}

f \left( \displaystyle \frac{m_1}{m_1+m_2} \right) = \left( \displaystyle \frac{1}{m_1} \frac{m_1^2}{(m_1+m_2)^2} + \frac{1}{m_2} \left[\frac{m_2}{m_1+m_2}\right]^2 \right)^{-1}

f \left( \displaystyle \frac{m_1}{m_1+m_2} \right) = \left( \displaystyle \frac{m_1}{(m_1+m_2)^2} + \frac{m_2}{(m_1+m_2)^2} \right)^{-1}

f \left( \displaystyle \frac{m_1}{m_1+m_2} \right) = \left( \displaystyle \frac{m_1+m_2}{(m_1+m_2)^2} \right)^{-1}

f \left( \displaystyle \frac{m_1}{m_1+m_2} \right) = \left( \displaystyle \frac{1}{m_1+m_2} \right)^{-1}

f \left( \displaystyle \frac{m_1}{m_1+m_2} \right) = m_1+m_2
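Because the algebra above involves only rational numbers, it can be confirmed exactly with Python’s fractions module. The helper value_at_critical_point and the sample pairs (m_1, m_2) are hypothetical; each printed value should equal m_1 + m_2 with no rounding error.

```python
from fractions import Fraction

def value_at_critical_point(m1, m2):
    """Evaluate f at x = m1/(m1+m2) using exact rational arithmetic."""
    x = Fraction(m1, m1 + m2)
    return 1 / (x**2 / m1 + (1 - x) ** 2 / m2)

for m1, m2 in [(9, 14), (4, 4), (1, 29), (19, 7)]:
    print(m1, m2, value_at_critical_point(m1, m2))  # last column is m1 + m2
```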

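Comparing the three candidate values that we’ve found, f(0) = m_2, f(1) = m_1, and f\left(\frac{m_1}{m_1+m_2}\right) = m_1 + m_2, we see that the absolute minimum of f(x) on [0,1] is the smaller of m_1 and m_2, while the absolute maximum is m_1 + m_2. Translating back to the sample sizes, \min(n_1,n_2) - 1 \le df \le n_1 + n_2 - 2. Rounding df down to an integer cannot break the lower bound, since \min(n_1,n_2) - 1 is already an integer.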
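For readers who like to see a bound tested before trusting it, here is a brute-force Python sketch that draws random squared standard errors and sample sizes and checks that the unrounded Welch df always stays within [\min(n_1,n_2) - 1, n_1 + n_2 - 2]. The function names, the number of trials, and the sampling ranges are arbitrary assumptions made for illustration; a small tolerance absorbs floating-point rounding.

```python
import random

def welch_df_raw(se1_sq, se2_sq, n1, n2):
    """Unrounded Welch df computed from squared standard errors."""
    return (se1_sq + se2_sq) ** 2 / (se1_sq**2 / (n1 - 1) + se2_sq**2 / (n2 - 1))

def check_bounds(trials=20_000, seed=0):
    """Return True if every random trial satisfies the rule of thumb."""
    rng = random.Random(seed)
    for _ in range(trials):
        n1, n2 = rng.randint(2, 200), rng.randint(2, 200)
        se1_sq, se2_sq = rng.uniform(1e-6, 100), rng.uniform(1e-6, 100)
        df = welch_df_raw(se1_sq, se2_sq, n1, n2)
        lower, upper = min(n1, n2) - 1, n1 + n2 - 2
        tol = 1e-9 * upper  # slack for floating-point rounding
        if not (lower - tol <= df <= upper + tol):
            return False
    return True

print(check_bounds())  # True
```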


In conclusion, I suggest offering the following guidelines to students to build their intuition about the plausibility of their answers:

  • If SE_1 is much smaller than SE_2 (i.e., x \approx 0), then df will be close to m_2 = n_2 - 1.
  • If SE_1 is much larger than SE_2 (i.e., x \approx 1), then df will be close to m_1 = n_1 - 1.
  • Otherwise, df could be as large as m_1 + m_2 = n_1 + n_2 - 2 (this maximum is reached when x = \frac{m_1}{m_1+m_2}), but no larger.
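The three guidelines can be illustrated with a few made-up cases. In this Python sketch, n_1 = 10 and n_2 = 20 (so m_1 = 9 and m_2 = 19), and the squared standard errors are hypothetical values chosen so that x is near 0, near 1, and exactly m_1/(m_1+m_2) = 9/28; the helper welch_df_raw is likewise a name chosen for illustration.

```python
def welch_df_raw(se1_sq, se2_sq, n1, n2):
    """Unrounded Welch df computed from squared standard errors."""
    return (se1_sq + se2_sq) ** 2 / (se1_sq**2 / (n1 - 1) + se2_sq**2 / (n2 - 1))

n1, n2 = 10, 20  # so m1 = 9 and m2 = 19
cases = {
    "SE_1 much smaller than SE_2": (0.001, 5.0),  # df close to m2 = 19
    "SE_1 much larger than SE_2": (5.0, 0.001),   # df close to m1 = 9
    "x = m1/(m1+m2) = 9/28": (0.9, 1.9),          # df equal to m1 + m2 = 28
}
for label, (se1_sq, se2_sq) in cases.items():
    print(label, round(welch_df_raw(se1_sq, se2_sq, n1, n2), 2))
```

In the first two cases df sits just above m_2 and m_1 respectively, never below, which matches the lower bound proved above.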