Testing the hypothesis

Unfortunately, the scientist cannot perform the same experiment at the same time on all people. She must instead draw a small sample of people from the population and use it to decide whether the hypothesis is true. Let the index $ j$ refer to a particular chosen subject, and let $ y[j]$ be his or her response for the experiment; each subject's response is a dependent variable. Two statistics are important for combining information from the dependent variables: the mean,

$\displaystyle \hat{\mu}= \frac{1}{n} \sum_{j=1}^n y[j] ,$ (12.3)

which is simply the average of $ y[j]$ over the subjects, and the variance, which is

$\displaystyle \hat{\sigma}^2 = \frac{1}{n} \sum_{j=1}^n (y[j] - \hat{\mu})^2 .$ (12.4)

The variance estimate (12.4) is a biased estimator of the ``true'' variance because, on average, it underestimates it; therefore, Bessel's correction is sometimes applied, which places $ n-1$ in the denominator instead of $ n$, resulting in an unbiased estimator.
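As a minimal sketch of (12.3) and (12.4), the following Python code computes the mean and both versions of the variance. The function names and the response values are made up for illustration and do not come from any experiment; the `bessel` flag is a hypothetical parameter that switches the denominator from $ n$ to $ n-1$.

```python
from statistics import fmean

def sample_mean(y):
    """Mean of the responses, as in (12.3)."""
    return fmean(y)

def sample_variance(y, bessel=False):
    """Variance of the responses, as in (12.4).

    With bessel=True, the denominator is n - 1 instead of n
    (Bessel's correction).
    """
    n = len(y)
    mu_hat = fmean(y)
    sum_sq = sum((v - mu_hat) ** 2 for v in y)
    return sum_sq / (n - 1 if bessel else n)

# Made-up responses y[j] for n = 8 subjects (illustration only).
responses = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

mu_hat = sample_mean(responses)
var_biased = sample_variance(responses)
var_unbiased = sample_variance(responses, bessel=True)
print(mu_hat, var_biased, var_unbiased)
```

For any data with $ n > 1$ and nonzero spread, the corrected value is larger than the uncorrected one by the factor $ n/(n-1)$, so the difference matters most for small $ n$.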

Figure 12.5: Student's t distribution: (a) probability density function (pdf); (b) cumulative distribution function (cdf). In the figures, $ \nu $ is called the degrees of freedom, and $ \nu = n-1$ for the number of subjects $ n$. When $ \nu $ is small, the pdf has larger tails than the normal distribution; however, in the limit as $ \nu $ approaches $ \infty $, the Student t distribution converges to the normal distribution. (Figures by Wikipedia user skbkekas.)

To test the hypothesis, Student's t-distribution (``Student'' was the pseudonym of William Sealy Gosset) is widely used. It is a probability distribution that captures how the estimated mean $ \hat{\mu}$, scaled by the estimated standard deviation, is distributed if $ n$ subjects are chosen at random and their responses $ y[j]$ are averaged; see Figure 12.5. This assumes that the response $ y[j]$ for each individual $ j$ is drawn from a normal distribution (called a Gaussian distribution in engineering), which is the most basic and common probability distribution. A normal distribution is fully characterized in terms of its mean $ \mu$ and standard deviation $ \sigma$. The exact expressions for these distributions are not given here, but are widely available; see [125] and other books on mathematical statistics for these and many more.

The Student's t test [319] involves calculating the following:

$\displaystyle t = {\hat{\mu}_1 - \hat{\mu}_2 \over \hat{\sigma}_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} ,$ (12.5)

in which

$\displaystyle \hat{\sigma}_p = \sqrt{(n_1 - 1) \hat{\sigma}_1^2 + (n_2 - 1) \hat{\sigma}_2^2 \over n_1 + n_2 - 2}$ (12.6)

and $ n_i$ is the number of subjects who received treatment $ x_i$. The subtractions by $ 1$ and $ 2$ in the expressions are due to Bessel's correction. Based on the value of $ t$, the confidence $ \alpha$ in the null hypothesis $ H_0$ is determined by looking in a table of the Student's t cdf (Figure 12.5(b)), using $ \nu = n_1 + n_2 - 2$ degrees of freedom for this two-group test. Typically, $ \alpha = 0.05$ or lower is sufficient to declare that $ H_1$ is true (corresponding to 95% confidence). Such tables are usually arranged so that, for a given $ \nu $ and $ \alpha$, the minimum $ t$ value needed to confirm $ H_1$ with confidence $ 1-\alpha$ is presented. Note that if $ t$ is negative, then the effect that $ x$ has on $ y$ runs in the opposite direction, and $ -t$ is applied to the table.
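The calculation in (12.5) and (12.6) can be sketched in Python as follows. This is an illustrative sketch rather than a reference implementation: the function name `pooled_t` and the two groups of responses are made up, the per-group variances are assumed to be Bessel-corrected (which is what `statistics.variance` computes), and the degrees of freedom are taken as $ \nu = n_1 + n_2 - 2$. The single critical value is the standard one-sided table entry for $ \alpha = 0.05$ and $ \nu = 8$; a complete table or a statistics library would be used in practice.

```python
from math import sqrt
from statistics import fmean, variance  # variance() divides by n - 1

def pooled_t(y1, y2):
    """Return (t, nu) for Student's t-test, following (12.5) and (12.6)."""
    n1, n2 = len(y1), len(y2)
    var1, var2 = variance(y1), variance(y2)
    # Pooled standard deviation, (12.6).
    sigma_p = sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))
    # Test statistic, (12.5).
    t = (fmean(y1) - fmean(y2)) / (sigma_p * sqrt(1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

# Made-up responses for two treatments (illustration only).
group1 = [5.0, 6.0, 7.0, 8.0, 9.0]
group2 = [1.0, 2.0, 3.0, 4.0, 5.0]

t_value, nu = pooled_t(group1, group2)

# One-sided critical value for alpha = 0.05 and nu = 8,
# copied from a standard Student's t table.
T_CRITICAL = 1.860

# If t is negative, -t is compared against the table entry.
reject_h0 = abs(t_value) > T_CRITICAL
print(t_value, nu, reject_h0)
```

For these made-up groups, $ t$ comes out close to $ 4$ with $ \nu = 8$, which exceeds the table entry, so $ H_0$ would be rejected at $ \alpha = 0.05$.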

The binary outcome might not be satisfying enough. This is not a problem because the difference in means, $ \hat{\mu}_1 - \hat{\mu}_2$, is an estimate of the amount of change that applying $ x_2$ had in comparison to $ x_1$. This is called the average treatment effect. Thus, in addition to determining whether $ H_1$ is true via the t-test, we also obtain an estimate of how much the treatment affects the outcome.

Student's t-test assumes that the variance within each group is identical. If it is not, then Welch's t-test is used [350]. Note that the variances are not given in advance in either case; they are estimated ``on the fly'' from the experimental data. Welch's t-test gives essentially the same result as Student's t-test if the variances happen to be the same; therefore, when in doubt, it may be best to apply Welch's t-test. Many other tests can be used and are debated in particular contexts by scientists; see [125].
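The text above does not give the expressions for Welch's t-test, so the following Python sketch relies on the standard textbook form, which is an addition here: $ t = (\hat{\mu}_1 - \hat{\mu}_2) / \sqrt{\hat{\sigma}_1^2/n_1 + \hat{\sigma}_2^2/n_2}$, with the degrees of freedom approximated by the Welch-Satterthwaite formula. The function name `welch_t` and all response values are made up for illustration, and the variances are assumed to be Bessel-corrected.

```python
from math import sqrt
from statistics import fmean, variance  # variance() divides by n - 1

def welch_t(y1, y2):
    """Return (t, nu) for Welch's t-test (unequal variances allowed)."""
    n1, n2 = len(y1), len(y2)
    a = variance(y1) / n1  # squared standard error of the first mean
    b = variance(y2) / n2  # squared standard error of the second mean
    t = (fmean(y1) - fmean(y2)) / sqrt(a + b)
    # Welch-Satterthwaite approximation; generally not an integer.
    nu = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t, nu

# Made-up responses (illustration only).
group1 = [5.0, 6.0, 7.0, 8.0, 9.0]
same_spread = [1.0, 2.0, 3.0, 4.0, 5.0]   # same variance as group1
wider_spread = [0.0, 2.0, 3.0, 4.0, 6.0]  # larger variance than group1

t_equal, nu_equal = welch_t(group1, same_spread)
t_unequal, nu_unequal = welch_t(group1, wider_spread)
print(t_equal, nu_equal)
print(t_unequal, nu_unequal)
```

With equal group sizes and equal estimated variances, the first call yields the same $ t$ and $ \nu = n_1 + n_2 - 2$ that the pooled formulas (12.5) and (12.6) would give. In the second call, the larger variance lowers both $ t$ and $ \nu $, which makes the test more conservative.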

Steven M LaValle 2020-01-06