8 Two-sample mean comparison test
8.1 What is it?
The two-sample mean comparison is based on the assumption that the data in each group or sample are independent and approximately follow a normal distribution.
It helps to assess whether the observed difference in means between the groups is statistically significant or if it could be due to random sampling variability.
8.2 Data
\(\left(x_{i,k}\right)_{i=1}^{n_k}\), where \(k=1,2\) indicates the sample
Assume that the data are realizations of two independent samples, each being independent and identically distributed:
- \(X_{i,1}, i=1,\cdots,n_1\)
- \(X_{i,2}, i=1,\cdots,n_2\)
Expectations: \(\mu_k=\mathbb{E}\left[X_{i,k}\right]\)
Population variances: \(\sigma_k^2=\mathbb{V}ar\left(X_{i,k}\right)\)
Statistics
- Sample means: \(\overline{X}_k=\dfrac{1}{n_k}\sum_{i=1}^{n_k}X_{i,k}\),
- Sample variances: \(S_k^2=\dfrac{1}{n_k}\sum_{i=1}^{n_k}\left(X_{i,k}-\overline{X}_k\right)^2\)
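As a minimal sketch, these statistics can be computed as follows (the data values are hypothetical, and the \(1/n_k\) variance convention above is kept, i.e. the biased form):

```python
from statistics import fmean

# Hypothetical measurements for two independent samples (illustrative values).
x1 = [5.1, 4.8, 5.3, 5.0, 4.9]
x2 = [4.6, 4.7, 4.5, 4.9]

# Sample means: Xbar_k = (1/n_k) * sum_i X_{i,k}
xbar1, xbar2 = fmean(x1), fmean(x2)

# Sample variances with the 1/n_k convention used above (biased form).
s2_1 = sum((x - xbar1) ** 2 for x in x1) / len(x1)
s2_2 = sum((x - xbar2) ** 2 for x in x2) / len(x2)
```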
8.3 Hypothesis
8.3.1 Null Hypothesis \(\mathcal{H}_0\)
- \(\mu_1 = \mu_2\)
8.3.2 Alternative Hypothesis \(\mathcal{H}_1\)
Two-tailed test: \(\mu_1\neq \mu_2\)
Left-tailed test: \(\mu_1< \mu_2\)
Right-tailed test: \(\mu_1> \mu_2\)
8.4 Test statistic
- The test statistic is constructed around \(\overline{X}_1-\overline{X}_2\)
8.4.1 Variance of \(\overline{X}_1-\overline{X}_2\)
Several cases are distinguished:
Known variances: \(S^2\left(\overline{X}_1-\overline{X}_2\right)=\dfrac{\sigma_1^2}{n_1}+\dfrac{\sigma_2^2}{n_2}\)
Unknown and equal variances, and large sample sizes: \(S^2\left(\overline{X}_1-\overline{X}_2\right)=\dfrac{n_1S_1^2+n_2S_2^2}{n_1+n_2}\left(\dfrac{1}{n_1}+\dfrac{1}{n_2}\right)\)
Unknown and unequal variances, and large sample sizes: \(S^2\left(\overline{X}_1-\overline{X}_2\right)=\dfrac{S_1^2}{n_1}+\dfrac{S_2^2}{n_2}\)
Unknown and equal variances, Gaussian samples of small sizes: \(S^2\left(\overline{X}_1-\overline{X}_2\right)=\dfrac{n_1S_1^2+n_2S_2^2}{n_1+n_2-2}\left(\dfrac{1}{n_1}+\dfrac{1}{n_2}\right)\)
Unknown and unequal variances, Gaussian samples of small sizes: \(S^2\left(\overline{X}_1-\overline{X}_2\right)=\dfrac{S_1^2}{n_1}+\dfrac{S_2^2}{n_2}\) and \(dof=\dfrac{S^4\left(\overline{X}_1-\overline{X}_2\right)}{\dfrac{S_1^4}{n_1^2(n_1-1)}+\dfrac{S_2^4}{n_2^2(n_2-1)}}\)
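The four estimated-variance cases can be gathered into a single helper; this is a sketch for illustration (the function name and its flags are hypothetical, not part of the text), returning the degrees of freedom only in the Welch-type case:

```python
def var_diff(s2_1, s2_2, n1, n2, equal_var, small_gaussian):
    """Estimated variance of Xbar1 - Xbar2, following the cases above.

    s2_1, s2_2 are the sample variances in the 1/n_k convention.
    Returns (variance, dof); dof is None unless variances are unknown,
    unequal, and the samples are small Gaussian samples.
    """
    if equal_var:
        # Pooled variance: divisor n1+n2 for large samples,
        # n1+n2-2 for small Gaussian samples.
        denom = (n1 + n2 - 2) if small_gaussian else (n1 + n2)
        pooled = (n1 * s2_1 + n2 * s2_2) / denom
        return pooled * (1 / n1 + 1 / n2), None
    v = s2_1 / n1 + s2_2 / n2
    if small_gaussian:
        # Welch-type degrees of freedom from the formula above.
        dof = v ** 2 / (s2_1 ** 2 / (n1 ** 2 * (n1 - 1))
                        + s2_2 ** 2 / (n2 ** 2 * (n2 - 1)))
        return v, dof
    return v, None
```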
8.4.2 Statistic
\(T=\dfrac{\overline{X}_1-\overline{X}_2}{\sqrt{S^2\left(\overline{X}_1-\overline{X}_2\right)}}\)
Asymptotic or exact distribution of the test statistic under \(\mathcal{H}_0\):
- Unknown and unequal variances, Gaussian samples of small sizes: \(\mathcal{T}_{dof}\) (Student's t distribution with \(dof\) degrees of freedom)
- Otherwise: \(\mathcal{N}\left(0, 1\right)\)
- In the following, we denote:
  - \(\mathcal{L}\) as the exact or asymptotic distribution of the test statistic
  - \(q_{\alpha}\) as the \(\alpha\)-level quantile of \(\mathcal{L}\)
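As an illustration, \(T\) and the Welch-type \(dof\) can be computed directly from these formulas; the data values are hypothetical and the \(1/n_k\) variance convention of Section 8.2 is kept:

```python
from statistics import fmean

# Hypothetical small samples with possibly unequal variances.
x1 = [5.1, 4.8, 5.3, 5.0, 4.9]
x2 = [4.6, 4.7, 4.5, 4.9]
n1, n2 = len(x1), len(x2)

xbar1, xbar2 = fmean(x1), fmean(x2)
# Sample variances with the 1/n_k convention of Section 8.2.
s2_1 = sum((x - xbar1) ** 2 for x in x1) / n1
s2_2 = sum((x - xbar2) ** 2 for x in x2) / n2

# S^2(Xbar1 - Xbar2) = S1^2/n1 + S2^2/n2, then T and the Welch dof.
v = s2_1 / n1 + s2_2 / n2
T = (xbar1 - xbar2) / v ** 0.5
dof = v ** 2 / (s2_1 ** 2 / (n1 ** 2 * (n1 - 1))
                + s2_2 ** 2 / (n2 ** 2 * (n2 - 1)))
```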
8.5 Critical region and P-value
8.5.1 Critical region
Two-tailed test: \(W=\left]-\infty, q_{\alpha/2}\right[\cup\left]q_{1-\alpha/2}, +\infty\right[\)
Left-tailed test: \(W=\left]-\infty, q_{\alpha}\right[\)
Right-tailed test: \(W=\left]q_{1-\alpha}, +\infty\right[\)
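For the large-sample case where \(\mathcal{L}=\mathcal{N}(0,1)\), the two-tailed critical region can be sketched with the standard library (the function name is illustrative):

```python
from statistics import NormalDist

alpha = 0.05
std_normal = NormalDist()                      # L = N(0, 1), large-sample case
q_low = std_normal.inv_cdf(alpha / 2)          # q_{alpha/2}
q_high = std_normal.inv_cdf(1 - alpha / 2)     # q_{1-alpha/2}

def in_critical_region_two_tailed(t_obs: float) -> bool:
    """True iff t_obs lies in W = ]-inf, q_{alpha/2}[ U ]q_{1-alpha/2}, +inf[."""
    return t_obs < q_low or t_obs > q_high
```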
8.5.2 P-Value
Two-tailed test: \(pValue=2\mathbb{P}\left(T>|T_{obs}|\right)\)
Left-tailed test: \(pValue=\mathbb{P}\left(T<T_{obs}\right)\)
Right-tailed test: \(pValue=\mathbb{P}\left(T>T_{obs}\right)\)
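Under the asymptotic \(\mathcal{N}(0,1)\) law, the three p-values follow directly from the standard normal CDF (the observed value of \(T\) below is hypothetical):

```python
from statistics import NormalDist

Phi = NormalDist().cdf   # CDF of N(0, 1), the asymptotic law of T under H0
t_obs = 2.1              # hypothetical observed test statistic

p_two = 2 * (1 - Phi(abs(t_obs)))   # two-tailed: 2 P(T > |T_obs|)
p_left = Phi(t_obs)                 # left-tailed: P(T < T_obs)
p_right = 1 - Phi(t_obs)            # right-tailed: P(T > T_obs)
```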
8.6 Decision
8.6.1 Based on critical region
- Reject \(\mathcal{H}_0\) if and only if \(T_{obs}\in W\)
8.6.2 Based on \(pValue\)
- Reject \(\mathcal{H}_0\) if and only if \(pValue < \alpha\)
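Putting the steps together, here is an end-to-end sketch of the two-tailed test in the large-sample, unequal-variance case (normal approximation); the function name is illustrative and the code simply chains the formulas of this section:

```python
from statistics import NormalDist, fmean

def two_mean_test(x1, x2, alpha=0.05):
    """Two-tailed two-sample mean comparison, large-sample case.

    Uses the 1/n_k variance convention of Section 8.2 and the
    N(0, 1) approximation for the law of T under H0.
    Returns (t_obs, p_value, reject_h0).
    """
    n1, n2 = len(x1), len(x2)
    m1, m2 = fmean(x1), fmean(x2)
    s2_1 = sum((x - m1) ** 2 for x in x1) / n1
    s2_2 = sum((x - m2) ** 2 for x in x2) / n2
    # T = (Xbar1 - Xbar2) / sqrt(S1^2/n1 + S2^2/n2)
    t_obs = (m1 - m2) / (s2_1 / n1 + s2_2 / n2) ** 0.5
    # Two-tailed p-value: 2 P(T > |T_obs|); reject H0 iff p < alpha.
    p_value = 2 * (1 - NormalDist().cdf(abs(t_obs)))
    return t_obs, p_value, p_value < alpha
```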