Two-sample mean comparison test
What is it?
The two-sample mean comparison is based on the assumption that the data in each group or sample are independent and approximately follow a normal distribution.
It helps to assess whether the observed difference in means between the groups is statistically significant or if it could be due to random sampling variability.
Data
\(\left(x_{i,k}\right)_{i=1}^{n_k}\), where \(k=1,2\) indicates the sample
Assume that the data are realizations of two independent samples, each being independent and identically distributed:
- \(X_{i,1}, i=1,\cdots,n_1\)
- \(X_{i,2}, i=1,\cdots,n_2\)
Expectations: \(\mu_k=\mathbb{E}\left[X_{i,k}\right]\)
Population variances: \(\sigma_k^2=\mathbb{V}ar\left(X_{i,k}\right)\)
Statistics
- Sample means: \(\overline{X}_k=\dfrac{1}{n_k}\sum_{i=1}^{n_k}X_{i,k}\),
- Sample variances: \(S_k^2=\dfrac{1}{n_k}\sum_{i=1}^{n_k}\left(X_{i,k}-\overline{X}_k\right)^2\)
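These statistics can be computed directly from their definitions; note that \(S_k^2\) above uses the \(1/n_k\) convention (the biased estimator), which matters when translating into library calls. A minimal sketch with hypothetical data:

```python
import numpy as np

# Hypothetical data for the two samples (illustrative values)
x1 = np.array([5.1, 4.9, 5.4, 5.0, 5.2])
x2 = np.array([4.7, 4.8, 5.0, 4.6])

# Sample means: Xbar_k = (1/n_k) * sum_i x_{i,k}
xbar1, xbar2 = x1.mean(), x2.mean()

# Sample variances with the 1/n_k convention used above (ddof=0, biased)
s1_sq = x1.var(ddof=0)
s2_sq = x2.var(ddof=0)
```

Passing `ddof=0` to `numpy.var` matches the \(1/n_k\) normalization of this note; `ddof=1` would give the unbiased \(1/(n_k-1)\) version instead.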
Hypothesis
Null Hypothesis \(\mathcal{H}_0\)
- \(\mu_1 = \mu_2\)
Alternative Hypothesis \(\mathcal{H}_1\)
Two-tailed test: \(\mu_1\neq \mu_2\)
Left-tailed test: \(\mu_1< \mu_2\)
Right-tailed test: \(\mu_1> \mu_2\)
Test statistic
- The test statistic is constructed around \(\overline{X}_1-\overline{X}_2\)
Variance of \(\overline{X}_1-\overline{X}_2\)
Several cases are distinguished:
Known variances: \(S^2\left(\overline{X}_1-\overline{X}_2\right)=\dfrac{\sigma_1^2}{n_1}+\dfrac{\sigma_2^2}{n_2}\)
Unknown and equal variances, and large sample sizes: \(S^2\left(\overline{X}_1-\overline{X}_2\right)=\dfrac{n_1S_1^2+n_2S_2^2}{n_1+n_2}\left(\dfrac{1}{n_1}+\dfrac{1}{n_2}\right)\)
Unknown and unequal variances, and large sample sizes: \(S^2\left(\overline{X}_1-\overline{X}_2\right)=\dfrac{S_1^2}{n_1}+\dfrac{S_2^2}{n_2}\)
Unknown and equal variances, Gaussian samples of small sizes: \(S^2\left(\overline{X}_1-\overline{X}_2\right)=\dfrac{n_1S_1^2+n_2S_2^2}{n_1+n_2-2}\left(\dfrac{1}{n_1}+\dfrac{1}{n_2}\right)\)
Unknown and unequal variances, Gaussian samples of small sizes: \(S^2\left(\overline{X}_1-\overline{X}_2\right)=\dfrac{S_1^2}{n_1}+\dfrac{S_2^2}{n_2}\) and \(dof=\dfrac{S^4\left(\overline{X}_1-\overline{X}_2\right)}{\dfrac{S_1^4}{n_1^2(n_1-1)}+\dfrac{S_2^4}{n_2^2(n_2-1)}}\)
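The estimators above translate directly into code. A minimal sketch covering the pooled (equal-variance) case and the unequal-variance case with its Welch-Satterthwaite degrees of freedom (function names are illustrative; `s1_sq`, `s2_sq` are the \(1/n_k\)-normalized sample variances defined earlier):

```python
def var_diff_pooled(s1_sq, s2_sq, n1, n2):
    """Unknown, equal variances, small Gaussian samples:
    pooled estimator (n1*S1^2 + n2*S2^2) / (n1 + n2 - 2) * (1/n1 + 1/n2)."""
    pooled = (n1 * s1_sq + n2 * s2_sq) / (n1 + n2 - 2)
    return pooled * (1.0 / n1 + 1.0 / n2)

def var_diff_welch(s1_sq, s2_sq, n1, n2):
    """Unknown, unequal variances: S1^2/n1 + S2^2/n2."""
    return s1_sq / n1 + s2_sq / n2

def welch_dof(s1_sq, s2_sq, n1, n2):
    """Degrees of freedom from the dof formula above
    (Welch-Satterthwaite approximation)."""
    v = var_diff_welch(s1_sq, s2_sq, n1, n2)
    return v**2 / (s1_sq**2 / (n1**2 * (n1 - 1)) + s2_sq**2 / (n2**2 * (n2 - 1)))
```

For equal variances and equal sample sizes the Welch dof reduces to \(2(n-1)\), a quick sanity check on the formula.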
Statistic
\(T=\dfrac{\overline{X}_1-\overline{X}_2}{\sqrt{S^2\left(\overline{X}_1-\overline{X}_2\right)}}\)
Asymptotic or exact distribution of the test statistic under \(\mathcal{H}_0\):
- Unknown and unequal variances, Gaussian samples of small sizes: \(\mathcal{T}_{dof}\)
- Otherwise: \(\mathcal{N}\left(0, 1\right)\)
- In the following, we denote:
  - \(\mathcal{L}\) the exact or asymptotic distribution of the test statistic
  - \(q_{\alpha}\) the \(\alpha\)-level quantile of \(\mathcal{L}\)
Critical region and P-value
Critical region
Two-tailed test: \(W=\left]-\infty, q_{\alpha/2}\right[\cup\left]q_{1-\alpha/2}, +\infty\right[\)
Left-tailed test: \(W=\left]-\infty, q_{\alpha}\right[\)
Right-tailed test: \(W=\left]q_{1-\alpha}, +\infty\right[\)
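The critical regions above only require the quantiles \(q_{\alpha}\) of \(\mathcal{L}\). A sketch for the large-sample case, where \(\mathcal{L}=\mathcal{N}(0,1)\) (variable names are illustrative):

```python
from scipy import stats

alpha = 0.05
law = stats.norm()  # large-sample case: L = N(0, 1)

# Two-tailed: W = ]-inf, q_{alpha/2}[ U ]q_{1-alpha/2}, +inf[
q_lo, q_hi = law.ppf(alpha / 2), law.ppf(1 - alpha / 2)

def in_critical_region_two_tailed(t_obs):
    # Reject H0 iff T_obs falls in W
    return t_obs < q_lo or t_obs > q_hi

# Left-tailed:  W = ]-inf, q_alpha[
q_left = law.ppf(alpha)
# Right-tailed: W = ]q_{1-alpha}, +inf[
q_right = law.ppf(1 - alpha)
```

For the small-sample Student case, `stats.norm()` would simply be replaced by `stats.t(df=...)` with the appropriate degrees of freedom.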
P-Value
Two-tailed test: \(pValue=2\mathbb{P}\left(T>|T_{obs}|\right)\)
Left-tailed test: \(pValue=\mathbb{P}\left(T<T_{obs}\right)\)
Right-tailed test: \(pValue=\mathbb{P}\left(T>T_{obs}\right)\)
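Each p-value follows directly from the CDF of \(\mathcal{L}\). A sketch for the Welch case (unknown, unequal variances, small Gaussian samples), with hypothetical data and the \(1/n_k\) variance convention of this note:

```python
import numpy as np
from scipy import stats

# Hypothetical data (illustrative values)
x1 = np.array([5.1, 4.9, 5.4, 5.0, 5.2])
x2 = np.array([4.7, 4.8, 5.0, 4.6])
n1, n2 = len(x1), len(x2)

# Biased sample variances (1/n_k convention used in this note)
s1_sq, s2_sq = x1.var(ddof=0), x2.var(ddof=0)

# Welch case: S^2(Xbar1 - Xbar2) = S1^2/n1 + S2^2/n2, and its dof
v = s1_sq / n1 + s2_sq / n2
dof = v**2 / (s1_sq**2 / (n1**2 * (n1 - 1)) + s2_sq**2 / (n2**2 * (n2 - 1)))
t_obs = (x1.mean() - x2.mean()) / np.sqrt(v)

law = stats.t(df=dof)
p_two = 2 * law.sf(abs(t_obs))  # two-tailed:   2 P(T > |T_obs|)
p_left = law.cdf(t_obs)         # left-tailed:  P(T < T_obs)
p_right = law.sf(t_obs)         # right-tailed: P(T > T_obs)
```

Since \(\mathcal{T}_{dof}\) is symmetric about 0, the two-tailed p-value equals twice the smaller of the two one-tailed p-values.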
Decision
Based on critical region
- Reject \(\mathcal{H}_0\) if and only if \(T_{obs}\in W\)
Based on \(pValue\)
- Reject \(\mathcal{H}_0\) if and only if \(pValue < \alpha\)
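Both decision rules are equivalent for a two-tailed test, since \(pValue < \alpha \iff |T_{obs}| > q_{1-\alpha/2}\). A sketch for the pooled (equal-variance) case with hypothetical data; because \(n_k S_k^2 = \sum_i (x_{i,k}-\overline{x}_k)^2\), this reproduces the classical Student t-test and can be cross-checked against `scipy.stats.ttest_ind`:

```python
import numpy as np
from scipy import stats

# Hypothetical data (illustrative values)
x1 = np.array([5.1, 4.9, 5.4, 5.0, 5.2])
x2 = np.array([4.7, 4.8, 5.0, 4.6])
n1, n2 = len(x1), len(x2)
alpha = 0.05

# Pooled case (unknown, equal variances, small Gaussian samples)
s1_sq, s2_sq = x1.var(ddof=0), x2.var(ddof=0)  # 1/n_k convention
v = (n1 * s1_sq + n2 * s2_sq) / (n1 + n2 - 2) * (1 / n1 + 1 / n2)
t_obs = (x1.mean() - x2.mean()) / np.sqrt(v)

law = stats.t(df=n1 + n2 - 2)
p_value = 2 * law.sf(abs(t_obs))  # two-tailed

# Decision via the p-value: reject H0 iff p_value < alpha
reject = p_value < alpha

# Equivalent decision via the critical region: reject iff |T_obs| > q_{1-alpha/2}
q = law.ppf(1 - alpha / 2)
assert reject == (abs(t_obs) > q)

# Cross-check against scipy's Student t-test with equal variances
t_ref, p_ref = stats.ttest_ind(x1, x2, equal_var=True)
```

The hand-rolled statistic and p-value agree with `ttest_ind` to numerical precision, which confirms that the \(n_1+n_2-2\) pooled formula matches the standard unbiased pooled-variance estimator.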