Two-sample mean comparison test

What is it?

  • The two-sample mean comparison is based on the assumption that the data in each group or sample are independent and approximately follow a normal distribution.

  • It helps to assess whether the observed difference in means between the groups is statistically significant or if it could be due to random sampling variability.

Data

  • \(\left(x_{i,k}\right)_{i=1}^{n_k}\), where \(k=1,2\) indicates the sample

  • Assume that the data are realizations of two independent samples, each being independent and identically distributed:

    • \(X_{i,1}, i=1,\cdots,n_1\)
    • \(X_{i,2}, i=1,\cdots,n_2\)
  • Expectations: \(\mu_k=\mathbb{E}\left[X_{i,k}\right]\)

  • Population variances: \(\sigma_k^2=\mathbb{V}ar\left(X_{i,k}\right)\)

  • Statistics

    • Sample means: \(\overline{X}_k=\dfrac{1}{n_k}\sum_{i=1}^{n_k}X_{i,k}\),
    • Sample variances: \(S_k^2=\dfrac{1}{n_k}\sum_{i=1}^{n_k}\left(X_{i,k}-\overline{X}_k\right)^2\)
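
As a quick illustration, the sample means and the \(1/n_k\) (biased) sample variances defined above can be computed with NumPy; the two data arrays below are made up for the example:

```python
import numpy as np

# Two made-up samples (illustrative data only)
x1 = np.array([5.1, 4.8, 5.6, 5.0, 4.9])
x2 = np.array([4.2, 4.7, 4.4, 4.6])

# Sample means: mean of each sample
xbar1, xbar2 = x1.mean(), x2.mean()

# Sample variances with the 1/n_k convention used above (hence ddof=0)
s1_sq = x1.var(ddof=0)
s2_sq = x2.var(ddof=0)

print(xbar1, xbar2, s1_sq, s2_sq)
```

Note that `ddof=0` matches the \(1/n_k\) definition in the text; the unbiased estimator would use `ddof=1`.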

Hypothesis

Null Hypothesis \(\mathcal{H}_0\)

  • \(\mu_1 = \mu_2\)

Alternative Hypothesis \(\mathcal{H}_1\)

  • Two-tailed test: \(\mu_1\neq \mu_2\)

  • Left-tailed test: \(\mu_1< \mu_2\)

  • Right-tailed test: \(\mu_1> \mu_2\)

Test statistic

  • The test statistic is constructed around \(\overline{X}_1-\overline{X}_2\)

Variance of \(\overline{X}_1-\overline{X}_2\)

Several cases are distinguished:

  • Known variances: \(S^2\left(\overline{X}_1-\overline{X}_2\right)=\dfrac{\sigma_1^2}{n_1}+\dfrac{\sigma_2^2}{n_2}\)

  • Unknown and equal variances, large sample sizes: \(S^2\left(\overline{X}_1-\overline{X}_2\right)=\dfrac{n_1S_1^2+n_2S_2^2}{n_1+n_2}\left(\dfrac{1}{n_1}+\dfrac{1}{n_2}\right)\)

  • Unknown and unequal variances, large sample sizes: \(S^2\left(\overline{X}_1-\overline{X}_2\right)=\dfrac{S_1^2}{n_1}+\dfrac{S_2^2}{n_2}\)

  • Unknown and equal variances, Gaussian samples of small sizes: \(S^2\left(\overline{X}_1-\overline{X}_2\right)=\dfrac{n_1S_1^2+n_2S_2^2}{n_1+n_2-2}\left(\dfrac{1}{n_1}+\dfrac{1}{n_2}\right)\)

  • Unknown and unequal variances, Gaussian samples of small sizes: \(S^2\left(\overline{X}_1-\overline{X}_2\right)=\dfrac{S_1^2}{n_1}+\dfrac{S_2^2}{n_2}\), with Welch-Satterthwaite degrees of freedom \(dof=\dfrac{S^4\left(\overline{X}_1-\overline{X}_2\right)}{\dfrac{S_1^4}{n_1^2(n_1-1)}+\dfrac{S_2^4}{n_2^2(n_2-1)}}\)
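
The two small-sample cases can be sketched directly from these formulas. In the snippet below, `s1_sq` and `s2_sq` stand for the \(1/n_k\) sample variances defined earlier, and all numbers are illustrative:

```python
# Illustrative sample sizes and biased (1/n_k) sample variances
n1, n2 = 8, 10
s1_sq, s2_sq = 1.4, 2.3

# Equal variances, small Gaussian samples: pooled variance estimator
var_pooled = (n1 * s1_sq + n2 * s2_sq) / (n1 + n2 - 2) * (1 / n1 + 1 / n2)

# Unequal variances, small Gaussian samples: Welch estimator ...
var_welch = s1_sq / n1 + s2_sq / n2
# ... and its Welch-Satterthwaite degrees of freedom
dof = var_welch**2 / (
    s1_sq**2 / (n1**2 * (n1 - 1)) + s2_sq**2 / (n2**2 * (n2 - 1))
)

print(var_pooled, var_welch, dof)
```

The `dof` value is generally not an integer; Student quantiles are simply evaluated at this fractional degree of freedom.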

Statistic

  • \(T=\dfrac{\overline{X}_1-\overline{X}_2}{\sqrt{S^2\left(\overline{X}_1-\overline{X}_2\right)}}\)

  • Asymptotic or exact distribution of the test statistic under \(\mathcal{H}_0\):

  • Unknown variances, Gaussian samples of small sizes: Student \(\mathcal{T}_{dof}\), with \(dof=n_1+n_2-2\) in the equal-variance case and the Welch-Satterthwaite \(dof\) given above in the unequal-variance case
  • Otherwise: \(\mathcal{N}\left(0, 1\right)\)

  • In the following, we denote:

    • \(\mathcal{L}\) the exact or asymptotic distribution of the test statistic
    • \(q_{\alpha}\) the \(\alpha\)-level quantile of \(\mathcal{L}\)

Critical region and P-value

Critical region

  • Two-tailed test: \(W=\left]-\infty, q_{\alpha/2}\right[\cup\left]q_{1-\alpha/2}, +\infty\right[\)

  • Left-tailed test: \(W=\left]-\infty, q_{\alpha}\right[\)

  • Right-tailed test: \(W=\left]q_{1-\alpha}, +\infty\right[\)
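
The quantiles \(q_\alpha\) delimiting \(W\) come from \(\mathcal{L}\); a sketch with an illustrative level \(\alpha\) and degrees of freedom:

```python
from scipy import stats

alpha = 0.05
dof = 16  # illustrative; only relevant in the small-sample Student cases

# Normal quantiles (large samples or known variances)
q_low = stats.norm.ppf(alpha / 2)       # q_{alpha/2}
q_high = stats.norm.ppf(1 - alpha / 2)  # q_{1-alpha/2}

# Student quantile (small Gaussian samples)
t_high = stats.t.ppf(1 - alpha / 2, df=dof)

# Two-tailed decision: reject H0 iff T_obs < q_low or T_obs > q_high
print(q_low, q_high, t_high)
```

By symmetry of both reference distributions, \(q_{\alpha/2}=-q_{1-\alpha/2}\), which is why tables only list upper quantiles.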

P-Value

  • Two-tailed test: \(pValue=2\mathbb{P}\left(T>|T_{obs}|\right)\)

  • Left-tailed test: \(pValue=\mathbb{P}\left(T<T_{obs}\right)\)

  • Right-tailed test: \(pValue=\mathbb{P}\left(T>T_{obs}\right)\)

Decision

Based on critical region

  • Reject \(\mathcal{H}_0\) if and only if \(T_{obs}\in W\)

Based on \(pValue\)

  • Reject \(\mathcal{H}_0\) if and only if \(pValue < \alpha\)
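
Putting the pieces together, the p-value decision rule in the large-sample, unequal-variance case might look like the following sketch (simulated data, illustrative only):

```python
import numpy as np
from scipy import stats

# Simulated samples with different means and variances (illustrative)
rng = np.random.default_rng(0)
x1 = rng.normal(loc=0.0, scale=1.0, size=200)
x2 = rng.normal(loc=0.3, scale=1.5, size=250)
n1, n2 = len(x1), len(x2)

# Unknown, unequal variances, large samples
var_diff = x1.var(ddof=0) / n1 + x2.var(ddof=0) / n2
T_obs = (x1.mean() - x2.mean()) / np.sqrt(var_diff)

# Two-tailed p-value under the asymptotic N(0, 1) distribution
p_value = 2 * stats.norm.sf(abs(T_obs))

alpha = 0.05
reject = p_value < alpha
print(T_obs, p_value, reject)
```

The two decision rules are equivalent: \(pValue<\alpha\) exactly when \(T_{obs}\) falls in the critical region \(W\).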