One-way ANOVA

What is it?

  • ANOVA (Analysis of Variance) is a statistical method used to compare the means of three or more independent groups or samples simultaneously.

  • ANOVA assesses whether there are statistically significant differences in the means of the groups, allowing for the identification of any significant group effects.

  • ANOVA divides the total variation in the data into two components: the variation between the groups and the variation within the groups.

  • It then compares the magnitude of these two components to determine if the differences between the group means are larger than what would be expected due to random variation.

Data

  • Sample indicator: \(k=1,\cdots,K\)

  • Let \(x_{i,k}\) be the observed data on individual \(i\) in sample \(k\)

  • Let \(n_k\) be the size of sample \(k\)

  • Total sample size: \(n=\sum_kn_k\)

  • Samples are assumed to be independent

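  • Assume that the data of sample \(k\) are modelled by i.i.d. random variables \(X_{i,k}\), \(i=1,\cdots,n_k\), taken to be Gaussian with a variance \(\sigma^2\) common to all samples (the classical assumptions under which the \(\mathcal{F}\) distribution of the test statistic is exact)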

  • Let \(\mu_k=\mathbb{E}\left[X_{i,k}\right]\)

Hypothesis

  • \(\mathcal{H}_0\): \(\mu_1=\cdots=\mu_k=\cdots=\mu_K\)

  • \(\mathcal{H}_1\): \(\exists k,l;\ \mu_k\neq\mu_l\)

Statistics

Means

  • Mean of group \(k\): \(\overline{X}_k=\dfrac{1}{n_k}\sum_{i=1}^{n_k}X_{i, k}\)

  • Overall mean: \(\overline{X}=\dfrac{1}{n}\sum_{k=1}^Kn_k\overline{X}_k\)

Variances

  • Within variance of group \(k\): \(S_k^2=\dfrac{1}{n_k}\sum_{i=1}^{n_k}\left(X_{i,k}-\overline{X}_k\right)^2\)
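The group means, the overall mean and the within variances defined above can be sketched in a few lines of Python. This is only an illustration: the `groups` data are hypothetical values chosen for easy arithmetic, and the variable names are not part of the original notation.

```python
from statistics import fmean

# Hypothetical data: K = 3 groups of unequal sizes (illustrative values only).
groups = [
    [4.0, 5.0, 6.0],
    [6.0, 7.0, 8.0, 9.0],
    [9.0, 10.0, 11.0],
]

sizes = [len(g) for g in groups]           # n_k
n = sum(sizes)                             # n = sum_k n_k
group_means = [fmean(g) for g in groups]   # mean of group k

# Overall mean as the size-weighted average of the group means.
overall_mean = sum(nk * mk for nk, mk in zip(sizes, group_means)) / n

# Within variance of each group, with the 1/n_k normalisation used above.
within_vars = [
    sum((x - mk) ** 2 for x in g) / nk
    for g, nk, mk in zip(groups, sizes, group_means)
]

# Sanity check: the weighted average of group means is the pooled mean.
pooled_mean = fmean([x for g in groups for x in g])

print(group_means, overall_mean, within_vars)
```

Weighting each group mean by \(n_k\) is what makes the overall mean coincide with the mean of all \(n\) pooled observations, including when the groups have different sizes.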

Sums of squares

  • Within sum of squares in group \(k\): \(WSS_k=\sum_{i=1}^{n_k}\left(X_{i,k}-\overline{X}_k\right)^2\)

  • Within sum of squares: \(WSS=\sum_{k=1}^K\sum_{i=1}^{n_k}\left(X_{i,k}-\overline{X}_k\right)^2\) \(=\sum_{k=1}^K WSS_k\)

  • Between sum of squares: \(BSS=\sum_{k=1}^Kn_k\left(\overline{X}_k-\overline{X}\right)^2\)

  • Total sum of squares: \(TSS=\sum_{k=1}^K\sum_{i=1}^{n_k}\left(X_{i,k}-\overline{X}\right)^2\) \(=WSS+BSS\)
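The decomposition \(TSS=WSS+BSS\) can be checked numerically. The sketch below assumes hypothetical data and a hypothetical helper `sums_of_squares`; it computes each sum directly from its definition rather than deriving one from the others.

```python
# Hypothetical data: K = 3 groups (illustrative values only).
groups = [
    [4.0, 5.0, 6.0],
    [6.0, 7.0, 8.0, 9.0],
    [9.0, 10.0, 11.0],
]


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def sums_of_squares(groups):
    """Return (WSS, BSS, TSS), each computed from its own definition."""
    pooled = [x for g in groups for x in g]
    overall = mean(pooled)
    wss = 0.0
    bss = 0.0
    for g in groups:
        m = mean(g)
        wss += sum((x - m) ** 2 for x in g)    # WSS_k
        bss += len(g) * (m - overall) ** 2     # n_k * (group mean - overall mean)^2
    tss = sum((x - overall) ** 2 for x in pooled)
    return wss, bss, tss


wss, bss, tss = sums_of_squares(groups)
print(wss, bss, tss, abs(tss - (wss + bss)) < 1e-9)
```

For this toy data set, WSS = 9, BSS = 37.5 and TSS = 46.5, so the identity holds.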

Test statistic

  • Between-group variance: \(BVar=\dfrac{BSS}{K-1}\)

  • Within-group variance: \(WVar=\dfrac{WSS}{n-K}\)

  • Test statistic: \(F=\dfrac{BVar}{WVar}\overset{\mathcal{H}_0}{\sim}\mathcal{F}_{K-1,n-K}\)

  • Observed statistic: \(F_{obs}\)

  • Between degrees of freedom: \(bdof = K-1\)

  • Within degrees of freedom: \(wdof = n-K\)

  • Total degrees of freedom: \(tdof=n-1\)

Source     SS     dof     Variance    F-stat         p-value
Between    BSS    bdof    \(BVar\)    \(F_{obs}\)    \(pValue\)
Within     WSS    wdof    \(WVar\)
Total      TSS    tdof
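As an illustration, the rows of the ANOVA table can be assembled as follows. The helper `anova_table`, its dictionary keys and the data are assumptions made for this sketch; the p-value column is left out here because it requires the \(\mathcal{F}\) distribution.

```python
# Hypothetical data: K = 3 groups (illustrative values only).
groups = [
    [4.0, 5.0, 6.0],
    [6.0, 7.0, 8.0, 9.0],
    [9.0, 10.0, 11.0],
]


def anova_table(groups):
    """Return the one-way ANOVA table as a list of row dictionaries."""
    K = len(groups)
    n = sum(len(g) for g in groups)
    overall = sum(x for g in groups for x in g) / n
    wss = 0.0
    bss = 0.0
    for g in groups:
        m = sum(g) / len(g)
        wss += sum((x - m) ** 2 for x in g)
        bss += len(g) * (m - overall) ** 2
    bdof, wdof = K - 1, n - K
    bvar, wvar = bss / bdof, wss / wdof   # BVar and WVar
    return [
        {"source": "Between", "ss": bss, "dof": bdof, "var": bvar, "F": bvar / wvar},
        {"source": "Within", "ss": wss, "dof": wdof, "var": wvar, "F": None},
        {"source": "Total", "ss": bss + wss, "dof": bdof + wdof, "var": None, "F": None},
    ]


table = anova_table(groups)
for row in table:
    print(row)
```

The total row is obtained by adding the between and within rows, which mirrors \(TSS=WSS+BSS\) and \(tdof=bdof+wdof\).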

Critical region and P-value

  • \(W=\left(q_{1-\alpha}\left(\mathcal{F}_{K-1, n-K}\right),\ +\infty\right)\)

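  • \(pValue=\mathbb{P}_{\mathcal{H}_0}\left(F>F_{obs}\right)\), the probability under \(\mathcal{H}_0\) that the statistic exceeds its observed value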
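The p-value and the critical value both come from the \(\mathcal{F}_{K-1,n-K}\) distribution. The sketch below uses only the Python standard library: it evaluates the survival function through the regularized incomplete beta function (a standard continued-fraction evaluation) and inverts it by bisection. All function names are hypothetical, and the numerical inputs (BSS = 37.5, WSS = 9, K = 3, n = 10) are illustrative values, not results from the source.

```python
import math


def _beta_cf(a, b, x, max_iter=300, eps=1e-14, tiny=1e-300):
    """Continued fraction of the incomplete beta function (modified Lentz)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the recurrence.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the recurrence.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(
        math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
        + a * math.log(x) + b * math.log1p(-x)
    )
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def f_sf(x, d1, d2):
    """Survival function P(F > x) for F distributed as F(d1, d2)."""
    if x <= 0.0:
        return 1.0
    return reg_inc_beta(d2 / 2.0, d1 / 2.0, d2 / (d2 + d1 * x))


def f_critical(alpha, d1, d2):
    """Quantile of order 1 - alpha of F(d1, d2), found by bisection on f_sf."""
    lo, hi = 0.0, 1.0
    while f_sf(hi, d1, d2) > alpha:   # grow the bracket until it contains the quantile
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f_sf(mid, d1, d2) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Illustrative inputs: K = 3 groups, n = 10 observations, BSS = 37.5, WSS = 9.
K, n = 3, 10
f_obs = (37.5 / (K - 1)) / (9.0 / (n - K))
p_value = f_sf(f_obs, K - 1, n - K)
critical = f_critical(0.05, K - 1, n - K)
print(f_obs, p_value, critical)
```

In practice a statistics library would normally be used instead; for instance, SciPy exposes `scipy.stats.f.sf` for the survival function and `scipy.stats.f_oneway` for the whole test.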

Decision

Based on critical region

  • Reject \(\mathcal{H}_0\) if and only if \(F_{obs}\in W\)

Based on \(pValue\)

  • Reject \(\mathcal{H}_0\) if and only if \(pValue < \alpha\)
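The two decision rules always lead to the same conclusion, because \(F_{obs}>q_{1-\alpha}\) holds exactly when \(\mathbb{P}\left(F>F_{obs}\right)<\alpha\). The sketch below illustrates this in the special case \(K=3\), where the numerator has 2 degrees of freedom and the survival function of \(\mathcal{F}_{2,d}\) has the closed form \(\left(1+2x/d\right)^{-d/2}\). The function names and the tested values of \(F_{obs}\) are hypothetical.

```python
def sf_f2(x, d2):
    """P(F > x) for F ~ F(2, d2); closed form valid only for numerator dof = 2."""
    return (1.0 + 2.0 * x / d2) ** (-d2 / 2.0)


def critical_f2(alpha, d2):
    """Quantile of order 1 - alpha of F(2, d2), obtained by inverting sf_f2."""
    return (d2 / 2.0) * (alpha ** (-2.0 / d2) - 1.0)


def decide(f_obs, alpha, d2):
    """Return the rejection decision under each of the two rules."""
    reject_by_region = f_obs > critical_f2(alpha, d2)   # F_obs in W
    reject_by_pvalue = sf_f2(f_obs, d2) < alpha         # pValue < alpha
    return reject_by_region, reject_by_pvalue


# Hypothetical setting: K = 3 groups and n = 10 observations, so wdof = 7.
alpha, wdof = 0.05, 7
decisions = [decide(f, alpha, wdof) for f in (0.5, 2.0, 4.0, 6.0, 14.0)]
for f, (by_region, by_pvalue) in zip((0.5, 2.0, 4.0, 6.0, 14.0), decisions):
    print(f, by_region, by_pvalue)
```

For other values of \(K\) the same comparison applies, but the survival function and quantile no longer have such a simple closed form and must be evaluated numerically.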