One-way ANOVA
What is it?
ANOVA (Analysis of Variance) is a statistical method used to compare the means of three or more independent groups or samples simultaneously.
ANOVA tests whether at least one group mean differs significantly from the others. A significant result points to a group effect, but ANOVA by itself does not identify which groups differ.
ANOVA divides the total variation in the data into two components: the variation between the groups and the variation within the groups.
It then compares the magnitude of these two components to determine if the differences between the group means are larger than what would be expected due to random variation.
Data
Sample index: \(k=1,\cdots,K\)
Let \(x_{i,k}\) be the observed data on individual \(i\) in sample \(k\)
Let \(n_k\) be the size of sample \(k\)
Total sample size: \(n=\sum_kn_k\)
Samples are assumed to be independent
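Assume that the data of sample \(k\) are modelled by i.i.d. random variables \(X_{i,k}\sim\mathcal{N}\left(\mu_k,\sigma^2\right)\), \(i=1,\cdots,n_k\), with the same variance \(\sigma^2\) in every sample (normality and equal variances are what make the \(\mathcal{F}\) distribution of the test statistic exact under \(\mathcal{H}_0\))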
Let \(\mu_k=\mathbb{E}\left[X_{i,k}\right]\)
Hypotheses
\(\mathcal{H}_0\): \(\mu_1=\cdots=\mu_K\)
\(\mathcal{H}_1\): \(\exists k,l;\ \mu_k\neq\mu_l\)
Statistics
Means
Mean of group \(k\): \(\overline{X}_k=\dfrac{1}{n_k}\sum_{i=1}^{n_k}X_{i, k}\)
Overall mean: \(\overline{X}=\dfrac{1}{n}\sum_{k=1}^Kn_k\overline{X}_k\)
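A minimal Python sketch of the two means, assuming each sample is stored as a plain list of floats and the \(K\) samples as a list of lists; the function names and the unequal-size toy data are hypothetical. The weighted average \(\frac{1}{n}\sum_k n_k\overline{X}_k\) coincides with the plain average of all \(n\) pooled observations, which the last line prints for comparison.

```python
def group_means(samples):
    """Return the group means X̄_k, one per sample."""
    return [sum(s) / len(s) for s in samples]


def overall_mean(samples):
    """Return X̄ = (1/n) * sum_k n_k * X̄_k (a weighted mean of the group means)."""
    n = sum(len(s) for s in samples)
    return sum(len(s) * m for s, m in zip(samples, group_means(samples))) / n


# Hypothetical data: K = 3 samples of unequal sizes n_k = 3, 4, 2
samples = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0, 7.0], [8.0, 9.0]]
pooled = [x for s in samples for x in s]

print(group_means(samples))                          # [2.0, 5.5, 8.5]
print(overall_mean(samples), sum(pooled) / len(pooled))  # 5.0 5.0
```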
Variances
- Within variance of group \(k\): \(S_k^2=\dfrac{1}{n_k}\sum_{i=1}^{n_k}\left(X_{i,k}-\overline{X}_k\right)^2\)
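
With the \(1/n_k\) normalisation, \(S_k^2\) is the biased (population) variance, so \(WSS_k=n_kS_k^2\). A small Python check on a hypothetical sample: the standard library's `statistics.pvariance` uses the same normalisation, whereas `statistics.variance` divides by \(n_k-1\). The name `within_variance` is illustrative.

```python
import math
import statistics


def within_variance(sample):
    """S_k^2 with the 1/n_k normalisation (not the unbiased 1/(n_k - 1))."""
    mean_k = sum(sample) / len(sample)
    return sum((x - mean_k) ** 2 for x in sample) / len(sample)


sample = [4.0, 5.0, 6.0, 7.0]  # hypothetical group k, n_k = 4
s2 = within_variance(sample)
print(s2)                                              # 1.25
print(math.isclose(s2, statistics.pvariance(sample)))  # True
print(statistics.variance(sample))                     # 5/3, divides by n_k - 1
```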
Sums of squares
Within sum of squares in group \(k\): \(WSS_k=\sum_{i=1}^{n_k}\left(X_{i,k}-\overline{X}_k\right)^2\)
Within sum of squares: \(WSS=\sum_{k=1}^K\sum_{i=1}^{n_k}\left(X_{i,k}-\overline{X}_k\right)^2\) \(=\sum_{k=1}^K WSS_k\)
Between sum of squares: \(BSS=\sum_{k=1}^Kn_k\left(\overline{X}_k-\overline{X}\right)^2\)
Total sum of squares: \(TSS=\sum_{k=1}^K\sum_{i=1}^{n_k}\left(X_{i,k}-\overline{X}\right)^2\) \(=WSS+BSS\)
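A sketch that computes the three sums of squares directly from their definitions and checks the decomposition \(TSS=WSS+BSS\) numerically. It assumes the list-of-lists layout for the samples; `sums_of_squares` and the data are illustrative.

```python
import math


def sums_of_squares(samples):
    """Return (WSS, BSS, TSS) for a list of samples (one list per group)."""
    n = sum(len(s) for s in samples)
    means = [sum(s) / len(s) for s in samples]                  # X̄_k
    grand = sum(len(s) * m for s, m in zip(samples, means)) / n  # X̄
    wss = sum((x - m) ** 2 for s, m in zip(samples, means) for x in s)
    bss = sum(len(s) * (m - grand) ** 2 for s, m in zip(samples, means))
    tss = sum((x - grand) ** 2 for s in samples for x in s)
    return wss, bss, tss


samples = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0, 7.0], [8.0, 9.0]]  # hypothetical data
wss, bss, tss = sums_of_squares(samples)
print(wss, bss, tss)                 # 7.5 52.5 60.0
print(math.isclose(tss, wss + bss))  # True: TSS = WSS + BSS
```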
Test statistic
Between-group variance: \(BVar=\dfrac{BSS}{K-1}\)
Within-group variance: \(WVar=\dfrac{WSS}{n-K}\)
Test statistic: \(F=\dfrac{BVar}{WVar}\overset{\mathcal{H}_0}{\sim}\mathcal{F}_{K-1,n-K}\)
Observed statistic: \(F_{obs}\), the value of \(F\) computed on the observed data
Between degrees of freedom: \(bdof = K-1\)
Within degrees of freedom: \(wdof = n-K\)
Total degrees of freedom: \(tdof=n-1\)
| Source | SS | dof | Variance | F-stat | p-value |
|---|---|---|---|---|---|
| Between | BSS | bdof | \(BVar\) | \(F_{obs}\) | \(pValue\) |
| Within | WSS | wdof | \(WVar\) | | |
| Total | TSS | tdof | | | |
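
One way to assemble the table rows in Python, assuming the list-of-lists layout and hypothetical data; `anova_table` and `cell` are illustrative names. It fills every cell except the p-value, which requires the distribution function of \(\mathcal{F}_{K-1,n-K}\); empty cells are returned as `None`.

```python
def anova_table(samples):
    """Return (source, SS, dof, variance, F) rows; None marks an empty cell."""
    K = len(samples)
    n = sum(len(s) for s in samples)
    means = [sum(s) / len(s) for s in samples]
    grand = sum(len(s) * m for s, m in zip(samples, means)) / n
    wss = sum((x - m) ** 2 for s, m in zip(samples, means) for x in s)
    bss = sum(len(s) * (m - grand) ** 2 for s, m in zip(samples, means))
    tss = sum((x - grand) ** 2 for s in samples for x in s)
    bdof, wdof, tdof = K - 1, n - K, n - 1
    bvar, wvar = bss / bdof, wss / wdof  # BVar and WVar
    f_obs = bvar / wvar
    return [
        ("Between", bss, bdof, bvar, f_obs),
        ("Within", wss, wdof, wvar, None),
        ("Total", tss, tdof, None, None),
    ]


def cell(value):
    """Render one table cell: blank for None, compact number format otherwise."""
    if value is None:
        return ""
    return value if isinstance(value, str) else f"{value:g}"


samples = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0, 7.0], [8.0, 9.0]]  # hypothetical data
rows = anova_table(samples)
for row in rows:
    print(" | ".join(cell(v) for v in row))
# first printed row: Between | 52.5 | 2 | 26.25 | 21
```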
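Critical region and p-value
Critical region at significance level \(\alpha\): \(W=\left(q_{1-\alpha}\left(\mathcal{F}_{K-1, n-K}\right),\ +\infty\right)\)
\(pValue=\mathbb{P}\left(F>F_{obs}\right)\), where \(F\sim\mathcal{F}_{K-1,n-K}\) under \(\mathcal{H}_0\)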
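Both quantities depend on the \(\mathcal{F}_{K-1,n-K}\) distribution. With SciPy available, `scipy.stats.f.sf` and `scipy.stats.f.ppf` compute them directly. The standard-library sketch below instead uses the identity \(\mathbb{P}(F>f)=I_{x}\left(\tfrac{d_2}{2},\tfrac{d_1}{2}\right)\) with \(x=\frac{d_2}{d_2+d_1f}\), \(d_1=K-1\), \(d_2=n-K\), where \(I_x\) is the regularized incomplete beta function (evaluated by a continued fraction in the style of Numerical Recipes), and finds \(q_{1-\alpha}\) by bisection. The helper names and the values \(F_{obs}=21\), \(K-1=2\), \(n-K=6\) are hypothetical; this is an illustration, not a validated numerical library.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for I_x(a, b), modified Lentz method."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even term of the continued fraction
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd term of the continued fraction
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):  # continued fraction converges fast here
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b  # symmetry I_x(a,b) = 1 - I_{1-x}(b,a)


def f_sf(f, d1, d2):
    """Survival function P(F > f) for F ~ F(d1, d2)."""
    if f <= 0.0:
        return 1.0
    return reg_inc_beta(d2 / 2.0, d1 / 2.0, d2 / (d2 + d1 * f))


def f_quantile(prob, d1, d2):
    """Quantile q_prob of F(d1, d2) by bisection on the decreasing survival function."""
    lo, hi = 0.0, 1.0
    while f_sf(hi, d1, d2) > 1.0 - prob:  # enlarge until P(F > hi) <= 1 - prob
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f_sf(mid, d1, d2) > 1.0 - prob:
            lo = mid  # too much mass above mid: the quantile is larger
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical observed statistic with K - 1 = 2 and n - K = 6 degrees of freedom
alpha, f_obs, bdof, wdof = 0.05, 21.0, 2, 6
p_value = f_sf(f_obs, bdof, wdof)        # P(F > F_obs)
q = f_quantile(1.0 - alpha, bdof, wdof)  # left end of the critical region W
print(round(p_value, 6))  # 0.001953 (exactly 1/512 for these values)
print(round(q, 3))        # 5.143
```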
Decision
Based on the critical region
- Reject \(\mathcal{H}_0\) if and only if \(F_{obs}\in W\)
Based on \(pValue\)
- Reject \(\mathcal{H}_0\) if and only if \(pValue < \alpha\)
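
The two rules always lead to the same decision: the survival function of \(\mathcal{F}_{K-1,n-K}\) is decreasing, so \(F_{obs}>q_{1-\alpha}\) holds exactly when \(\mathbb{P}(F>F_{obs})<\alpha\). A tiny Python illustration, assuming hypothetical values \(F_{obs}=21\) with \(K-1=2\) and \(n-K=6\) degrees of freedom (so \(pValue=1/512\)), for which \(q_{0.95}\approx5.143\) and \(q_{0.999}=27\); `decide` is an illustrative name.

```python
def decide(f_obs, critical_value, p_value, alpha):
    """Return (reject by critical region, reject by p-value) at level alpha."""
    reject_by_region = f_obs > critical_value  # F_obs in W = (q_{1-alpha}, +inf)
    reject_by_pvalue = p_value < alpha
    return reject_by_region, reject_by_pvalue


p_value = 1 / 512  # P(F > 21) for F ~ F(2, 6), hypothetical example
print(decide(21.0, 5.143, p_value, alpha=0.05))  # (True, True): reject H0
print(decide(21.0, 27.0, p_value, alpha=0.001))  # (False, False): do not reject H0
```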