Chi-squared test of independence
What is it?
The chi-square test of independence is a statistical hypothesis test used to determine whether there is a significant association between two categorical variables.
It assesses whether the observed frequencies of the variables in a contingency table differ significantly from the frequencies that would be expected if the variables were independent.
Let \(X\) and \(Y\) denote the variables in question, and let \(\left\{x_1,\cdots,x_k,\cdots,x_K\right\}\) and \(\left\{y_1,\cdots,y_l,\cdots,y_L\right\}\) be their respective categories.
Data: Observed frequencies
Observed frequency of \((x_k,y_l)\): \(n_{k,l}\)
Marginal frequency of category \(x_k\): \(n_{k,+}=\sum_{l=1}^Ln_{k,l}\)
Marginal frequency of category \(y_l\): \(n_{+,l}=\sum_{k=1}^Kn_{k,l}\)
Total frequency: \[ \begin{aligned} n &=\sum_{k=1}^K\sum_{l=1}^Ln_{k,l}\\ &=\sum_{k=1}^Kn_{k,+}\\ &=\sum_{l=1}^Ln_{+,l}\\ \end{aligned} \]
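These marginal and total frequencies can be computed directly from the matrix of observed counts. A minimal sketch with NumPy, using hypothetical counts for \(K=2\) rows and \(L=3\) columns:

```python
import numpy as np

# Hypothetical observed frequencies n_{k,l} (K=2 rows, L=3 columns)
n_kl = np.array([[20, 30, 25],
                 [30, 20, 25]])

n_k_plus = n_kl.sum(axis=1)  # row marginals n_{k,+}
n_plus_l = n_kl.sum(axis=0)  # column marginals n_{+,l}
n = n_kl.sum()               # total frequency n

print(n_k_plus)  # [75 75]
print(n_plus_l)  # [50 50 50]
print(n)         # 150
```

Summing either set of marginals recovers the same total \(n\), as in the identity above.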
Contingency table
| | \(y_1\) | \(\cdots\) | \(y_l\) | \(\cdots\) | \(y_L\) | Total |
|---|---|---|---|---|---|---|
| \(x_1\) | \(n_{1,1}\) | \(\cdots\) | \(n_{1,l}\) | \(\cdots\) | \(n_{1,L}\) | \(n_{1,+}\) |
| \(\vdots\) | \(\vdots\) | \(\vdots\) | \(\ddots\) | \(\vdots\) | \(\vdots\) | \(\vdots\) |
| \(x_k\) | \(n_{k,1}\) | \(\cdots\) | \(n_{k,l}\) | \(\cdots\) | \(n_{k,L}\) | \(n_{k,+}\) |
| \(\vdots\) | \(\vdots\) | \(\vdots\) | \(\ddots\) | \(\vdots\) | \(\vdots\) | \(\vdots\) |
| \(x_K\) | \(n_{K,1}\) | \(\cdots\) | \(n_{K,l}\) | \(\cdots\) | \(n_{K,L}\) | \(n_{K,+}\) |
| Total | \(n_{+,1}\) | \(\cdots\) | \(n_{+,l}\) | \(\cdots\) | \(n_{+,L}\) | \(n\) |
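In practice the contingency table is usually built from raw paired observations of \(X\) and \(Y\) rather than written by hand. A small sketch with pandas, using hypothetical category labels:

```python
import pandas as pd

# Hypothetical raw observations of two categorical variables X and Y
df = pd.DataFrame({
    "X": ["x1", "x1", "x2", "x2", "x1", "x2"],
    "Y": ["y1", "y2", "y1", "y2", "y2", "y1"],
})

# Contingency table including the marginal totals, as in the layout above
table = pd.crosstab(df["X"], df["Y"], margins=True, margins_name="Total")
print(table)
```

The `margins=True` option appends the row and column totals \(n_{k,+}\), \(n_{+,l}\), and the grand total \(n\).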
Hypothesis
\(\mathcal{H}_0\): \(X\) and \(Y\) are independent
\(\mathcal{H}_1\): \(X\) and \(Y\) are not independent (there is an association between them)
Test statistic
Expected frequencies under \(\mathcal{H}_0\) \[ \widehat{n}_{k,l}=n\,\dfrac{n_{k,+}}{n}\dfrac{n_{+,l}}{n}=\dfrac{n_{k,+}\,n_{+,l}}{n} \]
Test statistic \[ \mathbb{X}^2=\sum_{k=1}^K\sum_{l=1}^L\dfrac{\left(n_{k,l}-\widehat{n}_{k,l}\right)^2}{\widehat{n}_{k,l}}\ \overset{\mathcal{H}_0}{\rightarrow}\ \chi_{df}^2 \]
Degrees of freedom: \(df = (K-1)(L-1)\)
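The expected frequencies and the statistic can be computed by hand and cross-checked against SciPy. A sketch using the same hypothetical 2×3 counts as above (`correction=False` disables Yates' continuity correction so SciPy matches the plain formula):

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical observed frequencies (K=2, L=3)
n_kl = np.array([[20, 30, 25],
                 [30, 20, 25]])
n = n_kl.sum()

# Expected frequencies under H0: n_hat_{k,l} = n_{k,+} n_{+,l} / n
n_hat = np.outer(n_kl.sum(axis=1), n_kl.sum(axis=0)) / n

# Chi-squared statistic and degrees of freedom df = (K-1)(L-1)
x2 = ((n_kl - n_hat) ** 2 / n_hat).sum()
df = (n_kl.shape[0] - 1) * (n_kl.shape[1] - 1)

# Cross-check against SciPy
stat, p, dof, expected = chi2_contingency(n_kl, correction=False)
```

Here every expected cell equals 25, the statistic works out to \(\mathbb{X}^2=4\), and \(df=(2-1)(3-1)=2\).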
Critical region and P-value
Critical Region
- \(W=\left]q_{1-\alpha}\left(\chi_{df}^2\right), +\infty\right[\)
P-value
- \(pValue=\mathbb{P}\left(\chi_{df}^2>\mathbb{X}_{obs}^2\right)\)
Decision
Decision based on Critical Region
- Reject \(\mathcal{H}_0\) if and only if \(\mathbb{X}_{obs}^2\in W\)
Decision based on P-value
- Reject \(\mathcal{H}_0\) if and only if \(pValue<\alpha\)
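Both decision rules can be checked numerically and always agree. A sketch with SciPy's \(\chi^2\) distribution, using the hypothetical statistic \(\mathbb{X}_{obs}^2=4\) with \(df=2\) from the example above and \(\alpha=0.05\):

```python
from scipy.stats import chi2

# Hypothetical observed statistic, degrees of freedom, and level
x2_obs, df, alpha = 4.0, 2, 0.05

# Critical region: reject H0 iff x2_obs > q_{1-alpha}(chi2_df)
q = chi2.ppf(1 - alpha, df)          # quantile of order 1 - alpha

# P-value: reject H0 iff p_value < alpha
p_value = chi2.sf(x2_obs, df)        # survival function P(chi2_df > x2_obs)

reject_by_region = x2_obs > q
reject_by_pvalue = p_value < alpha
```

With these numbers \(q_{0.95}\approx 5.99\) and \(pValue\approx 0.135\), so neither rule rejects \(\mathcal{H}_0\): the data are compatible with independence at the 5% level.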