Confidence interval of a mean
What is it?
A confidence interval of a mean is a statistical range that provides an estimate of where the true population mean is likely to fall. It is a way to quantify the uncertainty associated with estimating the population mean based on a sample.
When we collect a sample from a population and calculate the sample mean, we know that it may not exactly equal the true population mean due to sampling variability. A confidence interval provides a range of values within which we can reasonably expect the population mean to fall.
Data and statistics
\(x_{1:n} = \left(x_1,\cdots, x_i,\cdots,x_n\right)\in\mathbb{R}^n\)
The observations \(x_{1:n}\) are assumed to be realizations of iid random variables \(X_{1:n}:=\left(X_1,\cdots,X_n\right)\) with mean \(\mu\)
\(\overline{X}=\dfrac{1}{n}\sum_{i=1}^nX_i\)
\(S^2=\dfrac{1}{n}\sum_{i=1}^n\left(X_i-\overline{X}\right)^2\)
\(\sigma^2=\mathbb{V}ar\left(X_i\right)\)
\(\widehat{\sigma^2}=\dfrac{1}{n-1}\sum_{i=1}^n\left(X_i-\overline{X}\right)^2=\dfrac{n}{n-1}S^2\)
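As a minimal sketch, the statistics above can be computed with Python's standard library. It assumes that \(S^2\) uses the divisor \(n\) (as defined above) and that \(\widehat{\sigma^2}\) is the usual unbiased estimator with divisor \(n-1\). The function name `sample_statistics` is illustrative and not part of the original text.

```python
import statistics


def sample_statistics(x):
    """Return (n, sample mean, S^2 with divisor n, sigma2_hat with divisor n-1)."""
    n = len(x)
    mean = statistics.fmean(x)
    s2 = statistics.pvariance(x)         # divisor n
    sigma2_hat = statistics.variance(x)  # divisor n - 1
    return n, mean, s2, sigma2_hat


n, mean, s2, sigma2_hat = sample_statistics([1.0, 2.0, 3.0, 4.0, 5.0])
print(n, mean, s2, sigma2_hat)
```

The two variance estimators are linked by \(\widehat{\sigma^2}=\dfrac{n}{n-1}S^2\), so they differ noticeably only for small \(n\).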
Confidence Interval (CI)
A CI procedure (or asymptotic CI procedure) at confidence level \(\beta=1-\alpha\) for a population mean \(\mu\) is a random interval \(CI_{\beta}=\left[T_{1,n}, T_{2,n}\right]\), where \(T_{1,n}=T_{1,n}\left(X_{1:n}\right)\) and \(T_{2,n}=T_{2,n}\left(X_{1:n}\right)\) are statistics that satisfy \(\mathbb{P}\left(T_{1,n}\leq\mu\leq T_{2,n}\right)\geq 1-\alpha\) (or, in the asymptotic case, \(\lim_{n\rightarrow +\infty}\mathbb{P}\left(T_{1,n}\leq\mu\leq T_{2,n}\right)\geq 1-\alpha\)).
A confidence interval is obtained by replacing the random variables \(X_i\) in the procedure with the observed data \(x_i\).
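A two-sided CI procedure for the mean is: \(CI_{1-\alpha}\left(\mu\right)=\overline{X}\pm q_{1-\alpha/2}*S\left(\overline{X}\right)\)
where \(S\left(\overline{X}\right)\), the standard error of \(\overline{X}\), and the quantile \(q_{1-\alpha/2}\) of order \(1-\alpha/2\) of the reference distribution are given below for each case. The bounds are: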
\(T_{1,n}=\overline{X} - q_{1-\alpha/2}*S\left(\overline{X}\right)\)
\(T_{2,n}=\overline{X} + q_{1-\alpha/2}*S\left(\overline{X}\right)\)
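To illustrate what the confidence level means, the following sketch estimates by simulation how often the random interval \(\left[T_{1,n}, T_{2,n}\right]\) contains \(\mu\). It assumes a Gaussian sample with known variance (one of the cases detailed further down), and the values of `mu`, `sigma`, `n`, `reps` and `seed` are arbitrary choices made for illustration only.

```python
import math
import random
from statistics import NormalDist, fmean


def coverage(mu=10.0, sigma=2.0, n=25, alpha=0.05, reps=2000, seed=0):
    """Fraction of simulated samples whose interval contains the true mean."""
    rng = random.Random(seed)
    q = NormalDist().inv_cdf(1 - alpha / 2)  # standard normal quantile
    se = sigma / math.sqrt(n)                # standard error, known variance
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = fmean(sample)
        if xbar - q * se <= mu <= xbar + q * se:
            hits += 1
    return hits / reps


print(coverage())  # expected to be close to 1 - alpha = 0.95
```

The probability statement concerns the procedure: across repeated samples, about \(1-\alpha\) of the intervals contain \(\mu\). A single computed interval either contains \(\mu\) or it does not.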
Gaussian Sample
- \(X_i\overset{iid}{\sim}\mathcal{N}\left(\mu, \sigma^2\right)\)
Known Variance
Let \(S\left(\overline{X}\right)=\sqrt{\dfrac{\sigma^2}{n}}\)
\(\dfrac{\overline{X}-\mu}{S\left(\overline{X}\right)}\sim\mathcal{N}\left(0, 1\right)\)
\(q_{1-\alpha/2}=q_{1-\alpha/2}\left(\mathcal{N}(0, 1)\right)\)
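A minimal sketch of this known-variance case, using only the standard library. The function name `z_interval` and the sample data are hypothetical, and the variance `sigma2` is assumed to be known in advance rather than estimated.

```python
import math
from statistics import NormalDist, fmean


def z_interval(x, sigma2, alpha=0.05):
    """Two-sided CI for the mean of a Gaussian sample with known variance."""
    n = len(x)
    xbar = fmean(x)
    se = math.sqrt(sigma2 / n)               # S(X bar) = sqrt(sigma^2 / n)
    q = NormalDist().inv_cdf(1 - alpha / 2)  # quantile of N(0, 1)
    return xbar - q * se, xbar + q * se


low, high = z_interval([4.0, 5.0, 6.0, 5.0], sigma2=4.0)
print(low, high)
```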
Unknown variance and small sample size
\(S\left(\overline{X}\right)=\sqrt{\dfrac{\widehat{\sigma^2}}{n}}=\sqrt{\dfrac{S^2}{n-1}}\)
\(\dfrac{\overline{X}-\mu}{S\left(\overline{X}\right)}\sim\mathcal{T}_{n-1}\)
\(q_{1-\alpha/2}=q_{1-\alpha/2}\left(\mathcal{T}_{n-1}\right)\)
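A minimal sketch of the Student case. It takes the standard error to be \(\sqrt{\widehat{\sigma^2}/n}\), where \(\widehat{\sigma^2}\) is the unbiased variance estimator, and it assumes SciPy is installed so that the Student quantile can be obtained from `scipy.stats.t.ppf`. The function name `t_interval` and the data are illustrative.

```python
import math
from statistics import fmean, variance

from scipy import stats


def t_interval(x, alpha=0.05):
    """Two-sided CI for the mean of a Gaussian sample with unknown variance."""
    n = len(x)
    xbar = fmean(x)
    se = math.sqrt(variance(x) / n)           # sqrt(sigma2_hat / n)
    q = stats.t.ppf(1 - alpha / 2, df=n - 1)  # quantile of Student T_{n-1}
    return xbar - q * se, xbar + q * se


low, high = t_interval([1.0, 2.0, 3.0, 4.0, 5.0])
print(low, high)
```

For small \(n\) the Student quantile is noticeably larger than the Gaussian one (about 2.78 versus 1.96 at the 95% level with \(n=5\)), which widens the interval to account for the estimation of the variance.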
Unknown variance and large sample size
\(S\left(\overline{X}\right)=\sqrt{\dfrac{\widehat{\sigma^2}}{n}}=\sqrt{\dfrac{S^2}{n-1}}\)
\(\dfrac{\overline{X}-\mu}{S\left(\overline{X}\right)}\rightarrow\mathcal{N}\left(0, 1\right)\)
\(q_{1-\alpha/2}=q_{1-\alpha/2}\left(\mathcal{N}\left(0, 1\right)\right)\)
Large non-gaussian sample
- As a common rule of thumb, a sample is considered large when \(n\geq 30\)
Known Variance
\(S\left(\overline{X}\right)=\sqrt{\dfrac{\sigma^2}{n}}\)
\(\dfrac{\overline{X}-\mu}{S\left(\overline{X}\right)}\rightarrow\mathcal{N}\left(0, 1\right)\)
\(q_{1-\alpha/2}=q_{1-\alpha/2}\left(\mathcal{N}(0, 1)\right)\)
Unknown Variance
\(S\left(\overline{X}\right)=\sqrt{\dfrac{S^2}{n}}\)
\(\dfrac{\overline{X}-\mu}{S\left(\overline{X}\right)}\rightarrow\mathcal{N}\left(0, 1\right)\)
\(q_{1-\alpha/2}=q_{1-\alpha/2}\left(\mathcal{N}(0, 1)\right)\)
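A minimal sketch of the asymptotic interval for a large non-Gaussian sample with unknown variance, using \(S\left(\overline{X}\right)=\sqrt{S^2/n}\) with \(S^2\) computed with divisor \(n\). The function name `asymptotic_interval` and the 0/1 data are hypothetical, chosen only so that the numbers are easy to check by hand.

```python
import math
from statistics import NormalDist, fmean, pvariance


def asymptotic_interval(x, alpha=0.05):
    """Asymptotic two-sided CI for the mean of a large sample."""
    n = len(x)
    xbar = fmean(x)
    se = math.sqrt(pvariance(x) / n)         # sqrt(S^2 / n), divisor n in S^2
    q = NormalDist().inv_cdf(1 - alpha / 2)  # quantile of N(0, 1)
    return xbar - q * se, xbar + q * se


# 100 observations alternating between 0 and 1 (clearly non-Gaussian)
data = [float(i % 2) for i in range(100)]
low, high = asymptotic_interval(data)
print(low, high)
```

The coverage of this interval is only approximately \(1-\alpha\). The approximation relies on the central limit theorem and improves as \(n\) grows.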