Confidence interval of a mean

What is it?

  • A confidence interval of a mean is a statistical range that provides an estimate of where the true population mean is likely to fall. It is a way to quantify the uncertainty associated with estimating the population mean based on a sample.

  • When we collect a sample from a population and calculate the sample mean, we know that it may not exactly equal the true population mean due to sampling variability. A confidence interval provides a range of values within which we can reasonably expect the population mean to fall.

Data and statistics

  • \(x_{1:n} = \left(x_1,\cdots, x_i,\cdots,x_n\right)\in\mathbb{R}^n\)

  • The observations \(x_{1:n}\) are assumed to be realizations of iid random variables \(X_{1:n}:=\left(X_1,\cdots,X_n\right)\)

  • \(\overline{X}=\dfrac{1}{n}\sum_{i=1}^nX_i\)

  • \(S^2=\dfrac{1}{n}\sum_{i=1}^n\left(X_i-\overline{X}\right)^2\)

  • \(\sigma^2=\mathbb{V}ar\left(X_i\right)\)

  • \(\widehat{\sigma^2}=\dfrac{1}{n-1}\sum_{i=1}^n\left(X_i-\overline{X}\right)^2=\dfrac{n}{n-1}S^2\)
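The statistics above can be computed with Python's standard library. This is a minimal sketch: the observations in `x` are made up for illustration, and the variable names (`x_bar`, `s2`, `sigma2_hat`) are only labels chosen here to mirror \(\overline{X}\), \(S^2\) and \(\widehat{\sigma^2}\).

```python
from statistics import fmean, pvariance, variance

# Made-up observations x_1, ..., x_n (illustrative only).
x = [4.9, 5.1, 5.0, 5.3, 4.7]

n = len(x)
x_bar = fmean(x)          # sample mean: (1/n) * sum of x_i
s2 = pvariance(x)         # S^2: squared deviations divided by n
sigma2_hat = variance(x)  # unbiased estimator: squared deviations divided by n - 1

print(f"n={n}, mean={x_bar:.4f}, S^2={s2:.4f}, sigma2_hat={sigma2_hat:.4f}")
```

The two variance estimators differ only by the factor \(n/(n-1)\), so the gap between them shrinks as \(n\) grows.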

Confidence Interval (CI)

  • A CI procedure (or asymptotic CI procedure) at confidence level \(\beta=1-\alpha\) for a population mean \(\mu\) is a random interval \(CI_{\beta}=\left[T_{1,n}, T_{2,n}\right]\), where \(T_{1,n}=T_{1,n}\left(X_{1:n}\right)\) and \(T_{2,n}=T_{2,n}\left(X_{1:n}\right)\) are statistics that satisfy \(\mathbb{P}\left(T_{1,n}\leq\mu\leq T_{2,n}\right)\geq 1-\alpha\) (or, in the asymptotic case, \(\lim_{n\rightarrow +\infty}\mathbb{P}\left(T_{1,n}\leq\mu\leq T_{2,n}\right)\geq 1-\alpha\)).

  • A confidence interval is obtained by replacing the random variables \(X_i\) in the procedure with the observed data \(x_i\).

  • A two-sided CI procedure for the mean is: \(CI_{1-\alpha}\left(\mu\right)=\overline{X}\pm q_{1-\alpha/2}\times S\left(\overline{X}\right)\)

  • The standard error \(S\left(\overline{X}\right)\) and the quantile \(q_{1-\alpha/2}\) depend on the setting and are given in the sections below

  • \(T_{1,n}=\overline{X} - q_{1-\alpha/2}\times S\left(\overline{X}\right)\)

  • \(T_{2,n}=\overline{X} + q_{1-\alpha/2}\times S\left(\overline{X}\right)\)
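The two-sided procedure has the same shape in every setting: only the quantile and the standard error change. A minimal sketch of that shape follows. The function name `two_sided_ci` and the numeric inputs are hypothetical, chosen here purely for illustration.

```python
def two_sided_ci(x_bar: float, q: float, se: float) -> tuple[float, float]:
    """Return (T1, T2) = (x_bar - q * se, x_bar + q * se)."""
    if q < 0 or se < 0:
        raise ValueError("quantile and standard error must be non-negative")
    half_width = q * se
    return x_bar - half_width, x_bar + half_width


# Illustrative inputs: observed mean 5.0, quantile 1.96, standard error 0.1.
lower, upper = two_sided_ci(x_bar=5.0, q=1.96, se=0.1)
print(lower, upper)
```

The interval is symmetric around the observed mean, and its width is twice the quantile times the standard error.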

Gaussian Sample

  • \(X_i\overset{iid}{\sim}\mathcal{N}\left(\mu, \sigma^2\right)\)

Known Variance

  • Let \(S\left(\overline{X}\right)=\sqrt{\dfrac{\sigma^2}{n}}\)

  • \(\dfrac{\overline{X}-\mu}{S\left(\overline{X}\right)}\sim\mathcal{N}\left(0, 1\right)\)

  • \(q_{1-\alpha/2}=q_{1-\alpha/2}\left(\mathcal{N}(0, 1)\right)\)
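For a Gaussian sample with known variance, the interval and the meaning of the confidence level can be checked by simulation. The sketch below uses only the standard library. The helper names `z_interval` and `coverage`, the seed, and the parameter values are assumptions made for this example. The simulation draws many samples, builds the interval each time, and counts how often it contains the true mean.

```python
import math
import random
from statistics import NormalDist, fmean


def z_interval(x, sigma, alpha=0.05):
    """Two-sided CI for the mean when the standard deviation sigma is known."""
    n = len(x)
    se = math.sqrt(sigma**2 / n)             # S(X_bar) = sqrt(sigma^2 / n)
    q = NormalDist().inv_cdf(1 - alpha / 2)  # quantile of N(0, 1)
    x_bar = fmean(x)
    return x_bar - q * se, x_bar + q * se


def coverage(mu, sigma, n, alpha=0.05, reps=5000, seed=0):
    """Fraction of simulated Gaussian samples whose interval contains mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        low, high = z_interval(sample, sigma, alpha)
        if low <= mu <= high:
            hits += 1
    return hits / reps


cov = coverage(mu=10.0, sigma=2.0, n=15)
print(f"empirical coverage: {cov:.3f}")
```

The empirical coverage should be close to \(1-\alpha = 0.95\), up to Monte Carlo error.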

Unknown variance and small sample size

  • \(S\left(\overline{X}\right)=\sqrt{\dfrac{1}{n}\cdot\dfrac{nS^2}{n-1}}=\sqrt{\dfrac{S^2}{n-1}}\)

  • \(\dfrac{\overline{X}-\mu}{S\left(\overline{X}\right)}\sim\mathcal{T}_{n-1}\)

  • \(q_{1-\alpha/2}=q_{1-\alpha/2}\left(\mathcal{T}_{n-1}\right)\)
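A small-sample interval with unknown variance can be sketched as follows. The standard library has no Student quantile function, so the quantile is passed in as an argument. The value 2.262 used here is the tabulated 0.975 quantile of \(\mathcal{T}_{9}\), and with SciPy installed it could be obtained from `scipy.stats.t.ppf(0.975, 9)` instead. The data and the name `t_interval` are made up for illustration.

```python
import math
from statistics import fmean, variance


def t_interval(x, q):
    """Two-sided CI for the mean with unknown variance.

    q is the 1 - alpha/2 quantile of the Student distribution with
    len(x) - 1 degrees of freedom, supplied by the caller.
    """
    n = len(x)
    x_bar = fmean(x)
    # variance(x) divides by n - 1, so this is sqrt(sigma2_hat / n) = sqrt(S^2 / (n - 1)).
    se = math.sqrt(variance(x) / n)
    return x_bar - q * se, x_bar + q * se


# Made-up sample of size n = 10, hence 9 degrees of freedom.
x = [9.8, 10.2, 10.4, 9.9, 10.0, 10.1, 9.7, 10.3, 10.0, 9.6]
T_QUANTILE_0975_DF9 = 2.262  # tabulated value, rounded to three decimals

lower, upper = t_interval(x, T_QUANTILE_0975_DF9)
print(f"95% CI: [{lower:.3f}, {upper:.3f}]")
```

Passing the quantile explicitly keeps the sketch dependency-free, at the cost of looking up a new table value whenever \(n\) or \(\alpha\) changes.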

Unknown variance and large sample size

  • \(S\left(\overline{X}\right)=\sqrt{\dfrac{1}{n}\cdot\dfrac{nS^2}{n-1}}=\sqrt{\dfrac{S^2}{n-1}}\)

  • \(\dfrac{\overline{X}-\mu}{S\left(\overline{X}\right)}\rightarrow\mathcal{N}\left(0, 1\right)\)

  • \(q_{1-\alpha/2}=q_{1-\alpha/2}\left(\mathcal{N}\left(0, 1\right)\right)\)

Large non-Gaussian sample

  • As a rule of thumb, a sample is considered large when \(n\geq 30\)

Known Variance

  • \(S\left(\overline{X}\right)=\sqrt{\dfrac{\sigma^2}{n}}\)

  • \(\dfrac{\overline{X}-\mu}{S\left(\overline{X}\right)}\rightarrow\mathcal{N}\left(0, 1\right)\)

  • \(q_{1-\alpha/2}=q_{1-\alpha/2}\left(\mathcal{N}(0, 1)\right)\)

Unknown Variance

  • \(S\left(\overline{X}\right)=\sqrt{\dfrac{S^2}{n}}\)

  • \(\dfrac{\overline{X}-\mu}{S\left(\overline{X}\right)}\rightarrow\mathcal{N}\left(0, 1\right)\)

  • \(q_{1-\alpha/2}=q_{1-\alpha/2}\left(\mathcal{N}(0, 1)\right)\)
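The asymptotic interval for a large non-Gaussian sample with unknown variance can be checked the same way. The sketch below draws exponential samples, which are clearly non-Gaussian, and estimates how often the interval contains the true mean. The distribution, the sample size, the seed, and the helper names `asymptotic_interval` and `coverage_exponential` are all assumptions made for this illustration.

```python
import math
import random
from statistics import NormalDist


def asymptotic_interval(x, alpha=0.05):
    """Asymptotic two-sided CI for the mean, variance estimated by S^2."""
    n = len(x)
    x_bar = sum(x) / n
    s2 = sum((xi - x_bar) ** 2 for xi in x) / n  # S^2, divides by n
    se = math.sqrt(s2 / n)                       # S(X_bar) = sqrt(S^2 / n)
    q = NormalDist().inv_cdf(1 - alpha / 2)      # quantile of N(0, 1)
    return x_bar - q * se, x_bar + q * se


def coverage_exponential(n, alpha=0.05, reps=4000, seed=1):
    """Empirical coverage for Exp(1) samples, whose true mean is 1."""
    rng = random.Random(seed)
    true_mean = 1.0
    hits = 0
    for _ in range(reps):
        sample = [rng.expovariate(1.0) for _ in range(n)]
        low, high = asymptotic_interval(sample, alpha)
        if low <= true_mean <= high:
            hits += 1
    return hits / reps


cov = coverage_exponential(n=100)
print(f"empirical coverage with n=100: {cov:.3f}")
```

Because the guarantee is only asymptotic, the empirical coverage may sit slightly below \(1-\alpha\) for skewed data and moderate \(n\), and it approaches \(1-\alpha\) as \(n\) increases.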