5  Pearson correlation confidence interval

5.1 What is it?

  • Pearson’s correlation coefficient is a widely used statistical measure that quantifies the strength and direction of the linear relationship between two continuous variables. It ranges from -1 to 1, where -1 represents a perfect negative linear relationship, 1 represents a perfect positive linear relationship, and 0 represents no linear relationship.

  • When working with correlation coefficients, it is often useful to report, alongside the sample correlation coefficient, a confidence interval for the population correlation \(\rho\). A \(1-\alpha\) confidence interval (for example 95%) is built so that, over repeated samples, it contains the true \(\rho\) with probability about \(1-\alpha\).

  • To construct the confidence interval for Pearson’s correlation coefficient, a common approach is to use the Fisher \(z\) transformation. It maps the sample correlation coefficient to a variable that is approximately normally distributed, with a variance (about \(1/(n-3)\)) that does not depend on the unknown population correlation. A standard normal-theory interval can then be built on the transformed scale and mapped back to the correlation scale.
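The variance-stabilising effect can be checked with a small Monte Carlo sketch in Python. It is an illustration under our own assumptions, not part of the method: bivariate normal pairs are generated as \(X=Z_1\), \(Y=\rho Z_1+\sqrt{1-\rho^2}Z_2\) with independent standard normals \(Z_1, Z_2\), using \(n=30\), 2000 replications and a fixed seed. The helper names `pearson_r` and `simulate_sds` are hypothetical, and `math.atanh` plays the role of the Fisher transformation.

```python
import math
import random
import statistics


def pearson_r(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    mx = statistics.fmean(xs)
    my = statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_sds(rho, n, reps, rng):
    """Return (sd of r, sd of atanh(r)) over `reps` bivariate normal samples of size n."""
    rs = []
    for _ in range(reps):
        xs, ys = [], []
        for _ in range(n):
            z1 = rng.gauss(0.0, 1.0)
            z2 = rng.gauss(0.0, 1.0)
            xs.append(z1)
            # Y = rho * Z1 + sqrt(1 - rho^2) * Z2 has correlation rho with X = Z1 (assumed model)
            ys.append(rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
        rs.append(pearson_r(xs, ys))
    zs = [math.atanh(r) for r in rs]  # Fisher z transform of each simulated r
    return statistics.stdev(rs), statistics.stdev(zs)


rng = random.Random(0)  # fixed seed for reproducibility
n = 30
results = {}
for rho in (0.0, 0.5, 0.9):
    results[rho] = simulate_sds(rho, n, 2000, rng)
    sd_r, sd_z = results[rho]
    print(f"rho={rho}: sd(r)={sd_r:.3f}  sd(f(r))={sd_z:.3f}  1/sqrt(n-3)={1 / math.sqrt(n - 3):.3f}")
```

The standard deviation of \(r\) shrinks sharply as \(|\rho|\) approaches 1, whereas that of the transformed value should stay close to \(1/\sqrt{n-3}\approx 0.192\) for all three values of \(\rho\).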

5.2 Data and statistics

  • \((x_i, y_i)\in\mathbb{R}^2\), \(i=1,\cdots, n\)

  • The observations are assumed to be realizations of random vectors \(\left(X_i, Y_i\right)\overset{iid}{\sim}\mathcal{N}_2\left(\mu, \Lambda\right)\), \(i=1,\cdots,n\); \(\rho\) denotes the correlation between \(X_i\) and \(Y_i\) determined by \(\Lambda\).

  • Sample correlation: \(R_{(X,Y)}=\dfrac{\sum_{i=1}^n\left(X_i-\overline{X}\right)\left(Y_i-\overline{Y}\right)}{\sqrt{\sum_{i=1}^n\left(X_i-\overline{X}\right)^2}\sqrt{\sum_{i=1}^n\left(Y_i-\overline{Y}\right)^2}}\)
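A direct Python transcription of this formula, as a minimal sketch (the function name `pearson_r` is hypothetical). It uses centred sums, which are numerically safer than expanding the products into raw sums.

```python
import math


def pearson_r(xs, ys):
    """Sample correlation R_(X,Y) computed from centred sums (two passes over the data)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("correlation undefined: a variable is constant")
    # sqrt(s_xx) * sqrt(s_yy) written as a single square root of the product
    return s_xy / math.sqrt(s_xx * s_yy)


print(pearson_r([1, 2, 3], [1, 3, 2]))  # 0.5
```

On Python 3.10 and later, `statistics.correlation(xs, ys)` from the standard library should return the same value and can serve as a cross-check.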

5.3 Fisher \(Z\) transformation

  • Let \(f\) be the function defined on \(]-1, 1[\) by \(f(x)=\dfrac{1}{2}\ln\dfrac{1+x}{1-x}\)

  • \(f\) is an increasing bijection from \(]-1, 1[\) onto \(\mathbb{R}\) (it is the inverse hyperbolic tangent, \(f=\operatorname{artanh}\)), and its inverse is given by \(f^{-1}(z)=\tanh(z)=\dfrac{e^{2z}-1}{e^{2z}+1}\)

  • For large \(n\), \(Z_{(X, Y)}=\sqrt{n-3}\left(f\left(R_{(X, Y)}\right)-f(\rho)\right)\overset{\text{approx.}}{\sim}\mathcal{N}\left(0, 1\right)\)
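The transformation, its inverse and the pivotal statistic \(Z_{(X,Y)}\) can be sketched in Python as follows. The function names are hypothetical, and the explicit logarithm is kept only to mirror the definition of \(f\); the standard library's `math.atanh` computes the same function.

```python
import math


def fisher_z(r):
    """f(r) = (1/2) ln((1 + r) / (1 - r)) for -1 < r < 1 (same value as math.atanh(r))."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return 0.5 * math.log((1.0 + r) / (1.0 - r))


def fisher_z_inv(z):
    """f^{-1}(z) = (e^{2z} - 1) / (e^{2z} + 1); math.tanh computes this without overflow."""
    return math.tanh(z)


def z_statistic(r, rho, n):
    """Z = sqrt(n - 3) * (f(r) - f(rho)); approximately N(0, 1) for large n."""
    if n <= 3:
        raise ValueError("need n > 3")
    return math.sqrt(n - 3) * (fisher_z(r) - fisher_z(rho))


print(round(z_statistic(0.6, 0.4, 28), 3))  # 1.347
```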

5.4 Confidence Interval (CI)

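  • Let \(q_{1-\alpha/2}\) be the \(1-\alpha/2\) quantile of \(\mathcal{N}(0,1)\) (e.g. \(q_{0.975}\approx 1.96\) for \(\alpha=0.05\)), and let \(B_l=f\left(R_{(X,Y)}\right)-\dfrac{q_{1-\alpha/2}}{\sqrt{n-3}}\)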

  • Let \(B_r=f\left(R_{(X,Y)}\right)+\dfrac{q_{1-\alpha/2}}{\sqrt{n-3}}\)

  • \(CI_{1-\alpha}\left(f(\rho)\right) = \left[B_l, B_r\right]\)

  • Since \(f^{-1}=\tanh\) is increasing, applying it to both bounds gives \(CI_{1-\alpha}\left(\rho\right)=\left[\tanh\left(B_l\right), \tanh\left(B_r\right)\right]\)
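The whole interval construction fits in a few lines of Python. This is a minimal sketch assuming the sample correlation \(r\) and the sample size \(n\) are already known; `pearson_ci` is a hypothetical name, and the quantile \(q_{1-\alpha/2}\) is taken from the standard library's `statistics.NormalDist`.

```python
import math
from statistics import NormalDist


def pearson_ci(r, n, alpha=0.05):
    """(1 - alpha) Fisher z confidence interval for rho from sample correlation r and size n."""
    if n <= 3:
        raise ValueError("need n > 3")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    q = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # q_{1 - alpha/2}
    z = math.atanh(r)                            # f(R)
    half_width = q / math.sqrt(n - 3)
    b_l, b_r = z - half_width, z + half_width    # CI for f(rho)
    return math.tanh(b_l), math.tanh(b_r)        # map back with f^{-1} = tanh


lo, hi = pearson_ci(0.5, 28)
print(f"[{lo:.3f}, {hi:.3f}]")
```

For example, \(r=0.5\) with \(n=28\) and \(\alpha=0.05\) gives roughly \([0.156, 0.736]\). The interval is not symmetric around \(r\): because \(\tanh\) is nonlinear, it is shorter on the side closer to \(\pm 1\).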

5.5 Needed statistics

  • Significance level \(\alpha\) (the confidence level is \(1-\alpha\))

  • Sample size: \(n\)

  • \(\sum_iX_i\) and \(\sum_iX_i^2\)

  • \(\sum_iY_i\) and \(\sum_iY_i^2\)

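  • \(\sum_iX_iY_i\)

  • These sums determine the sample correlation: \(R_{(X,Y)}=\dfrac{n\sum_iX_iY_i-\sum_iX_i\sum_iY_i}{\sqrt{n\sum_iX_i^2-\left(\sum_iX_i\right)^2}\sqrt{n\sum_iY_i^2-\left(\sum_iY_i\right)^2}}\)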
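Because only these sums are needed, the interval can be computed in a single pass over the data, for instance over a stream. The sketch below assumes that the six quantities are accumulated as running totals; the function name `ci_from_sums` and the toy data are hypothetical.

```python
import math
from statistics import NormalDist


def ci_from_sums(n, sum_x, sum_x2, sum_y, sum_y2, sum_xy, alpha=0.05):
    """Sample correlation and Fisher z CI for rho, computed only from the summary sums."""
    if n <= 3:
        raise ValueError("need n > 3")
    # Each quantity below is n times the corresponding centred sum.
    s_xy = n * sum_xy - sum_x * sum_y
    s_xx = n * sum_x2 - sum_x ** 2
    s_yy = n * sum_y2 - sum_y ** 2
    r = s_xy / math.sqrt(s_xx * s_yy)
    q = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # q_{1 - alpha/2}
    z = math.atanh(r)
    h = q / math.sqrt(n - 3)
    return r, (math.tanh(z - h), math.tanh(z + h))


# Accumulate the sums in one pass, e.g. over a stream of (x, y) pairs (toy data).
pairs = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6), (6, 5), (7, 8), (8, 7)]
n = sum_x = sum_x2 = sum_y = sum_y2 = sum_xy = 0
for x, y in pairs:
    n += 1
    sum_x += x
    sum_x2 += x * x
    sum_y += y
    sum_y2 += y * y
    sum_xy += x * y

r, ci = ci_from_sums(n, sum_x, sum_x2, sum_y, sum_y2, sum_xy)
print(r, ci)
```

With this toy data \(R=19/21\approx 0.905\) and, since \(n=8\) is small, the 95% interval is wide, roughly \([0.55, 0.98]\). The raw-sum expressions can lose precision through cancellation when the values are large compared with their spread; centring the data first, or using Welford-style running updates, is more robust in that case.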