5 Pearson correlation confidence interval
5.1 What is it?
Pearson’s correlation coefficient is a widely used statistical measure that quantifies the strength and direction of the linear relationship between two continuous variables. It ranges from -1 to 1, where -1 represents a perfect negative linear relationship, 1 represents a perfect positive linear relationship, and 0 represents no linear relationship.
Because the sample correlation coefficient is only an estimate of the population correlation \(\rho\), it is often useful to report a confidence interval alongside it. A confidence interval of level \(1-\alpha\) is an interval computed from the data that contains the true population correlation with probability (here approximately) \(1-\alpha\).
To construct a confidence interval for Pearson’s correlation coefficient, a common approach is to use the Fisher \(z\) transformation. It maps the sample correlation coefficient to a variable that is approximately normally distributed, with a variance that depends (approximately) only on the sample size and not on the unknown \(\rho\). Standard normal-based interval techniques can then be applied.
5.2 Data and statistics
\((x_i, y_i)\in\mathbb{R}^2\), \(i=1,\cdots, n\)
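The observations are assumed to be realizations of random vectors \(\left(X_i, Y_i\right)\), \(i=1,\cdots,n\), with \(\left(X_i, Y_i\right)\overset{iid}{\sim}\mathcal{N}_2\left(\mu, \Lambda\right)\). The parameter of interest is the population correlation \(\rho=\dfrac{\Lambda_{12}}{\sqrt{\Lambda_{11}\Lambda_{22}}}\).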
Sample correlation: \(R_{(X,Y)}=\dfrac{\sum_{i=1}^n\left(X_i-\overline{X}\right)\left(Y_i-\overline{Y}\right)}{\sqrt{\sum_{i=1}^n\left(X_i-\overline{X}\right)^2}\sqrt{\sum_{i=1}^n\left(Y_i-\overline{Y}\right)^2}}\)
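As a minimal sketch (not taken from the source), the sample correlation above can be computed directly from paired data. The function name `sample_correlation` is hypothetical, and the inputs are assumed to be two equal-length sequences of numbers, neither of them constant.

```python
import math

def sample_correlation(xs, ys):
    """Pearson sample correlation R, computed from centred data as in the formula above."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must have the same length, at least 2")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Numerator: sum of cross-products of deviations from the means
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Denominator: product of the square roots of the sums of squared deviations
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / (math.sqrt(sxx) * math.sqrt(syy))
```

For example, with `xs = [1, 2, 3, 4, 5]` and `ys = [2, 1, 4, 3, 5]` the centred cross-product sum is 8 and both sums of squares are 10, so \(R = 0.8\) (up to floating-point rounding).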
5.3 Fisher \(Z\) transformation
Let \(f\) be the function defined on \(]-1, 1[\) by \(f(x)=\dfrac{1}{2}\ln\dfrac{1+x}{1-x}=\operatorname{artanh}(x)\)
\(f\) is an increasing bijection from \(]-1, 1[\) onto \(\mathbb{R}\), and its inverse is given by \(f^{-1}(z)=\tanh(z)=\dfrac{e^{2z}-1}{e^{2z}+1}\)
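The transformation and its inverse can be written as follows. This is an illustrative sketch; the names `fisher_z` and `inverse_fisher_z` are hypothetical. Since \(f\) coincides with the inverse hyperbolic tangent, `math.atanh` could be used directly instead of the explicit logarithm.

```python
import math

def fisher_z(r):
    """Fisher z transformation f(r) = 0.5 * ln((1 + r) / (1 - r)), defined for -1 < r < 1."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return 0.5 * math.log((1.0 + r) / (1.0 - r))

def inverse_fisher_z(z):
    """Inverse transformation f^{-1}(z) = tanh(z) = (e^{2z} - 1) / (e^{2z} + 1)."""
    return math.tanh(z)
```

Applying `inverse_fisher_z` to `fisher_z(r)` returns `r` up to rounding error, and `fisher_z` is odd: `fisher_z(-r) == -fisher_z(r)`.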
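For large \(n\), \(Z_{(X, Y)}=\sqrt{n-3}\left(f\left(R_{(X, Y)}\right)-f(\rho)\right)\) is approximately \(\mathcal{N}\left(0, 1\right)\)-distributed; equivalently, \(f\left(R_{(X, Y)}\right)\) is approximately normal with mean \(f(\rho)\) and variance \(\dfrac{1}{n-3}\).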
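The normal approximation for \(Z_{(X, Y)}\) can be checked empirically with a small Monte Carlo simulation. This sketch is an illustration rather than part of the method: the function name `simulate_z` and the choices \(\rho=0.5\), \(n=30\) and 4000 replications are arbitrary assumptions. It takes \(\mu=0\) and unit variances, which loses no generality because \(R_{(X,Y)}\) is unchanged by shifting each coordinate and rescaling it by a positive factor.

```python
import math
import random

def simulate_z(rho, n, reps, seed=0):
    """Draw `reps` values of Z = sqrt(n - 3) * (f(R) - f(rho)) from bivariate normal samples of size n."""
    rng = random.Random(seed)
    f_rho = math.atanh(rho)  # f(rho), the Fisher transform of the true correlation
    draws = []
    for _ in range(reps):
        xs, ys = [], []
        for _ in range(n):
            z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
            # (X, Y) is standard bivariate normal with correlation rho
            xs.append(z1)
            ys.append(rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
        x_bar, y_bar = sum(xs) / n, sum(ys) / n
        sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
        sxx = sum((x - x_bar) ** 2 for x in xs)
        syy = sum((y - y_bar) ** 2 for y in ys)
        r = sxy / math.sqrt(sxx * syy)  # sample correlation R
        draws.append(math.sqrt(n - 3) * (math.atanh(r) - f_rho))
    return draws

z = simulate_z(rho=0.5, n=30, reps=4000)
mean = sum(z) / len(z)
var = sum((v - mean) ** 2 for v in z) / (len(z) - 1)
print(f"mean ~ {mean:.3f}, variance ~ {var:.3f}")  # should be close to 0 and 1
```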
5.4 Confidence Interval (CI)
Let \(B_l=f\left(R_{(X,Y)}\right)-\dfrac{q_{1-\alpha/2}}{\sqrt{n-3}}\), where \(q_{1-\alpha/2}\) denotes the quantile of order \(1-\alpha/2\) of \(\mathcal{N}(0, 1)\)
Let \(B_r=f\left(R_{(X,Y)}\right)+\dfrac{q_{1-\alpha/2}}{\sqrt{n-3}}\)
\(CI_{1-\alpha}\left(f(\rho)\right) = \left[B_l, B_r\right]\)
Since \(f^{-1}=\tanh\) is increasing, applying it to both bounds gives \(CI_{1-\alpha}\left(\rho\right)=\left[\tanh\left(B_l\right), \tanh\left(B_r\right)\right]\)
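The interval above can be sketched as a short function, assuming the sample correlation \(r\) and the sample size \(n > 3\) are already known. The name `pearson_ci` is hypothetical; the quantile \(q_{1-\alpha/2}\) is obtained from `statistics.NormalDist().inv_cdf` in the Python standard library (3.8+).

```python
import math
from statistics import NormalDist

def pearson_ci(r, n, alpha=0.05):
    """Fisher-z confidence interval of level 1 - alpha for rho, given sample correlation r and size n."""
    if n <= 3:
        raise ValueError("n must be greater than 3")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    q = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # q_{1-alpha/2}, standard normal quantile
    half_width = q / math.sqrt(n - 3)
    b_l = math.atanh(r) - half_width  # B_l
    b_r = math.atanh(r) + half_width  # B_r
    return math.tanh(b_l), math.tanh(b_r)  # back-transform both bounds to the rho scale
```

For instance, `pearson_ci(0.5, 30)` gives approximately `(0.17, 0.73)`. The interval is symmetric around \(f(r)\) on the transformed scale but not around \(r\) itself, because \(\tanh\) is nonlinear; the bounds also always stay inside \(]-1, 1[\).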
5.5 Needed statistics
Significance level \(\alpha\) (the interval has confidence level \(1-\alpha\))
Sample size: \(n\)
\(\sum_iX_i\) and \(\sum_iX_i^2\)
\(\sum_iY_i\) and \(\sum_iY_i^2\)
\(\sum_iX_iY_i\)
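These sums are sufficient because the centred quantities in \(R_{(X,Y)}\) can be rewritten in terms of raw sums, e.g. \(\sum_i\left(X_i-\overline{X}\right)\left(Y_i-\overline{Y}\right)=\sum_iX_iY_i-\dfrac{1}{n}\sum_iX_i\sum_iY_i\). A minimal sketch, with the hypothetical name `pearson_ci_from_sums`, computes both \(R_{(X,Y)}\) and the interval from the statistics listed above only.

```python
import math
from statistics import NormalDist

def pearson_ci_from_sums(n, sum_x, sum_x2, sum_y, sum_y2, sum_xy, alpha=0.05):
    """Return (r, (lower, upper)): sample correlation and Fisher-z CI computed from raw sums only."""
    if n <= 3:
        raise ValueError("n must be greater than 3")
    # Centred sums of squares and cross-products, rewritten in terms of raw sums
    sxx = sum_x2 - sum_x ** 2 / n
    syy = sum_y2 - sum_y ** 2 / n
    sxy = sum_xy - sum_x * sum_y / n
    if sxx <= 0 or syy <= 0:
        raise ValueError("both variables must be non-constant")
    r = sxy / math.sqrt(sxx * syy)
    q = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # q_{1-alpha/2}
    half_width = q / math.sqrt(n - 3)
    z = math.atanh(r)  # f(R)
    return r, (math.tanh(z - half_width), math.tanh(z + half_width))
```

With `n = 5`, `sum_x = 15`, `sum_x2 = 55`, `sum_y = 15`, `sum_y2 = 55` and `sum_xy = 53` this returns \(r = 0.8\). One design caveat: when the values are large relative to their spread, the raw-sum formulas can lose precision through cancellation, so centring the data first is numerically safer when the raw observations are available.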