3 SVD to Solve PCA

3.1 From PCA to SVD

Consider a PCA with:

  • Data matrix \(X \in \mathbb{R}^{n\times p}\)
  • Metric \(M\in\mathbb{R}^{p\times p}\) (usually symmetric positive definite)
  • Weights \(D\in\mathbb{R}^{n\times n}\) (diagonal matrix with weights \(d_i\))

The key transformations are: \[ \begin{aligned} X_c &= X-1_n\bar{x}^T\\ Y &= D^{1/2}X_cM^{1/2}\\ \end{aligned} \]

Then the SVD of \(Y\) gives us all PCA elements:

\[Y = U\Sigma V^T\]

Let’s implement this transformation:

Code
import numpy as np
from scipy import linalg

# Example with standardized data
def standardize(X):
    """Standardize data matrix"""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    return (X - mean) / std
  
def centering(X):
    """Center the data matrix by subtracting column means"""
    mean = X.mean(axis=0)
    return X - mean

def transform_for_svd(X, M, D):
    """
    Transform data matrix for SVD computation
    
    Parameters:
    -----------
    X : array-like, shape (n_samples, n_features)
        Data matrix
    M : array-like, shape (n_features, n_features)
        Metric matrix
    D : array-like, shape (n_samples,)
        Weight vector (the diagonal of the weight matrix)
        
    Returns:
    --------
    Y : array-like
        Transformed matrix for SVD
    """
    # Square roots of the metric matrix and of the weight vector
    M_sqrt = linalg.sqrtm(M)
    D_sqrt = np.sqrt(D)
    
    # Centering
    Xc = centering(X)
    
    # Transform X
    Y = np.diag(D_sqrt) @ Xc @ M_sqrt
    
    return Y

# Example usage
n, p = 100, 5
X = np.random.randn(n, p)
M = np.eye(p)  # Identity metric
D = np.full(n, 1/n)  # Uniform weights

Y = transform_for_svd(X, M, D)
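
As a quick sanity check, here is a minimal self-contained sketch (random data and seed are illustrative): with \(M = I_p\) and uniform weights \(d_i = 1/n\), the transform reduces to \(Y = X_c/\sqrt{n}\), so \(Y^TY\) is exactly the population covariance matrix of \(X\):

Code
```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 5
X = rng.standard_normal((n, p))

# With M = I_p and d_i = 1/n, Y = D^{1/2} Xc M^{1/2} reduces to Xc / sqrt(n)
Xc = X - X.mean(axis=0)
Y = Xc / np.sqrt(n)

# Y^T Y equals the weighted covariance Xc^T D Xc (population covariance)
print(np.allclose(Y.T @ Y, np.cov(X, rowvar=False, bias=True)))  # prints True
```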

3.2 Recovering PCA Elements

From the SVD of \(Y\), we can recover all PCA elements:

  1. Principal axes: \(A = M^{-1/2}V\)
  2. Eigenvalues: \(\lambda_i = \sigma_i^2\)
  3. Principal components: \(C = D^{-1/2}U\Sigma\)
  4. Principal factors: \(F = D^{-1/2}U\)
Code
def compute_pca_from_svd(X, M, D):
    """
    Compute PCA elements using SVD
    """
    # Transform data
    Y = transform_for_svd(X, M, D)
    
    # Compute SVD
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    
    # Recover PCA elements
    M_sqrt = linalg.sqrtm(M)
    M_inv_sqrt = linalg.inv(M_sqrt)
    D_sqrt = np.sqrt(D)
    
    # Principal components
    C = np.diag(1/D_sqrt) @ U @ np.diag(s)
    
    # Principal factors
    F = np.diag(1/D_sqrt) @ U
    
    # Principal axes
    A = M_inv_sqrt @ Vt.T
    
    # Eigenvalues
    eigenvals = s**2
    
    return C, F, A, eigenvals



X_std = standardize(X)
C, F, A, eigenvals = compute_pca_from_svd(X_std, M, D)
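
With these elements in hand, we can also check the reconstruction identity \(X_c = CA^T\) (and, conversely, \(C = X_cMA\)). A minimal sketch, simplified to \(M = I_p\) and uniform weights so the transform is just \(X_c/\sqrt{n}\):

Code
```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 100, 5
X = rng.standard_normal((n, p))
Xc = X - X.mean(axis=0)

# SVD of Y = Xc / sqrt(n)  (the M = I_p, D = (1/n) I_n case)
U, s, Vt = np.linalg.svd(Xc / np.sqrt(n), full_matrices=False)

C = np.sqrt(n) * U * s   # principal components  C = D^{-1/2} U Sigma
A = Vt.T                 # principal axes        A = M^{-1/2} V = V

# The centered data factor exactly as Xc = C A^T, and C = Xc M A
print(np.allclose(Xc, C @ A.T), np.allclose(C, Xc @ A))  # prints True True
```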

3.3 Properties and Verification

Let’s verify the key properties:

  1. M-orthonormality of axes
  2. D-orthogonality of components
  3. Variance decomposition
Code
def verify_pca_properties(C, A, M, D):
    """Verify PCA properties"""
    # M-orthonormality of axes
    axes_orthonormality = A.T @ M @ A
    
    # D-orthogonality of components
    components_orthogonality = C.T @ np.diag(D) @ C
    
    return axes_orthonormality, components_orthogonality

# Verify properties
axes_ortho, comp_ortho = verify_pca_properties(C, A, M, D)

import matplotlib.pyplot as plt
import seaborn as sns

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))

sns.heatmap(axes_ortho, annot=True, ax=ax1)
ax1.set_title('M-orthonormality of axes')

sns.heatmap(comp_ortho, annot=True, ax=ax2)
ax2.set_title('D-orthogonality of components')

plt.tight_layout()
plt.show()
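
Beyond eyeballing the heatmaps, the three properties can be asserted numerically. A self-contained sketch for the standard case (\(M = I_p\), \(d_i = 1/n\); random data for illustration):

Code
```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 100, 5
X = rng.standard_normal((n, p))
Xc = X - X.mean(axis=0)
D = np.full(n, 1.0 / n)
M = np.eye(p)

U, s, Vt = np.linalg.svd(Xc / np.sqrt(n), full_matrices=False)
C = np.sqrt(n) * U * s   # principal components
A = Vt.T                 # principal axes

# 1. M-orthonormality of axes: A^T M A = I_p
assert np.allclose(A.T @ M @ A, np.eye(p))
# 2. D-orthogonality of components: C^T D C = diag(lambda_1, ..., lambda_p)
assert np.allclose(C.T @ np.diag(D) @ C, np.diag(s**2))
# 3. Variance decomposition: total weighted variance equals sum of eigenvalues
assert np.isclose((D[:, None] * Xc**2).sum(), (s**2).sum())
```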

3.4 Special Cases

3.4.1 Standard PCA

When \(M = I_p\) and \(D = \dfrac{1}{n}I_n\):

Code
def standard_pca(X):
    """Standard PCA with identity metric and uniform weights"""
    n = X.shape[0]
    M = np.eye(X.shape[1])
    D = np.full(n, 1/n)
    
    return compute_pca_from_svd(X, M, D)

# Compare with standard PCA implementation
from sklearn.decomposition import PCA
pca_sklearn = PCA()
pca_sklearn.fit(X_std)

print("Eigenvalues from our implementation:", eigenvals)
print("Eigenvalues from sklearn:", pca_sklearn.explained_variance_)
Eigenvalues from our implementation: [1.37874308 1.15645883 0.89165175 0.83267751 0.74046884]
Eigenvalues from sklearn: [1.39266978 1.16814023 0.90065833 0.84108839 0.74794832]

The small gap between the two sets of values is a divisor effect: our uniform weights use \(1/n\), whereas sklearn's `explained_variance_` divides by \(n-1\). Multiplying our eigenvalues by \(n/(n-1)\) reproduces sklearn's values.

3.4.2 Normalized PCA

When \(M = \text{diag}(1/s_j^2)\) where \(s_j\) are variable standard deviations:

Code
def normalized_pca(X):
    """PCA with variables normalized to unit variance"""
    std = X.std(axis=0)
    M = np.diag(1/std**2)
    n = X.shape[0]
    D = np.full(n, 1/n)
    
    return compute_pca_from_svd(X, M, D)
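
This metric is equivalent to running standard PCA on standardized data. A sketch with made-up column scales shows both routes yield the same eigenvalues, which sum to \(p\) as in a correlation-matrix PCA:

Code
```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 100, 5
# Give the columns very different scales
X = rng.standard_normal((n, p)) * np.array([1.0, 5.0, 0.1, 20.0, 2.0])
Xc = X - X.mean(axis=0)
s_col = X.std(axis=0)          # population standard deviations

# Route 1: metric M = diag(1/s_j^2), so M^{1/2} = diag(1/s_j)
Y1 = (Xc @ np.diag(1.0 / s_col)) / np.sqrt(n)
ev1 = np.linalg.svd(Y1, compute_uv=False) ** 2

# Route 2: identity metric on pre-standardized data
Y2 = (Xc / s_col) / np.sqrt(n)
ev2 = np.linalg.svd(Y2, compute_uv=False) ** 2

assert np.allclose(ev1, ev2)
print(ev1.sum())  # total inertia equals p = 5 for a correlation PCA
```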

3.5 Visualization Tools

Code
def plot_pca_results(F, eigenvals):
    """Plot PCA results"""
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))
    
    # Scree plot
    ax1.plot(range(1, len(eigenvals)+1), eigenvals, 'bo-')
    ax1.set_title('Scree Plot')
    ax1.set_xlabel('Component')
    ax1.set_ylabel('Eigenvalue')
    
    # First plane
    ax2.scatter(F[:, 0], F[:, 1])
    ax2.set_title('First Principal Plane')
    ax2.set_xlabel('PC1')
    ax2.set_ylabel('PC2')
    
    plt.tight_layout()
    plt.show()

plot_pca_results(F, eigenvals)

Key Points
  1. The SVD approach provides a stable numerical solution for PCA
  2. The transformation \(Y = D^{1/2}X_cM^{1/2}\) of the centered data is crucial
  3. All PCA elements can be recovered from SVD components
  4. Properties are preserved through the transformation

3.6 Mathematical Notes

The equivalence between PCA and SVD comes from:

  1. The eigendecomposition of the symmetric matrix \(D^{1/2}X_cMX_c^TD^{1/2}\): \[D^{1/2}X_cMX_c^TD^{1/2} = U\Lambda U^T, \quad \Lambda = \Sigma^2\]

  2. The relationship with SVD: \[D^{1/2}X_cM^{1/2} = U\Sigma V^T\]

  3. The equivalence of optimization problems:

    • PCA: \(\max_{a^TMa=1} a^TMX_c^TDX_cMa\)
    • SVD: \(\max_{v^Tv=1} \|D^{1/2}X_cM^{1/2}v\|^2\), with the change of variable \(v = M^{1/2}a\)
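
These relationships can also be checked numerically. A small self-contained sketch (random data, identity metric for simplicity) comparing the spectra of the \(n\times n\) and \(p\times p\) symmetric forms:

Code
```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 50, 4
X = rng.standard_normal((n, p))
Xc = X - X.mean(axis=0)
M = np.eye(p)
D = np.eye(n) / n
D_sqrt, M_sqrt = np.sqrt(D), M   # both diagonal; sqrtm(I_p) = I_p

big = D_sqrt @ Xc @ M @ Xc.T @ D_sqrt     # n x n: D^{1/2} Xc M Xc^T D^{1/2}
small = M_sqrt @ Xc.T @ D @ Xc @ M_sqrt   # p x p: M^{1/2} Xc^T D Xc M^{1/2}

ev_big = np.sort(np.linalg.eigvalsh(big))[::-1][:p]
ev_small = np.sort(np.linalg.eigvalsh(small))[::-1]
sigma = np.linalg.svd(D_sqrt @ Xc @ M_sqrt, compute_uv=False)

# Both operators share the p nonzero eigenvalues sigma_k^2
assert np.allclose(ev_big, ev_small)
assert np.allclose(ev_small, np.sort(sigma**2)[::-1])
```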

3.7 References