2 Mathematical Foundations of PCA and SVD
2.1 PCA and the Covariance Matrix
In traditional PCA, the covariance matrix is central to the computation of the principal components: it measures how the features of a dataset vary together. For a centered data matrix \(X \in \mathbb{R}^{n \times p}\), where \(n\) is the number of samples and \(p\) is the number of features, the covariance matrix is given by:
\[ S^2 = \dfrac{1}{n} X^T X \]
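To make the formula concrete, here is a minimal NumPy sketch (the toy data and variable names are ours, chosen only for illustration); it assumes the columns of \(X\) have already been centered:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))       # toy data: n = 100 samples, p = 5 features
X = X - X.mean(axis=0)              # center each column so the formula applies

S2 = (X.T @ X) / X.shape[0]         # covariance matrix S^2 = (1/n) X^T X
print(S2.shape)                     # (5, 5)
```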
However, in certain applications we may need to incorporate weights or a metric, for example to give samples unequal importance or to measure distances in the feature space in a non-Euclidean way. In such cases, we use a weighted PCA framework with a weights matrix \(D\) and a metric matrix \(M\), both of which are symmetric positive definite.
2.2 Weighted PCA with Matrices \(D\) and \(M\)
Let \(D \in \mathbb{R}^{n \times n}\) represent the weights matrix, which assigns weights to the rows of the data matrix (i.e., the samples). It is assumed symmetric and positive definite; in practice it is often a diagonal matrix of positive sample weights, with \(D = \frac{1}{n} I_n\) corresponding to uniform weighting.
Similarly, let \(M \in \mathbb{R}^{p \times p}\) represent the metric matrix, which defines the geometry of the feature space (columns). The metric \(M\) is also symmetric and positive definite.
The covariance matrix in the presence of weights and a metric is modified as follows:
\[ S^2 = X^T D X M \]
Where:
- \(D \in \mathbb{R}^{n \times n}\) is the weights matrix, influencing how the rows (samples) are weighted in the computation.
- \(M \in \mathbb{R}^{p \times p}\) is the metric matrix, defining the inner product in the feature space.
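As a rough sketch of this formula, and assuming for illustration uniform row weights and an identity metric (both are just example choices), the weighted covariance can be assembled as follows; note that with \(D = \frac{1}{n} I_n\) and \(M = I_p\) it reduces to the ordinary covariance of Section 2.1:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 5
X = rng.normal(size=(n, p))
X = X - X.mean(axis=0)                  # centered data matrix

D = np.eye(n) / n                       # example: uniform row weights summing to 1
M = np.eye(p)                           # example: identity metric on the features

S2 = X.T @ D @ X @ M                    # weighted covariance S^2 = X^T D X M
print(np.allclose(S2, (X.T @ X) / n))   # True: reduces to the ordinary covariance here
```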
2.2.1 Properties of \(D\) and \(M\)
Both \(D\) and \(M\) are required to satisfy the following properties, which keep the weighted computation well defined:
- Symmetry: \(D = D^T\) and \(M = M^T\).
- Positive definiteness: all eigenvalues of \(D\) and \(M\) are strictly positive, so each matrix defines a valid inner product and admits a square root (\(D^{1/2}\), \(M^{1/2}\)), which is used in the SVD computation below.
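These conditions are easy to check numerically; a small helper along the following lines (the name `is_spd` is ours, not a library function) can be used to validate user-supplied \(D\) and \(M\):

```python
import numpy as np

def is_spd(A, tol=1e-10):
    """Return True if A is symmetric and all its eigenvalues are positive."""
    symmetric = np.allclose(A, A.T)
    eigenvalues = np.linalg.eigvalsh(A)   # eigenvalues of a symmetric matrix
    return bool(symmetric and np.all(eigenvalues > tol))

print(is_spd(np.eye(3)))                  # True
```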
2.3 SVD for Weighted PCA
To compute the principal components in the weighted setting, we cannot simply apply the Singular Value Decomposition (SVD) to \(X\) itself; the decomposition has to respect the weights \(D\) and the metric \(M\). A standard approach is to apply the ordinary SVD to the transformed matrix \(\tilde{X} = D^{1/2} X M^{1/2}\) (the square roots exist because \(D\) and \(M\) are positive definite) and then map the singular vectors back. Equivalently, we compute a generalized SVD of \(X\) with respect to \(D\) and \(M\):
\[ X = U \Sigma V^T, \qquad U^T D U = I, \qquad V^T M V = I \]
Where:
- \(U \in \mathbb{R}^{n \times n}\) contains the left singular vectors, which are orthonormal with respect to \(D\) (\(U^T D U = I\)).
- \(\Sigma \in \mathbb{R}^{n \times p}\) is the rectangular diagonal matrix of singular values.
- \(V \in \mathbb{R}^{p \times p}\) contains the right singular vectors as its columns, orthonormal with respect to \(M\) (\(V^T M V = I\)).
The columns of \(V\) still contain the directions of maximum variance (the principal components), but variance and orthogonality are now measured using the weights matrix \(D\) and the metric matrix \(M\). Indeed, \(X^T D X M \, V = V \, \Sigma^T \Sigma\), so \(V\) diagonalizes the weighted covariance \(S^2\).
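Putting the pieces together, the following is a minimal sketch of SVD-based weighted PCA under the assumptions above; the helper names `spd_sqrt` and `weighted_pca_svd` are our own, introduced only for illustration:

```python
import numpy as np

def spd_sqrt(A):
    """Symmetric square root of a symmetric positive definite matrix."""
    w, Q = np.linalg.eigh(A)
    return Q @ np.diag(np.sqrt(w)) @ Q.T

def weighted_pca_svd(X, D, M):
    """Weighted PCA of a centered X under row weights D and feature metric M.

    Returns D-orthonormal left vectors U, singular values s, and
    M-orthonormal right vectors V (the principal components).
    """
    D_half, M_half = spd_sqrt(D), spd_sqrt(M)
    X_tilde = D_half @ X @ M_half                    # metric-adjusted data matrix
    U_t, s, Vt_t = np.linalg.svd(X_tilde, full_matrices=False)
    U = np.linalg.solve(D_half, U_t)                 # U = D^{-1/2} U~,  U^T D U = I
    V = np.linalg.solve(M_half, Vt_t.T)              # V = M^{-1/2} V~,  V^T M V = I
    return U, s, V
```

A quick sanity check on this sketch is that the returned \(V\) satisfies \(V^T M V = I\) and diagonalizes \(S^2 = X^T D X M\), with the squared singular values appearing on the diagonal.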
2.4 Dimensionality Reduction and Principal Components
After performing SVD, we can reduce the dimensionality of the data by retaining only the top \(k\) singular values and their corresponding vectors. This allows us to approximate the original data matrix \(X\) using a lower-dimensional representation, while still capturing most of the variance in the data.
The principal components are the columns of \(V\); they define the new coordinate system for the data, taking into account the weighting and metric adjustments provided by \(D\) and \(M\). The coordinates of the samples in this system (the scores) are obtained by projecting through the metric: \(F = X M V\), or \(F_k = X M V_k\) when only the top \(k\) components are retained.
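Continuing the illustrative sketch above (and reusing the hypothetical `weighted_pca_svd` helper from Section 2.3), truncation to \(k\) components might look like this:

```python
import numpy as np

def reduce_dimension(X, D, M, k):
    """Keep only the top-k weighted principal components of a centered X."""
    U, s, V = weighted_pca_svd(X, D, M)          # sketch from Section 2.3
    V_k = V[:, :k]                               # top-k principal components
    scores = X @ M @ V_k                         # sample coordinates, F_k = X M V_k
    X_k = U[:, :k] @ np.diag(s[:k]) @ V_k.T      # rank-k approximation of X
    return scores, X_k
```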
In the next chapter, we will implement SVD-based PCA in Python, incorporating these weights and metric matrices, and explore how to interpret the resulting principal components in practice.