1  Mathematical Foundation of SVD

In this chapter, we will explore the mathematical details behind Singular Value Decomposition (SVD). We’ll break down the decomposition process, explain the meaning of singular values and vectors, and discuss the important properties of SVD that make it a widely used tool in data analysis.

1.1 Definition of SVD

Definition 1.1 (SVD) Singular Value Decomposition (SVD) is a method of decomposing a given matrix \(A \in \mathbb{R}^{n \times p}\) (where \(n\) is the number of rows and \(p\) is the number of columns) into three other matrices. Specifically, SVD is defined as:

\[ A = U \Sigma V^T \]

Where:

  • \(A \in \mathbb{R}^{n \times p}\) is the original matrix.
  • \(U \in \mathbb{R}^{n \times n}\) is an orthogonal matrix whose columns are the left singular vectors of \(A\).
  • \(\Sigma \in \mathbb{R}^{n \times p}\) is a diagonal matrix containing the singular values of \(A\), arranged in decreasing order.
  • \(V \in \mathbb{R}^{p \times p}\) is an orthogonal matrix whose columns are the right singular vectors of \(A\).
  • \(V^T \in \mathbb{R}^{p \times p}\) denotes the transpose of \(V\).

The matrices \(U\), \(\Sigma\), and \(V\) reveal important geometric properties of the matrix \(A\) and help in analyzing its structure.

Note

The matrix \(\Sigma\) is not necessarily square since \(n\) and \(p\) can be different, but it will always contain non-negative singular values along its diagonal.
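As a quick sanity check of these shapes, the following sketch (using a small random matrix as a stand-in for real data) computes the full SVD with numpy.linalg.svd and reconstructs \(A\):

    import numpy as np

    # A small random 5 x 3 matrix, used only to illustrate the shapes above.
    rng = np.random.default_rng(0)
    A = rng.standard_normal((5, 3))

    # full_matrices=True returns the full decomposition: U is n x n, V^T is p x p.
    U, s, Vt = np.linalg.svd(A, full_matrices=True)
    print(U.shape, s.shape, Vt.shape)   # (5, 5) (3,) (3, 3)

    # numpy returns the singular values as a 1-D array; rebuild the n x p Sigma.
    Sigma = np.zeros(A.shape)
    Sigma[:len(s), :len(s)] = np.diag(s)

    # A is recovered up to floating-point error.
    print(np.allclose(A, U @ Sigma @ Vt))  # True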

Definition 1.2 (Singular Values and Vectors)  

  • The singular values of \(A\), the diagonal entries of \(\Sigma\), are non-negative real numbers. Each \(\sigma_i\) measures how strongly \(A\) stretches or compresses the data along its \(i\)-th pair of singular directions.

  • The left singular vectors (columns of \(U\)) are orthonormal directions in the output space \(\mathbb{R}^n\); they describe where the transformed data ends up.

  • The right singular vectors (columns of \(V\)) are orthonormal directions in the input space \(\mathbb{R}^p\); \(A\) maps the \(i\)-th right singular vector onto the \(i\)-th left singular vector scaled by \(\sigma_i\), that is, \(A v_i = \sigma_i u_i\).

The singular values are typically arranged in decreasing order, and the first few singular values often capture most of the “energy” or essential information in the matrix. This property is what makes SVD useful for dimensionality reduction.
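To make the "energy" remark concrete, here is a small sketch on a synthetic matrix (a rank-2 signal plus a little noise, an assumption made purely for illustration); the squared singular values are summed and normalized to show how much of the squared Frobenius norm the leading values capture:

    import numpy as np

    # Synthetic data: a rank-2 signal plus small noise (illustrative assumption).
    rng = np.random.default_rng(1)
    A = rng.standard_normal((100, 2)) @ rng.standard_normal((2, 20)) \
        + 0.05 * rng.standard_normal((100, 20))

    # compute_uv=False returns only the singular values, in decreasing order.
    s = np.linalg.svd(A, compute_uv=False)

    # Fraction of the total "energy" (sum of squared singular values, i.e. the
    # squared Frobenius norm of A) captured by the leading singular values.
    energy = np.cumsum(s**2) / np.sum(s**2)
    print(energy[:4])  # the first two values already capture almost everything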

1.2 Properties of SVD

SVD has several important properties that make it useful in data analysis:

1.2.1 Orthogonality

The matrices \(U\) and \(V\) are orthogonal matrices, meaning:

\[ U^T U = I_n \quad \text{and} \quad V^T V = I_p \]

Where \(I_n\) and \(I_p\) are identity matrices of size \(n \times n\) and \(p \times p\), respectively. Orthogonality implies that the columns of \(U\) and \(V\) form orthonormal bases, making them useful for representing directions in the data.
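These identities are easy to check numerically; the sketch below (again on an arbitrary random matrix) verifies that \(U^T U\) and \(V^T V\) are identity matrices up to rounding error:

    import numpy as np

    rng = np.random.default_rng(2)
    A = rng.standard_normal((6, 4))
    U, s, Vt = np.linalg.svd(A)  # full_matrices=True by default

    # Columns of U and V are orthonormal: U^T U = I_n and V^T V = I_p.
    print(np.allclose(U.T @ U, np.eye(U.shape[0])))     # True
    print(np.allclose(Vt @ Vt.T, np.eye(Vt.shape[0])))  # True (Vt @ Vt.T = V^T V)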

1.2.2 Singular Values and Matrix Rank

The singular values in \(\Sigma\) provide information about the rank of the matrix \(A\). Specifically, the number of non-zero singular values is equal to the rank of the matrix:

\[ \text{rank}(A) = \#\{\sigma_i > 0\} \]

If many of the singular values are small, the matrix can be approximated well using only the larger ones, which leads to the concept of low-rank approximation.
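In floating-point arithmetic the "zero" singular values of a rank-deficient matrix come out as tiny numbers rather than exact zeros, so the rank is counted against a tolerance. The sketch below uses a small hypothetical matrix whose third column is the sum of the first two, and mirrors the tolerance rule used by np.linalg.matrix_rank:

    import numpy as np

    # Rank-deficient example: the third column is the sum of the first two.
    A = np.array([[1., 0., 1.],
                  [0., 1., 1.],
                  [2., 3., 5.]])

    s = np.linalg.svd(A, compute_uv=False)

    # Count singular values above a tolerance (same rule as np.linalg.matrix_rank).
    tol = s.max() * max(A.shape) * np.finfo(A.dtype).eps
    print(np.sum(s > tol), np.linalg.matrix_rank(A))  # 2 2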

1.2.3 Optimal Low-Rank Approximation

One of the most important applications of SVD is the low-rank approximation of a matrix. If we keep only the top \(k\) largest singular values and their corresponding singular vectors, we can approximate \(A\) by a matrix \(A_k\) of rank \(k\), where:

\[ A_k = U_k \Sigma_k V_k^T \]

This truncation minimizes the Frobenius norm of the error, \(\|A - A_k\|_F\) (the square root of the sum of squared entrywise differences), over all matrices of rank at most \(k\); this optimality result is the Eckart–Young theorem, and the same truncation is also optimal in the spectral norm. This makes truncated SVD especially useful for dimensionality reduction, data compression, and denoising.
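The following sketch builds \(A_k\) from the truncated factors of a random matrix and checks the Eckart–Young identity that the Frobenius error equals the square root of the sum of the squared discarded singular values:

    import numpy as np

    rng = np.random.default_rng(3)
    A = rng.standard_normal((50, 30))
    U, s, Vt = np.linalg.svd(A, full_matrices=False)  # thin SVD is enough here

    def rank_k_approx(k):
        # A_k = U_k Sigma_k V_k^T, keeping only the top k singular triplets.
        return (U[:, :k] * s[:k]) @ Vt[:k, :]

    A5 = rank_k_approx(5)
    print(np.linalg.matrix_rank(A5))        # 5
    print(np.linalg.norm(A - A5, "fro"))    # Frobenius error of the truncation ...
    print(np.sqrt(np.sum(s[5:] ** 2)))      # ... equals sqrt(sum of discarded sigma_i^2)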

1.2.4 Eigenvalues and SVD

The singular values of a matrix (square or not) are closely related to eigenvalues. Specifically, the singular values of \(A\) are the square roots of the eigenvalues of \(A^T A\) (equivalently, the non-zero singular values are the square roots of the non-zero eigenvalues of \(A A^T\)); these eigenvalues are always non-negative because \(A^T A\) and \(A A^T\) are positive semi-definite.

This relationship between eigenvalues and singular values explains why SVD is closely related to techniques like Principal Component Analysis (PCA), which also relies on eigenvalue decomposition to reduce dimensionality.
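This relationship is straightforward to verify numerically; the sketch below compares the singular values of a random matrix with the square roots of the eigenvalues of \(A^T A\):

    import numpy as np

    rng = np.random.default_rng(4)
    A = rng.standard_normal((7, 4))

    s = np.linalg.svd(A, compute_uv=False)

    # Eigenvalues of the symmetric matrix A^T A, sorted in decreasing order.
    eigvals = np.linalg.eigvalsh(A.T @ A)[::-1]

    print(np.allclose(s, np.sqrt(eigvals)))  # True: sigma_i = sqrt(lambda_i(A^T A))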

1.3 Geometric Interpretation

Singular Value Decomposition provides a geometric view of how a matrix transforms data. Specifically, it can be understood as a three-step process:

  1. Rotation by \(V^T\): The matrix \(V^T\) rotates the data in the input space to align with the axes corresponding to the right singular vectors of \(A\).

  2. Scaling by \(\Sigma\): The diagonal matrix \(\Sigma\) stretches or compresses the data along these new axes. The magnitude of the scaling is determined by the singular values in \(\Sigma\).

  3. Rotation by \(U\): Finally, the matrix \(U\) rotates the data again, aligning it with the axes corresponding to the left singular vectors of \(A\).

This geometric perspective is crucial in understanding how SVD can be used to decompose and manipulate data in tasks like dimensionality reduction, where we discard small singular values (and their corresponding vectors) to approximate the original data with fewer dimensions.
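The three steps can also be applied one at a time, as in the sketch below, which checks that rotating by \(V^T\), scaling by \(\Sigma\), and rotating by \(U\) reproduces \(A x\) for an arbitrary vector (a square matrix is used so that every factor acts on vectors of the same length):

    import numpy as np

    rng = np.random.default_rng(5)
    A = rng.standard_normal((3, 3))
    U, s, Vt = np.linalg.svd(A)

    x = rng.standard_normal(3)

    # Apply A x = U (Sigma (V^T x)) one factor at a time.
    step1 = Vt @ x      # 1. rotate into the basis of right singular vectors
    step2 = s * step1   # 2. scale each coordinate by the matching singular value
    step3 = U @ step2   # 3. rotate into the basis of left singular vectors

    print(np.allclose(A @ x, step3))  # True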

1.4 Computation of SVD

1.4.1 Algorithmic Complexity

SVD can be computationally expensive for large matrices. Standard implementations (such as the LAPACK routines behind numpy.linalg.svd) first reduce \(A\) to bidiagonal form and then apply an iterative scheme such as QR iteration or divide-and-conquer; Jacobi-type methods are an alternative when high relative accuracy is required. For a matrix of size \(n \times p\) with \(n \ge p\), the overall cost is \(O(np^2)\), which can be prohibitive for large matrices.

In practice, truncated SVD is often used, where only the top \(k\) singular values and vectors are computed. This reduces the computational cost significantly.
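As a concrete option, scipy.sparse.linalg.svds computes only the leading singular triplets (scikit-learn's TruncatedSVD offers similar functionality); the sketch below is a minimal example on a random matrix, assuming SciPy is available:

    import numpy as np
    from scipy.sparse.linalg import svds

    rng = np.random.default_rng(6)
    A = rng.standard_normal((2000, 500))

    # Compute only the k = 10 largest singular triplets instead of all 500.
    U, s, Vt = svds(A, k=10)

    # svds returns singular values in ascending order; flip to the usual convention.
    order = np.argsort(s)[::-1]
    U, s, Vt = U[:, order], s[order], Vt[order, :]
    print(U.shape, s.shape, Vt.shape)  # (2000, 10) (10,) (10, 500)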

1.4.2 Numerical Stability

The standard algorithms for computing the SVD are numerically stable, so the decomposition can be computed reliably even when the matrix \(A\) is close to singular or ill-conditioned. The decomposition separates the dominant structure of the data (large singular values) from small, noisy components (small singular values), which can then be discarded.

1.5 Theoretical Example: SVD of a Non-Square Matrix

Let’s compute the Singular Value Decomposition of a simple \(3 \times 2\) matrix.

Let \(A \in \mathbb{R}^{3 \times 2}\) be:

\[ A = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 1 & 1 \end{pmatrix} \]

We want to compute its SVD, i.e., find \(U\), \(\Sigma\), and \(V^T\) such that:

\[ A = U \Sigma V^T \]

Here \(A^T A = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}\) has eigenvalues \(3\) and \(1\), so the singular values are \(\sqrt{3}\) and \(1\). One valid decomposition (singular vectors are determined only up to sign) is:

\[ U = \begin{pmatrix} \frac{1}{\sqrt{6}} & -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{6}} & \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{3}} \\ \frac{2}{\sqrt{6}} & 0 & -\frac{1}{\sqrt{3}} \end{pmatrix}, \quad \Sigma = \begin{pmatrix} \sqrt{3} & 0 \\ 0 & 1 \\ 0 & 0 \end{pmatrix}, \quad V^T = \begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{pmatrix} \]

  • \(U \in \mathbb{R}^{3 \times 3}\) is an orthogonal matrix with the left singular vectors.
  • \(\Sigma \in \mathbb{R}^{3 \times 2}\) is a diagonal matrix with singular values.
  • \(V^T \in \mathbb{R}^{2 \times 2}\) is an orthogonal matrix with the right singular vectors.

Thus, we have decomposed \(A\) into the product of three matrices: \(U\), \(\Sigma\), and \(V^T\).

Note

You can verify this SVD decomposition in Python using numpy.linalg.svd().
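For instance, the short check below computes the decomposition with numpy.linalg.svd and confirms the singular values \(\sqrt{3}\) and \(1\); note that the signs of corresponding columns of \(U\) and \(V\) may be flipped relative to the matrices above, since singular vectors are only determined up to sign.

    import numpy as np

    A = np.array([[1., 0.],
                  [0., 1.],
                  [1., 1.]])

    U, s, Vt = np.linalg.svd(A)
    print(s)  # [1.7320508..., 1.0], i.e. sqrt(3) and 1

    # Rebuild the 3 x 2 Sigma and check the reconstruction.
    Sigma = np.zeros(A.shape)
    Sigma[:2, :2] = np.diag(s)
    print(np.allclose(A, U @ Sigma @ Vt))  # True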

1.6 Summary of the Mathematical Foundation of SVD

  • SVD decomposes a matrix \(A \in \mathbb{R}^{n \times p}\) into three components: \(U\), \(\Sigma\), and \(V^T\).
  • Singular values capture the magnitude of the axes along which the data is stretched or compressed.
  • SVD allows us to compute the optimal low-rank approximation of a matrix, making it a powerful tool for dimensionality reduction.
  • The matrices \(U\) and \(V\) are orthogonal, providing meaningful directions in the data.
  • The geometric interpretation of SVD reveals how the matrix \(A\) transforms the data via rotations and scaling.

In the next chapter, we’ll explore how this mathematical foundation of SVD can be applied to dimensionality reduction and other practical tasks in data analysis.