2  Mathematical Foundations of SVD

2.1 Definition of SVD

Singular Value Decomposition (SVD) is a matrix factorization that exists for any matrix \(A \in \mathbb{R}^{n \times p}\); for complex matrices the same construction applies with transposes replaced by conjugate transposes. The SVD of a matrix \(A\) is defined as:

\[ A = U \Sigma V^T \]

where:

  • \(U \in \mathbb{R}^{n \times n}\) is an orthogonal matrix containing the left singular vectors of \(A\).
  • \(\Sigma \in \mathbb{R}^{n \times p}\) is a rectangular diagonal matrix with the singular values \(\sigma_i\) on its main diagonal and zeros elsewhere.
  • \(V \in \mathbb{R}^{p \times p}\) is an orthogonal matrix containing the right singular vectors of \(A\).

The diagonal entries of \(\Sigma\) (the singular values) are always non-negative and are arranged in descending order:

\[ \sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_{\min(n,p)} \geq 0 \]
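As a quick numerical check of this definition, the following sketch (assuming NumPy is available) factors a small matrix and verifies that \(U \Sigma V^T\) reproduces \(A\) and that the singular values come back in descending order:

import numpy as np

# A small rectangular matrix (n = 4 rows, p = 3 columns)
A = np.random.rand(4, 3)

# Full SVD: U is 4x4, S holds the singular values, VT is 3x3
U, S, VT = np.linalg.svd(A)

# Embed the singular values in a 4x3 rectangular diagonal matrix Sigma
Sigma = np.zeros((4, 3))
Sigma[:3, :3] = np.diag(S)

# The factorization reproduces A up to floating-point error
print(np.allclose(A, U @ Sigma @ VT))  # True
# The singular values are in descending order
print(np.all(S[:-1] >= S[1:]))         # True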

2.2 Properties of SVD

2.2.1 Orthogonality of \(U\) and \(V\)

Both \(U\) and \(V\) are orthogonal matrices, meaning that:

\[ U^T U = I_n \quad \text{and} \quad V^T V = I_p \]

This implies that \(U^{-1} = U^T\) and \(V^{-1} = V^T\).
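These identities are easy to verify numerically; a minimal sketch, assuming NumPy:

import numpy as np

A = np.random.rand(4, 3)
U, S, VT = np.linalg.svd(A)

# U^T U = I_n and V^T V = I_p (np.linalg.svd returns V^T, so V^T V is VT @ VT.T)
print(np.allclose(U.T @ U, np.eye(4)))    # True
print(np.allclose(VT @ VT.T, np.eye(3)))  # True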

2.2.2 Singular Values and Rank

The singular values \(\sigma_i\) provide important information about the matrix \(A\):

  • The number of non-zero singular values equals the rank of the matrix \(A\).
  • If all singular values are zero, \(A\) is the zero matrix.

The diagonal structure of \(\Sigma\) describes how \(A\) scales along the directions defined by the singular vectors: \(A v_i = \sigma_i u_i\), where \(u_i\) and \(v_i\) are the \(i\)-th columns of \(U\) and \(V\).
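In floating-point arithmetic, singular values that are zero in exact arithmetic typically appear as tiny positive numbers, so the rank in the first bullet above is counted against a small tolerance in practice. A minimal sketch, assuming NumPy (the tolerance mirrors the default used by np.linalg.matrix_rank):

import numpy as np

# A matrix built from two rank-1 terms, so its rank is 2
A = np.outer([1, 2, 3, 4], [1, 0, 1]) + np.outer([0, 1, 0, 1], [2, 1, 0])

S = np.linalg.svd(A, compute_uv=False)

# Count singular values above a tolerance relative to the largest one
tol = S[0] * max(A.shape) * np.finfo(float).eps
print(int(np.sum(S > tol)))       # 2
print(np.linalg.matrix_rank(A))   # 2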

2.2.3 Eigenvalues and SVD

For square matrices (\(A \in \mathbb{R}^{n \times n}\)), SVD is closely related to eigenvalue decomposition. If \(A\) is symmetric, its singular values are the absolute values of its eigenvalues, and the left and right singular vectors can be taken to be its eigenvectors (up to sign). If \(A\) is, in addition, positive semidefinite, the SVD coincides exactly with the eigenvalue decomposition:

\[ A = Q \Lambda Q^T \]

where:

  • \(Q\) is the orthogonal matrix of eigenvectors.
  • \(\Lambda\) is the diagonal matrix of eigenvalues.

For general (possibly non-square) matrices, the SVD generalizes this decomposition, with the singular values playing a role analogous to eigenvalues: the \(\sigma_i^2\) are the eigenvalues of \(A^T A\) (and of \(A A^T\)).
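Both relationships can be checked numerically; a minimal sketch, assuming NumPy:

import numpy as np

# A symmetric (but not necessarily positive semidefinite) matrix
B = np.random.rand(4, 4)
A = B + B.T

S = np.linalg.svd(A, compute_uv=False)   # singular values, descending
eig = np.linalg.eigvalsh(A)              # eigenvalues of the symmetric matrix

# Singular values are the absolute values of the eigenvalues
print(np.allclose(np.sort(np.abs(eig))[::-1], S))  # True

# For any matrix, sigma_i^2 are the eigenvalues of A^T A
C = np.random.rand(5, 3)
S2 = np.linalg.svd(C, compute_uv=False) ** 2
print(np.allclose(np.sort(np.linalg.eigvalsh(C.T @ C))[::-1], S2))  # True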

2.3 Best Low-Rank Approximation

One of the key results in matrix theory, the Eckart–Young theorem, is that SVD provides the best low-rank approximation of a matrix. If we want to approximate a matrix \(A\) by a matrix of rank \(k\), the best such approximation in the Frobenius norm sense is obtained by truncating the SVD:

\[ A_k = U_k \Sigma_k V_k^T \]

where:

  • \(U_k\) consists of the first \(k\) columns of \(U\),
  • \(\Sigma_k\) is a diagonal matrix with the first \(k\) singular values,
  • \(V_k\) consists of the first \(k\) columns of \(V\).

This truncated SVD minimizes the Frobenius norm \(\| A - A_k \|_F\) over all rank-\(k\) matrices.
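A minimal sketch of the truncation in NumPy (k = 2 is an arbitrary choice here):

import numpy as np

A = np.random.rand(6, 4)
U, S, VT = np.linalg.svd(A, full_matrices=False)

k = 2
# First k columns of U, first k singular values, first k rows of V^T
A_k = U[:, :k] @ np.diag(S[:k]) @ VT[:k, :]

print(A_k.shape)                   # (6, 4)
print(np.linalg.matrix_rank(A_k))  # 2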

2.3.1 Proof of the Best Approximation Theorem

For a matrix \(A \in \mathbb{R}^{n \times p}\), we can decompose it as \(A = U \Sigma V^T\). The truncated approximation \(A_k = U_k \Sigma_k V_k^T\) is the matrix of rank \(k\) that minimizes:

\[ \| A - A_k \|_F \]

The approximation error of \(A_k\) follows directly from the orthogonality of \(U\) and \(V\): only the discarded singular values remain,

\[ \| A - A_k \|_F = \sqrt{\sum_{i=k+1}^{r} \sigma_i^2} \]

Because the singular values are sorted in descending order, retaining the \(k\) largest ones makes this tail as small as possible; the retained singular vectors capture the maximum possible variance, and the Eckart–Young theorem shows that no other rank-\(k\) matrix attains a smaller Frobenius error.
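The error identity above is easy to check numerically; a brief sketch, assuming NumPy:

import numpy as np

A = np.random.rand(6, 4)
U, S, VT = np.linalg.svd(A, full_matrices=False)

k = 2
A_k = U[:, :k] @ np.diag(S[:k]) @ VT[:k, :]

# The Frobenius error equals the root-sum-of-squares of the discarded singular values
error = np.linalg.norm(A - A_k, 'fro')
print(np.isclose(error, np.sqrt(np.sum(S[k:] ** 2))))  # True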

2.4 Matrix Norms and SVD

2.4.1 Frobenius Norm

The Frobenius norm of a matrix \(A\) is given by:

\[ \| A \|_F = \sqrt{\sum_{i,j} |A_{ij}|^2} \]

In terms of the SVD, the Frobenius norm is related to the singular values by:

\[ \| A \|_F = \sqrt{\sum_{i=1}^{r} \sigma_i^2} \]

where \(r = \text{rank}(A)\) is the number of non-zero singular values.
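A quick numerical check of this identity (a sketch, assuming NumPy):

import numpy as np

A = np.random.rand(5, 3)
S = np.linalg.svd(A, compute_uv=False)

# Entrywise Frobenius norm versus the singular-value formula
print(np.isclose(np.linalg.norm(A, 'fro'), np.sqrt(np.sum(S ** 2))))  # True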

2.4.2 Spectral Norm

The spectral norm (or operator 2-norm) of a matrix \(A\) is the largest singular value:

\[ \| A \|_2 = \sigma_1 \]

This norm measures the maximum stretching factor of \(A\) when applied to a vector: \(\| A \|_2 = \max_{x \neq 0} \| A x \| / \| x \|\).
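A brief numerical check (a sketch, assuming NumPy):

import numpy as np

A = np.random.rand(5, 3)
S = np.linalg.svd(A, compute_uv=False)

# The operator 2-norm equals the largest singular value
print(np.isclose(np.linalg.norm(A, 2), S[0]))  # True

# No vector is stretched by more than sigma_1 (checked on random directions)
X = np.random.rand(3, 1000)
ratios = np.linalg.norm(A @ X, axis=0) / np.linalg.norm(X, axis=0)
print(ratios.max() <= S[0] + 1e-12)  # True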

2.5 Uniqueness of SVD

The singular values of a matrix \(A\) are unique, meaning the values in \(\Sigma\) are uniquely determined. However, the singular vectors (columns of \(U\) and \(V\)) are not necessarily unique if some singular values are repeated. Specifically:

  • If all singular values are distinct, each pair of singular vectors \((u_i, v_i)\) is unique up to a simultaneous sign flip: replacing \(u_i\) and \(v_i\) by \(-u_i\) and \(-v_i\) leaves \(A\) unchanged (see the sketch after this list).
  • If some singular values are equal, the corresponding singular vectors are not unique and can be any orthogonal basis for the corresponding subspace.
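A minimal sketch of this sign ambiguity (assuming NumPy):

import numpy as np

A = np.random.rand(4, 3)
U, S, VT = np.linalg.svd(A, full_matrices=False)

# Flip the sign of the first left singular vector together with the first right singular vector
U2 = U.copy()
VT2 = VT.copy()
U2[:, 0] *= -1
VT2[0, :] *= -1

# Both sign conventions are valid SVDs of the same matrix
print(np.allclose(U @ np.diag(S) @ VT, U2 @ np.diag(S) @ VT2))  # True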

2.6 Example of SVD

import numpy as np

# Create a random matrix
A = np.random.rand(4, 3)

# Perform SVD
U, S, VT = np.linalg.svd(A)

print("Matrix U:", U)
Matrix U: [[-0.40753611  0.35346171  0.61685337  0.57312394]
 [-0.5155134   0.24389049  0.24742676 -0.78329009]
 [-0.55332551  0.30030701 -0.74439729  0.22252922]
 [-0.5118507  -0.85170297  0.0643785   0.09201265]]
print("Singular Values:", S)
Singular Values: [2.02828422 0.73931806 0.46688778]
print("Matrix V^T:", VT)
Matrix V^T: [[-0.63626537 -0.58872204 -0.4985707 ]
 [ 0.76939064 -0.43681581 -0.46607938]
 [ 0.05660764 -0.6801458   0.730888  ]]

In this example:

  • U contains the left singular vectors.
  • S contains the singular values as a vector rather than as the matrix \(\Sigma\).
  • VT contains \(V^T\), so its rows are the right singular vectors.
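Note that np.linalg.svd computes the full SVD by default, which is why U above is \(4 \times 4\). Passing full_matrices=False yields the thin (economy) SVD, whose shapes match what R's svd() returns below; a minimal sketch:

import numpy as np

A = np.random.rand(4, 3)

# Thin (economy) SVD: U is 4x3, S has length 3, VT is 3x3
U, S, VT = np.linalg.svd(A, full_matrices=False)
print(U.shape, S.shape, VT.shape)           # (4, 3) (3,) (3, 3)

# In the thin form, np.diag(S) is all that is needed to reconstruct A
print(np.allclose(A, U @ np.diag(S) @ VT))  # True

The same example in R: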

# Create a random matrix
A <- matrix(rnorm(12), nrow=4, ncol=3)

# Perform SVD
svd_result <- svd(A)

# Extract U, S, and V
U <- svd_result$u
S <- diag(svd_result$d)
V <- svd_result$v

print("Matrix U:")
[1] "Matrix U:"
print(U)
           [,1]       [,2]        [,3]
[1,] -0.3584208 -0.1464620  0.86113368
[2,] -0.3850407 -0.7563673 -0.08951404
[3,] -0.3476941 -0.2847331 -0.48110727
[4,] -0.7761359  0.5704248 -0.13773820
print("Singular Values:")
[1] "Singular Values:"
print(S)
        [,1]     [,2]    [,3]
[1,] 3.84848 0.000000 0.00000
[2,] 0.00000 1.137241 0.00000
[3,] 0.00000 0.000000 0.78974
print("Matrix V:")
[1] "Matrix V:"
print(V)
           [,1]       [,2]       [,3]
[1,]  0.4573320 -0.6852765  0.5667836
[2,] -0.5892177 -0.7108659 -0.3840471
[3,] -0.6660855  0.1583219  0.7288787

In R, the svd() function returns the singular values as the vector d alongside the matrices \(U\) and \(V\). Two differences from NumPy are worth noting: R returns \(V\) itself rather than \(V^T\), and it computes the thin SVD by default, which is why \(U\) is \(4 \times 3\) here instead of \(4 \times 4\).

2.7 Conclusion

The SVD is a versatile tool in linear algebra, with deep mathematical properties and a wide range of applications. Its ability to decompose matrices into orthogonal components allows for efficient computation, data compression, and insights into the structure of data. In the following chapters, we will explore practical applications of SVD, including dimensionality reduction, principal component analysis (PCA), and more.