Singular Value Decomposition (SVD) is a matrix factorization that exists for any real matrix \(A \in \mathbb{R}^{n \times p}\); a complex version exists as well, with transposes replaced by conjugate transposes. The SVD of a matrix \(A\) is defined as:
\[
A = U \Sigma V^T
\]
where:
\(U \in \mathbb{R}^{n \times n}\) is an orthogonal matrix containing the left singular vectors of \(A\).
\(\Sigma \in \mathbb{R}^{n \times p}\) is a rectangular diagonal matrix containing the singular values \(\sigma_i\) on its diagonal.
\(V \in \mathbb{R}^{p \times p}\) is an orthogonal matrix containing the right singular vectors of \(A\).
The entries of \(\Sigma\) (i.e., the singular values) are always non-negative and are arranged in descending order:
\[
\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_{\min(n,p)} \geq 0
\]
Both \(U\) and \(V\) are orthogonal matrices, meaning that:
\[
U^T U = I_n \quad \text{and} \quad V^T V = I_p
\]
This implies that \(U^{-1} = U^T\) and \(V^{-1} = V^T\).
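These identities are easy to check numerically. Below is a minimal R sketch; the seed and the 4 × 3 test matrix are arbitrary choices, and nu = 4 requests the full square \(U\):

set.seed(1)                       # arbitrary seed, for reproducibility
A <- matrix(rnorm(12), nrow = 4, ncol = 3)
s <- svd(A, nu = 4)               # nu = nrow(A) returns the full n x n matrix U
U <- s$u
V <- s$v
all.equal(t(U) %*% U, diag(4))    # TRUE: U^T U = I_n
all.equal(t(V) %*% V, diag(3))    # TRUE: V^T V = I_p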
2.2.2 Singular Values and Rank
The singular values \(\sigma_i\) provide important information about the matrix \(A\):
The number of non-zero singular values equals the rank of the matrix \(A\).
If all singular values are zero, \(A\) is the zero matrix.
The diagonal matrix \(\Sigma\) describes how the original matrix \(A\) scales vectors along the directions defined by the singular vectors in \(U\) and \(V\).
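For example (a small R sketch, with seed and dimensions chosen arbitrarily), a 4 × 3 matrix whose third column is the sum of the first two has rank 2, and exactly two singular values are numerically non-zero:

set.seed(1)
A <- matrix(rnorm(8), nrow = 4, ncol = 2)
A <- cbind(A, A[, 1] + A[, 2])     # third column is a linear combination: rank 2
d <- svd(A)$d
print(d)                           # the third singular value is zero up to round-off
print(sum(d > 1e-10 * d[1]))       # 2: non-zero singular values counted with a tolerance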
2.2.3 Eigenvalues and SVD
For square matrices (\(A \in \mathbb{R}^{n \times n}\)), SVD is related to eigenvalue decomposition. If \(A\) is symmetric, the singular values of \(A\) are the absolute values of its eigenvalues, and the left and right singular vectors can be chosen to be eigenvectors of \(A\) (agreeing up to sign). When \(A\) is in addition positive semidefinite, the SVD coincides with the eigenvalue decomposition:
\[
A = Q \Lambda Q^T
\]
where:
\(Q\) is the orthogonal matrix of eigenvectors.
\(\Lambda\) is the diagonal matrix of eigenvalues.
For general matrices, SVD generalizes this decomposition to non-square matrices, with the singular values playing a similar role to eigenvalues.
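This relationship is easy to verify numerically; a minimal R sketch with an arbitrary symmetric matrix:

set.seed(1)
B <- matrix(rnorm(9), nrow = 3)
S <- B + t(B)                                        # a symmetric 3 x 3 matrix
print(sort(abs(eigen(S)$values), decreasing = TRUE)) # absolute eigenvalues
print(svd(S)$d)                                      # the singular values agree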
2.3 Best Low-Rank Approximation
One of the key results in matrix theory is that SVD provides the best low-rank approximation of a matrix. If we want to approximate a matrix \(A\) by a matrix of rank \(k\), the best such approximation in the Frobenius norm sense is obtained by truncating the SVD:
\[
A_k = U_k \Sigma_k V_k^T
\]
where:
\(U_k\) consists of the first \(k\) columns of \(U\),
\(\Sigma_k\) is a diagonal matrix with the first \(k\) singular values,
\(V_k\) consists of the first \(k\) columns of \(V\).
This truncated SVD minimizes the Frobenius norm \(\| A - A_k \|_F\) over all matrices of rank at most \(k\), a result known as the Eckart–Young theorem.
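The truncation is straightforward to carry out; a minimal R sketch (the rank \(k = 1\) and the 4 × 3 test matrix are arbitrary choices):

set.seed(1)
A <- matrix(rnorm(12), nrow = 4, ncol = 3)
s <- svd(A)
k <- 1
Uk <- s$u[, 1:k, drop = FALSE]     # first k columns of U
Vk <- s$v[, 1:k, drop = FALSE]     # first k columns of V
Sk <- diag(s$d[1:k], nrow = k)     # first k singular values on the diagonal
A_k <- Uk %*% Sk %*% t(Vk)         # best rank-k approximation of A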
2.3.1 Proof of the Best Approximation Theorem
For a matrix \(A \in \mathbb{R}^{n \times p}\), we can decompose it as \(A = U \Sigma V^T\). The truncated approximation \(A_k = U_k \Sigma_k V_k^T\) is the matrix of rank \(k\) that minimizes:
\[
\| A - A_k \|_F
\]
The proof rests on the invariance of the Frobenius norm under orthogonal transformations. Since \(A - A_k = U \widetilde{\Sigma} V^T\), where \(\widetilde{\Sigma}\) equals \(\Sigma\) with its first \(k\) diagonal entries zeroed out, we have
\[
\| A - A_k \|_F^2 = \sum_{i=k+1}^{r} \sigma_i^2 .
\]
The Eckart–Young theorem states that no rank-\(k\) matrix achieves a smaller error, so by retaining the largest singular values, \(A_k\) captures the maximum possible variance in the rank-\(k\) approximation.
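The error identity above can be confirmed numerically; a short R sketch reusing the same arbitrary setup:

set.seed(1)
A <- matrix(rnorm(12), nrow = 4, ncol = 3)
s <- svd(A)
k <- 1
A_k <- s$u[, 1:k, drop = FALSE] %*% diag(s$d[1:k], nrow = k) %*% t(s$v[, 1:k, drop = FALSE])
# The Frobenius error equals the root sum of squares of the discarded singular values
print(norm(A - A_k, type = "F"))
print(sqrt(sum(s$d[(k + 1):length(s$d)]^2)))   # same value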
2.4 Matrix Norms and SVD
2.4.1 Frobenius Norm
The Frobenius norm of a matrix \(A\) is given by:
\[
\| A \|_F = \sqrt{\sum_{i,j} |A_{ij}|^2}
\]
In terms of the SVD, the Frobenius norm is related to the singular values by:
\[
\| A \|_F = \sqrt{\sum_{i=1}^{r} \sigma_i^2}
\]
where \(r = \text{rank}(A)\) is the number of non-zero singular values.
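Both expressions give the same number; a one-glance R check (arbitrary test matrix):

set.seed(1)
A <- matrix(rnorm(12), nrow = 4, ncol = 3)
print(sqrt(sum(A^2)))            # Frobenius norm from the entries
print(sqrt(sum(svd(A)$d^2)))     # the same value from the singular values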
2.4.2 Spectral Norm
The spectral norm (or operator 2-norm) of a matrix \(A\) is the largest singular value:
\[
\| A \|_2 = \sigma_1
\]
This norm measures the maximum stretching factor of the matrix \(A\) when applied to a vector.
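In R this can be checked against the base function norm(), which computes the spectral norm with type = "2" (a minimal sketch with an arbitrary matrix):

set.seed(1)
A <- matrix(rnorm(12), nrow = 4, ncol = 3)
print(svd(A)$d[1])               # largest singular value
print(norm(A, type = "2"))       # spectral norm: the same value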
2.5 Uniqueness of SVD
The singular values of a matrix \(A\) are unique, meaning the values in \(\Sigma\) are uniquely determined. However, the singular vectors (columns of \(U\) and \(V\)) are not necessarily unique if some singular values are repeated. Specifically:
If \(\sigma_i \neq \sigma_j\) for all \(i \neq j\), the corresponding singular vector pairs \((u_i, v_i)\) are unique up to a simultaneous sign flip, as illustrated in the sketch after this list.
If some singular values are equal, the corresponding singular vectors are not unique and can be any orthogonal basis for the corresponding subspace.
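The sign ambiguity can be seen directly in R: flipping the signs of a matched pair of singular vectors leaves the reconstruction \(U \Sigma V^T\) unchanged (arbitrary test matrix):

set.seed(1)
A <- matrix(rnorm(12), nrow = 4, ncol = 3)
s <- svd(A)
U <- s$u; S <- diag(s$d); V <- s$v
U[, 1] <- -U[, 1]                # flip the sign of u_1 ...
V[, 1] <- -V[, 1]                # ... and of v_1 at the same time
all.equal(A, U %*% S %*% t(V))   # TRUE: A is unchanged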
2.6 Computing the SVD in R
In R, the svd() function decomposes a matrix into \(U\), \(\Sigma = \Lambda^{1/2}\) (returned as the vector d of singular values, the square roots of the eigenvalues of \(A^T A\)), and \(V\), analogous to NumPy's np.linalg.svd in Python (which returns \(V^T\) rather than \(V\)).

# Create a random matrix
A <- matrix(rnorm(12), nrow = 4, ncol = 3)

# Perform SVD
svd_result <- svd(A)

# Extract U, S, and V
U <- svd_result$u
S <- diag(svd_result$d)   # build the diagonal matrix Sigma from the vector d
V <- svd_result$v

print("Matrix U:")

In this example:
U contains the left singular vectors.
S contains the singular values on its diagonal.
V contains the right singular vectors.
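As a final sanity check (a minimal sketch in the same style as the example above), the factors returned by svd() should multiply back to the original matrix:

set.seed(1)
A <- matrix(rnorm(12), nrow = 4, ncol = 3)
s <- svd(A)
all.equal(A, s$u %*% diag(s$d) %*% t(s$v))   # TRUE up to floating-point error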
2.7 Conclusion
The SVD is a versatile tool in linear algebra, with deep mathematical properties and a wide range of applications. Its ability to decompose matrices into orthogonal components allows for efficient computation, data compression, and insights into the structure of data. In the following chapters, we will explore practical applications of SVD, including dimensionality reduction, principal component analysis (PCA), and more.