2 SVD and Dimensionality Reduction
One of the most powerful applications of Singular Value Decomposition (SVD) is in dimensionality reduction. In this chapter, we will explore how SVD can be used to reduce the number of dimensions in a dataset while preserving as much important information as possible. This process helps improve the efficiency of machine learning algorithms and can also be used for data visualization and noise reduction.
2.1 Dimensionality Reduction Basics
2.1.1 Curse of Dimensionality
In data analysis and machine learning, we often deal with high-dimensional datasets, where each data point has many features (dimensions). However, high-dimensional data poses several challenges:
- Increased computational cost: Processing and storing large datasets is expensive in terms of time and resources.
- Overfitting: Models trained on high-dimensional data can fit noise instead of meaningful patterns, especially when the number of features is large relative to the number of samples.
- Interpretability: Analyzing and visualizing data becomes difficult when the number of dimensions increases.
The curse of dimensionality refers to these challenges. To address this, we can use techniques like SVD to reduce the number of dimensions while retaining the most important information.
2.1.2 Why Use SVD for Dimensionality Reduction?
SVD is particularly useful for dimensionality reduction because it provides a low-rank approximation of a matrix, allowing us to focus on the most important singular values and corresponding singular vectors. This enables us to reduce the complexity of the data without losing much information.
The goal of dimensionality reduction with SVD is to find a low-dimensional representation of the original dataset that captures most of its variability.
2.2 Using SVD for Dimensionality Reduction
Given a data matrix \(A \in \mathbb{R}^{n \times p}\), where \(n\) is the number of data points (rows) and \(p\) is the number of features (columns), the SVD of \(A\) is:
\[ A = U \Sigma V^T \]
To reduce the dimensionality, we can approximate \(A\) using only the top \(k\) singular values and corresponding singular vectors. This yields a rank-\(k\) approximation:
\[ A_k = U_k \Sigma_k V_k^T \]
Where:
- \(U_k \in \mathbb{R}^{n \times k}\) contains the first \(k\) left singular vectors (columns of \(U\)).
- \(\Sigma_k \in \mathbb{R}^{k \times k}\) contains the first \(k\) singular values (the top-left \(k \times k\) block of \(\Sigma\)).
- \(V_k^T \in \mathbb{R}^{k \times p}\) contains the first \(k\) right singular vectors (rows of \(V^T\)).
By choosing a smaller \(k\), we effectively reduce the number of dimensions from \(p\) to \(k\).
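To make the truncation concrete, here is a minimal NumPy sketch on a small synthetic matrix; the matrix `A`, its size, and the choice \(k = 2\) are arbitrary placeholders rather than part of the example developed later in this chapter:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(100, 6))   # toy data matrix: n = 100 rows, p = 6 columns
k = 2

# Economy-size SVD: U is (100, 6), S is (6,), VT is (6, 6)
U, S, VT = np.linalg.svd(A, full_matrices=False)

# Keep the top-k singular values and vectors
U_k, Sigma_k, VT_k = U[:, :k], np.diag(S[:k]), VT[:k, :]

A_k = U_k @ Sigma_k @ VT_k   # rank-k approximation of A, still 100 x 6
A_reduced = U_k @ Sigma_k    # k-dimensional coordinates of each row, 100 x 2

print(A_k.shape, A_reduced.shape)   # (100, 6) (100, 2)
```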
2.2.1 Low-Rank Approximation
The low-rank approximation of \(A\), denoted \(A_k\), is the closest matrix of rank \(k\) to \(A\) in terms of minimizing the Frobenius norm:
\[ \| A - A_k \|_F \]
where the Frobenius norm of the difference is the square root of the sum of squared differences between the entries of \(A\) and \(A_k\). By the Eckart-Young theorem, the truncated SVD \(A_k = U_k \Sigma_k V_k^T\) achieves the smallest possible Frobenius error among all matrices of rank \(k\), so it preserves as much of the data's structure as possible for the chosen rank.
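This optimality is easy to check numerically: for the SVD truncation, the squared error \(\| A - A_k \|_F^2\) equals the sum of the squared discarded singular values \(\sum_{i > k} \sigma_i^2\). A short sketch, again on a throwaway random matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(50, 8))
U, S, VT = np.linalg.svd(A, full_matrices=False)

k = 3
A_k = U[:, :k] @ np.diag(S[:k]) @ VT[:k, :]

# The two quantities below agree up to floating-point error
print(np.linalg.norm(A - A_k, 'fro'))
print(np.sqrt(np.sum(S[k:] ** 2)))
```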
2.2.2 Selection of \(k\)
Choosing the appropriate value of \(k\) is critical in dimensionality reduction. Several methods are commonly used to decide how many singular values (and corresponding singular vectors) to keep:
Scree Plot: A plot of the singular values, where the goal is to find the “elbow” of the curve—the point where the singular values begin to level off. This indicates that adding more dimensions (i.e., increasing \(k\)) will not significantly improve the approximation.
Variance Explained: Another approach is to select \(k\) such that the cumulative variance explained by the first \(k\) singular values is above a certain threshold (e.g., 90% or 95%). The variance explained by a singular value \(\sigma_i\) is proportional to \(\sigma_i^2\), and the cumulative variance is the sum of variances for the top \(k\) singular values.
\[ \text{Explained Variance} = \frac{\sum_{i=1}^{k} \sigma_i^2}{\sum_{i=1}^{\min(n, p)} \sigma_i^2} \]
This method ensures that the reduced data retains most of the variability in the original dataset.
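Both criteria are straightforward to implement once the singular values are available. The sketch below is illustrative: it assumes `Sigma` is the 1-D array of singular values returned by `np.linalg.svd` (already sorted in decreasing order), and the 95% threshold is just an example value:

```python
import numpy as np
import matplotlib.pyplot as plt

def choose_k(Sigma, threshold=0.95):
    """Smallest k whose cumulative explained variance reaches the threshold."""
    variance = Sigma ** 2
    cumulative = np.cumsum(variance) / np.sum(variance)
    return int(np.searchsorted(cumulative, threshold) + 1)

def scree_plot(Sigma):
    """Plot singular values against their index to look for an 'elbow'."""
    plt.plot(np.arange(1, len(Sigma) + 1), Sigma, marker='o')
    plt.xlabel('Index')
    plt.ylabel('Singular value')
    plt.title('Scree plot')
    plt.show()
```

For the standardized Iris data used later in this chapter, the first two singular values account for roughly 96% of the variance, so a 95% threshold keeps \(k = 2\).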
2.3 Comparison with PCA
Principal Component Analysis (PCA) is another common technique for dimensionality reduction, and it is closely related to SVD. In fact, PCA is typically computed via the SVD of the mean-centered data matrix.
2.3.1 PCA and SVD
PCA computes the principal components of a dataset, which are the directions of maximum variance in the data. The principal components are the eigenvectors of the covariance matrix of the data, and the amount of variance captured by each principal component is proportional to the corresponding eigenvalue.
Interestingly, the principal components can also be obtained using SVD. If we perform SVD on the centered data matrix (i.e., after subtracting the mean of each column), the right singular vectors (the columns of \(V\)) are the principal components of the data, and the eigenvalues of the covariance matrix are recovered from the singular values as \(\lambda_i = \sigma_i^2 / (n - 1)\).
Thus, SVD and PCA are mathematically related, and both can be used for dimensionality reduction. The difference lies in how they are applied:
- PCA works directly on the covariance matrix, and the principal components are the directions of maximum variance.
- SVD works directly on the data matrix, and the right singular vectors correspond to the principal components when the data is centered.
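This correspondence is easy to verify with scikit-learn. The sketch below uses the Iris data from the example that follows; note that singular vectors are only determined up to sign, so the comparison ignores signs:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data
X_centered = X - X.mean(axis=0)
n = X.shape[0]

# PCA as implemented in scikit-learn
pca = PCA(n_components=2).fit(X_centered)

# SVD of the centered data matrix
U, S, VT = np.linalg.svd(X_centered, full_matrices=False)

# Right singular vectors = principal components (up to sign)
print(np.allclose(np.abs(VT[:2]), np.abs(pca.components_)))        # expected: True

# Covariance eigenvalues = sigma_i^2 / (n - 1)
print(np.allclose(S[:2] ** 2 / (n - 1), pca.explained_variance_))  # expected: True
```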
2.4 Practical Example: Dimensionality Reduction on a Real Dataset
To see SVD in action, let’s consider a simple example of using SVD for dimensionality reduction on a real-world dataset.
2.4.1 Example: Reducing the Dimensionality of the Iris Dataset
The Iris dataset contains 150 samples of iris flowers, with 4 features (sepal length, sepal width, petal length, petal width). Our goal is to reduce the dimensionality of the dataset from 4 features to 2 features using SVD.
2.4.1.1 Steps:
- Load the Iris dataset.
- Center the data by subtracting the mean of each feature (the code below uses StandardScaler, which also scales each feature to unit variance).
- Perform SVD on the centered data matrix.
- Keep only the top 2 singular values and the corresponding singular vectors.
- Project the data into the 2-dimensional space defined by the first 2 right singular vectors.
Below is a Python code snippet to perform dimensionality reduction using SVD:
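```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler

# Load the Iris dataset
iris = load_iris()
X = iris.data

# Standardize the data (StandardScaler subtracts the mean and scales to unit variance)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Perform SVD on the standardized data
U, Sigma, VT = np.linalg.svd(X_scaled, full_matrices=False)

# Keep only the top 2 singular values and vectors
U_k = U[:, :2]                # First 2 columns of U
Sigma_k = np.diag(Sigma[:2])  # First 2 singular values as a diagonal matrix
V_k = VT[:2, :]               # First 2 rows of V^T

# Project the data into the 2D space (equivalently, X_scaled @ V_k.T)
X_reduced = U_k @ Sigma_k

# Plot the reduced data, colored by species
plt.scatter(X_reduced[:, 0], X_reduced[:, 1], c=iris.target, cmap='viridis')
plt.xlabel('Component 1')
plt.ylabel('Component 2')
plt.title('Iris Dataset - Dimensionality Reduction with SVD')
plt.show()
```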