4  Applications in Machine Learning

Singular Value Decomposition (SVD) is widely used in several areas of machine learning. Its ability to extract meaningful features from data, compress information, and provide optimal low-rank approximations makes it a powerful tool for handling large datasets. In this chapter, we will discuss the main applications of SVD in machine learning, including feature extraction, latent semantic analysis (LSA), and recommender systems.

4.1 Feature Extraction using SVD

4.1.1 Motivation for Feature Extraction

In many machine learning tasks, the performance of models depends heavily on the quality of the features used. Feature extraction refers to the process of transforming raw data into a set of useful features that can improve the efficiency and accuracy of machine learning models.

SVD plays a crucial role in feature extraction by reducing high-dimensional data into lower-dimensional subspaces, where only the most informative features are retained. This is particularly helpful in domains like image processing, text analysis, and bioinformatics, where datasets often contain many features that are noisy or redundant.

4.1.2 SVD for Feature Extraction

Given a data matrix \(A \in \mathbb{R}^{n \times p}\), where \(n\) is the number of samples and \(p\) is the number of features, we can use SVD to extract the most important features by keeping only the top \(k\) singular values and their corresponding singular vectors.

The matrix \(V_k \in \mathbb{R}^{p \times k}\), formed from the first \(k\) columns of \(V\) (equivalently, the transpose of the first \(k\) rows of \(V^T\)), contains the directions of the most important features in the data. When SVD is applied to the centered data matrix, these directions are exactly the principal components found by Principal Component Analysis (PCA).

By projecting the data onto these directions, \(Z = A V_k = U_k \Sigma_k \in \mathbb{R}^{n \times k}\), we obtain a new set of \(k\) features that capture most of the variability in the data while reducing its dimensionality.
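As a minimal NumPy sketch (on a synthetic data matrix, purely for illustration), this projection can be computed directly from the SVD of the centered data:

import numpy as np

# Synthetic data: n = 100 samples, p = 10 features
rng = np.random.default_rng(0)
A = rng.normal(size=(100, 10))

# Center the columns so the directions match PCA's principal components
A_centered = A - A.mean(axis=0)

# Thin SVD: A_centered = U @ np.diag(s) @ Vt
U, s, Vt = np.linalg.svd(A_centered, full_matrices=False)

# V_k: the first k columns of V (the first k rows of Vt, transposed)
k = 3
V_k = Vt[:k].T

# Project onto the k most informative directions: Z = A V_k = U_k Sigma_k
Z = A_centered @ V_k
print(Z.shape)                           # (100, 3)
print(np.allclose(Z, U[:, :k] * s[:k]))  # True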

4.1.2.1 Example: Feature Extraction in Text Classification

In text classification tasks, each document can be represented as a high-dimensional vector (e.g., a TF-IDF vector), where each dimension corresponds to a word in the vocabulary. However, the number of dimensions (words) can be very large, and many of them may be irrelevant to the classification task.

SVD can be used to extract a smaller set of features that capture the most important semantic information in the documents. These features can then be used as inputs to machine learning algorithms like logistic regression or support vector machines (SVM).

import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

# Example: Text classification feature extraction using SVD
corpus = [
    'This is the first document.',
    'This document is the second document.',
    'And this is the third one.',
    'Is this the first document?'
]

# Convert text to TF-IDF features
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(corpus)

# Apply SVD for feature extraction
svd = TruncatedSVD(n_components=2)  # Reduce to 2 features
X_reduced = svd.fit_transform(X)

print("Reduced Features:\n", X_reduced)
Reduced Features:
 [[ 0.95905678 -0.13453834]
 [ 0.79765181 -0.18548718]
 [ 0.45705072  0.88833467]
 [ 0.95905678 -0.13453834]]

In this example, we use TruncatedSVD to reduce the original high-dimensional TF-IDF representation to a 2-dimensional feature space, so each document is now described by just 2 features.
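The reduced features can then serve as inputs to a classifier, as mentioned above. A minimal sketch continuing from this example, with hypothetical labels invented purely for illustration:

from sklearn.linear_model import LogisticRegression

# Hypothetical binary labels for the four example documents
y = [1, 1, 0, 1]

# Train a logistic regression classifier on the 2 SVD features
clf = LogisticRegression()
clf.fit(X_reduced, y)

# Predicted labels for the training documents
print(clf.predict(X_reduced))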


4.2 Latent Semantic Analysis (LSA)

4.2.1 Overview of Latent Semantic Analysis (LSA)

Latent Semantic Analysis (LSA) is a natural language processing (NLP) technique used to extract hidden, or latent, semantic relationships between terms in a document-term matrix. It is widely used in tasks like document retrieval, topic modeling, and text classification.

LSA is based on the idea that there are latent concepts in a corpus of documents that determine the patterns of word usage. For example, in a collection of documents about technology, terms like “computer,” “software,” and “internet” may frequently co-occur and can be grouped into a latent concept representing “technology.”

4.2.2 Applying SVD in LSA

LSA relies on SVD to decompose the document-term matrix into three components:

\[ A = U \Sigma V^T \]

Keeping only the top \(k\) singular values and their singular vectors yields the truncated decomposition

\[ A_k = U_k \Sigma_k V_k^T \]

Where:

  • \(A \in \mathbb{R}^{n \times p}\) is the document-term matrix, where \(n\) is the number of documents and \(p\) is the number of terms (words).
  • \(U_k \in \mathbb{R}^{n \times k}\) represents the documents in the \(k\)-dimensional latent concept space.
  • \(\Sigma_k \in \mathbb{R}^{k \times k}\) contains the singular values that represent the strength of each latent concept.
  • \(V_k^T \in \mathbb{R}^{k \times p}\) represents the terms in the latent concept space.

The matrix \(A_k\) is the best rank-\(k\) approximation of the original document-term matrix, in which only the \(k\) strongest latent concepts are retained.

By reducing the dimensionality of the document-term matrix, LSA captures the latent semantic relationships between terms and documents. This can improve document retrieval: because matching happens in concept space rather than raw term space, a document can be retrieved for a query even when the two share few exact terms.
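To make the truncation concrete, here is a minimal NumPy sketch (on a small random matrix, purely for illustration) of forming \(A_k\) from the top \(k\) singular triplets:

import numpy as np

# Illustrative matrix: 5 "documents" x 8 "terms"
rng = np.random.default_rng(1)
A = rng.normal(size=(5, 8))

# Thin SVD: A = U @ np.diag(s) @ Vt
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Rank-k approximation from the top k singular triplets
k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]

print(np.linalg.matrix_rank(A_k))  # at most k (here: 2)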

4.2.2.1 Example: Applying LSA to a Corpus

from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

# Example corpus
corpus = [
    'Human computer interaction',
    'Computer science',
    'Computers are becoming smarter',
    'Human machine interface',
    'Artificial intelligence and computer science'
]

# Convert the text to a document-term matrix (TF-IDF)
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(corpus)

# Apply SVD for LSA (reduce to 2 latent concepts)
svd = TruncatedSVD(n_components=2)
X_lsa = svd.fit_transform(X)

print("Latent Concepts (LSA):\n", X_lsa)
Latent Concepts (LSA):
 [[ 6.16624939e-01  5.12241288e-01]
 [ 8.26654293e-01 -2.77824493e-01]
 [-3.70178445e-16  8.35562891e-16]
 [ 2.45626863e-01  8.27098884e-01]
 [ 7.54152585e-01 -3.83680508e-01]]

In this example, we use TruncatedSVD to reduce the dimensionality of the document-term matrix to 2 latent concepts, which represent underlying topics in the corpus. The transformed matrix \(X_{\text{lsa}}\) captures the most important semantic relationships between documents and terms. Note that the third document projects to (numerically) zero: without stemming, its token “computers” does not match “computer”, so it shares no vocabulary with the other documents, and the two retained concepts are built entirely from their shared terms.
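As a sketch of how this latent space supports retrieval, a new query (the query string here is made up for illustration) can be projected with the already-fitted vectorizer and SVD, then compared against the documents by cosine similarity:

from sklearn.metrics.pairwise import cosine_similarity

# Project a new query into the same 2-dimensional latent space
query = ['computer science and intelligence']
query_lsa = svd.transform(vectorizer.transform(query))

# Rank the corpus documents by cosine similarity in the latent space
similarities = cosine_similarity(query_lsa, X_lsa)[0]
ranking = similarities.argsort()[::-1]
print("Documents ranked by similarity:", ranking)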


4.3 Recommender Systems

4.3.1 Overview of Recommender Systems

Recommender systems are a class of machine learning algorithms designed to suggest items to users based on their preferences. These systems are widely used in platforms like Netflix (movie recommendations), Amazon (product recommendations), and Spotify (music recommendations).

There are two primary types of recommender systems:

  1. Content-based filtering: Recommends items based on the features of the items and user preferences.
  2. Collaborative filtering: Recommends items by finding patterns in user-item interactions (e.g., ratings, clicks).

Matrix factorization techniques, such as SVD, are commonly used in collaborative filtering to uncover latent factors that explain user preferences and item characteristics.

4.3.2 Applying SVD in Recommender Systems

In collaborative filtering, the user-item interaction matrix \(A \in \mathbb{R}^{m \times n}\), where \(m\) is the number of users and \(n\) is the number of items, is often sparse, meaning that many users have not rated many items.

SVD can be applied to decompose the user-item matrix, truncating to the top \(k\) latent factors:

\[ A \approx U_k \Sigma_k V_k^T \]

Where:

  • \(U_k \in \mathbb{R}^{m \times k}\) represents the latent preferences of users.
  • \(\Sigma_k \in \mathbb{R}^{k \times k}\) contains the singular values, which capture the importance of each latent factor.
  • \(V_k^T \in \mathbb{R}^{k \times n}\) represents the latent characteristics of items.

By reconstructing the matrix from the top \(k\) singular values, we can estimate the missing values in the user-item matrix (i.e., the ratings that users have not provided). One caveat: applying SVD directly treats missing entries as zero ratings, so production systems typically use matrix-factorization variants that optimize only over the observed entries; plain SVD nonetheless illustrates the core idea.

4.3.3 Example: Building a Movie Recommender System with SVD

import numpy as np
from scipy.sparse.linalg import svds

# Example user-item rating matrix (rows: users, columns: items)
R = np.array([
    [5, 3, 0, 1],  # User 0
    [4, 0, 0, 1],  # User 1
    [1, 1, 0, 5],  # User 2
    [1, 0, 0, 4],  # User 3
    [0, 1, 5, 4],  # User 4
], dtype=float)

# Apply truncated SVD with k=2 latent factors
# (scipy's svds returns the singular values in ascending order)
U, Sigma, Vt = svds(R, k=2)

# Reconstruct the approximate matrix (predicted ratings)
R_pred = U @ np.diag(Sigma) @ Vt

# Function to recommend top N items for a specific user
def recommend_items(user_id, R, R_pred, n_recommendations=2):
    # Get the user's ratings from the original matrix
    user_ratings = R[user_id, :]
    
    # Get the predicted ratings for the user
    user_predicted_ratings = R_pred[user_id, :]
    
    # Find items that the user hasn't rated yet (i.e., missing ratings)
    unrated_items = np.where(user_ratings == 0)[0]
    
    # Get the predicted ratings for the unrated items
    predicted_unrated_items = [(item_id, user_predicted_ratings[item_id]) for item_id in unrated_items]
    
    # Sort the unrated items by the predicted rating, in descending order
    recommended_items = sorted(predicted_unrated_items, key=lambda x: x[1], reverse=True)
    
    # Return the top N recommendations
    top_recommendations = recommended_items[:n_recommendations]
    
    return top_recommendations

# Example: Recommend top 2 items for User 1
user_id = 1
recommendations = recommend_items(user_id, R, R_pred, n_recommendations=2)

print(f"Top recommendations for User {user_id}: {recommendations}")
Top recommendations for User 1: [(1, 1.2807533124116666), (2, -0.4562968937599765)]

In this example, we use SVD to decompose the user-item matrix and predict missing ratings. The predicted matrix \(R_{\text{pred}}\) can be used to recommend items to users based on their latent preferences.
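A common refinement, shown below as a minimal sketch reusing \(R\) from the example, is to subtract each user's mean rating before factorization and add it back after reconstruction, so that the model predicts deviations from each user's typical rating rather than treating unrated items as literal zeros:

# Mean rating per user, computed over observed (nonzero) entries only
user_means = R.sum(axis=1) / (R != 0).sum(axis=1)

# Demean only the observed entries; leave missing entries at zero
R_demeaned = np.where(R != 0, R - user_means[:, None], 0)

# Factorize the demeaned matrix, then add the user means back
U, Sigma, Vt = svds(R_demeaned, k=2)
R_pred_demeaned = U @ np.diag(Sigma) @ Vt + user_means[:, None]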


4.4 Summary of SVD Applications in Machine Learning

  • Feature Extraction: SVD can be used to reduce the number of features while retaining the most important information, improving the performance of machine learning models.

  • Latent Semantic Analysis (LSA): SVD is used in LSA to uncover hidden semantic structures in text data, making it a valuable tool for tasks like document retrieval and topic modeling.

  • Recommender Systems: SVD plays a key role in collaborative filtering by decomposing the user-item matrix and predicting missing values (ratings) for users, which allows for personalized recommendations.

In the next chapter, we’ll explore how SVD can be applied for data preprocessing tasks like handling missing data, noise reduction, and data compression.