1 Eigenvalues, Eigenvectors, and SVD
Powerful mathematical tools for understanding linear transformations and data analysis.
1.1 Eigenvalues and Eigenvectors
1.1.1 The Eigenvalue Problem
For a square matrix A, find scalars λ and non-zero vectors v such that:
A·v = λ·v
This equation represents the eigenvalue equation.
1.1.2 Geometric Interpretation
- Eigenvectors: Directions that remain unchanged (up to scaling) under linear transformation
- Eigenvalues: Scaling factors along those directions
1.1.3 Characteristic Equation
The eigenvalues are roots of the characteristic polynomial:
det(A - λI) = 0
1.2 Applications in Machine Learning
1.2.1 Principal Component Analysis (PCA)
PCA finds the principal directions of variance in data.
import numpy as np
from sklearn.decomposition import PCA
# Generate sample data
np.random.seed(42)
X = np.random.randn(100, 3)
# Apply PCA
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)
print("Explained variance ratio:", pca.explained_variance_ratio_)
print("Principal components shape:", pca.components_.shape)1.2.2 Power Iteration Method
Algorithm to find the dominant eigenvalue and eigenvector:
def power_iteration(A, num_iterations=100):
"""Find dominant eigenvalue and eigenvector using power iteration"""
n = A.shape[0]
v = np.random.rand(n)
for _ in range(num_iterations):
# Multiply by matrix
v_new = np.dot(A, v)
# Normalize
v_new = v_new / np.linalg.norm(v_new)
v = v_new
# Compute eigenvalue
eigenvalue = np.dot(v, np.dot(A, v)) / np.dot(v, v)
return eigenvalue, v1.3 Singular Value Decomposition (SVD)
1.3.1 Matrix Decomposition
Every m×n matrix A can be decomposed as:
A = U × Σ × Vᵀ
Where:
- U: m×m orthogonal matrix (left singular vectors)
- Σ: m×n diagonal matrix (singular values)
- V: n×n orthogonal matrix (right singular vectors)
1.3.2 Low-Rank Approximation
SVD enables optimal low-rank approximations:
def low_rank_approximation(A, k):
"""Compute rank-k approximation of matrix A using SVD"""
U, s, Vt = np.linalg.svd(A)
# Keep only top k singular values
S_k = np.diag(s[:k])
# Reconstruct approximation
A_k = U[:, :k] @ S_k @ Vt[:k, :]
return A_k1.4 Spectral Graph Theory
1.4.1 Graph Laplacian
For a graph with adjacency matrix A:
L = D - A
Where D is the degree matrix.
1.4.2 Spectral Clustering
def spectral_clustering(adjacency_matrix, k):
"""Perform spectral clustering"""
# Compute Laplacian
degree_matrix = np.diag(np.sum(adjacency_matrix, axis=1))
laplacian = degree_matrix - adjacency_matrix
# Compute eigenvectors
eigenvalues, eigenvectors = np.linalg.eigh(laplacian)
# Use k smallest eigenvectors (excluding first)
X = eigenvectors[:, 1:k+1]
# Normalize rows
X = X / np.linalg.norm(X, axis=1, keepdims=True)
# Apply k-means clustering
from sklearn.cluster import KMeans
kmeans = KMeans(n_clusters=k, random_state=42)
labels = kmeans.fit_predict(X)
return labels1.5 Numerical Considerations
1.5.1 Condition Number
The condition number κ(A) = σ₁/σₙ measures matrix sensitivity:
- Well-conditioned: κ(A) ≈ 1
- Ill-conditioned: κ(A) >> 1
1.5.2 Eigenvalue Algorithms
- QR Algorithm: For finding all eigenvalues
- Jacobi Method: For symmetric matrices
- Lanczos Method: For large sparse matrices
1.6 Applications in Deep Learning
1.6.1 Neural Network Analysis
def analyze_neural_network_weights(weights):
"""Analyze neural network weights using eigenvalues"""
# Compute weight matrix eigenvalues
eigenvalues, eigenvectors = np.linalg.eigh(weights @ weights.T)
# Compute condition number
condition_number = eigenvalues.max() / eigenvalues.min()
# Check for vanishing/exploding gradients
if condition_number > 100:
print("Warning: Potentially unstable gradients")
elif condition_number < 0.01:
print("Warning: Potentially vanishing gradients")
return condition_number, eigenvalues1.6.2 Convolutional Neural Networks
SVD can be used for:
- Network compression: Reducing parameter count
- Knowledge distillation: Transferring knowledge between models
- Pruning: Removing redundant filters
1.7 Key Takeaways
- Eigenvalues reveal the scaling behavior of linear transformations
- SVD provides the optimal low-rank approximation of any matrix
- Spectral methods are powerful tools for data analysis and clustering
- Numerical stability is crucial when working with eigenvalues
These concepts form the mathematical foundation for many modern machine learning algorithms and data analysis techniques.