1 Eigenvalues, Eigenvectors, and SVD

Powerful mathematical tools for understanding linear transformations and data analysis.

1.1 Eigenvalues and Eigenvectors

1.1.1 The Eigenvalue Problem

For a square matrix A, find scalars λ and non-zero vectors v such that: A·v = λ·v

This equation represents the eigenvalue equation.

1.1.2 Geometric Interpretation

  • Eigenvectors: Directions that remain unchanged (up to scaling) under linear transformation
  • Eigenvalues: Scaling factors along those directions

1.1.3 Characteristic Equation

The eigenvalues are roots of the characteristic polynomial: det(A - λI) = 0

1.2 Applications in Machine Learning

1.2.1 Principal Component Analysis (PCA)

PCA finds the principal directions of variance in data.

import numpy as np
from sklearn.decomposition import PCA
 
# Generate sample data
np.random.seed(42)
X = np.random.randn(100, 3)
 
# Apply PCA
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)
 
print("Explained variance ratio:", pca.explained_variance_ratio_)
print("Principal components shape:", pca.components_.shape)

1.2.2 Power Iteration Method

Algorithm to find the dominant eigenvalue and eigenvector:

def power_iteration(A, num_iterations=100):
    """Find dominant eigenvalue and eigenvector using power iteration"""
    n = A.shape[0]
    v = np.random.rand(n)
 
    for _ in range(num_iterations):
        # Multiply by matrix
        v_new = np.dot(A, v)
        # Normalize
        v_new = v_new / np.linalg.norm(v_new)
        v = v_new
 
    # Compute eigenvalue
    eigenvalue = np.dot(v, np.dot(A, v)) / np.dot(v, v)
 
    return eigenvalue, v

1.3 Singular Value Decomposition (SVD)

1.3.1 Matrix Decomposition

Every m×n matrix A can be decomposed as: A = U × Σ × Vᵀ

Where:

  • U: m×m orthogonal matrix (left singular vectors)
  • Σ: m×n diagonal matrix (singular values)
  • V: n×n orthogonal matrix (right singular vectors)

1.3.2 Low-Rank Approximation

SVD enables optimal low-rank approximations:

def low_rank_approximation(A, k):
    """Compute rank-k approximation of matrix A using SVD"""
    U, s, Vt = np.linalg.svd(A)
 
    # Keep only top k singular values
    S_k = np.diag(s[:k])
 
    # Reconstruct approximation
    A_k = U[:, :k] @ S_k @ Vt[:k, :]
 
    return A_k

1.4 Spectral Graph Theory

1.4.1 Graph Laplacian

For a graph with adjacency matrix A: L = D - A

Where D is the degree matrix.

1.4.2 Spectral Clustering

def spectral_clustering(adjacency_matrix, k):
    """Perform spectral clustering"""
    # Compute Laplacian
    degree_matrix = np.diag(np.sum(adjacency_matrix, axis=1))
    laplacian = degree_matrix - adjacency_matrix
 
    # Compute eigenvectors
    eigenvalues, eigenvectors = np.linalg.eigh(laplacian)
 
    # Use k smallest eigenvectors (excluding first)
    X = eigenvectors[:, 1:k+1]
 
    # Normalize rows
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
 
    # Apply k-means clustering
    from sklearn.cluster import KMeans
    kmeans = KMeans(n_clusters=k, random_state=42)
    labels = kmeans.fit_predict(X)
 
    return labels

1.5 Numerical Considerations

1.5.1 Condition Number

The condition number κ(A) = σ₁/σₙ measures matrix sensitivity:

  • Well-conditioned: κ(A) ≈ 1
  • Ill-conditioned: κ(A) >> 1

1.5.2 Eigenvalue Algorithms

  • QR Algorithm: For finding all eigenvalues
  • Jacobi Method: For symmetric matrices
  • Lanczos Method: For large sparse matrices

1.6 Applications in Deep Learning

1.6.1 Neural Network Analysis

def analyze_neural_network_weights(weights):
    """Analyze neural network weights using eigenvalues"""
    # Compute weight matrix eigenvalues
    eigenvalues, eigenvectors = np.linalg.eigh(weights @ weights.T)
 
    # Compute condition number
    condition_number = eigenvalues.max() / eigenvalues.min()
 
    # Check for vanishing/exploding gradients
    if condition_number > 100:
        print("Warning: Potentially unstable gradients")
    elif condition_number < 0.01:
        print("Warning: Potentially vanishing gradients")
 
    return condition_number, eigenvalues

1.6.2 Convolutional Neural Networks

SVD can be used for:

  • Network compression: Reducing parameter count
  • Knowledge distillation: Transferring knowledge between models
  • Pruning: Removing redundant filters

1.7 Key Takeaways

  1. Eigenvalues reveal the scaling behavior of linear transformations
  2. SVD provides the optimal low-rank approximation of any matrix
  3. Spectral methods are powerful tools for data analysis and clustering
  4. Numerical stability is crucial when working with eigenvalues

These concepts form the mathematical foundation for many modern machine learning algorithms and data analysis techniques.

0 items under this folder.