Singular Value Decomposition

←Back to Tech Tree

inventorycoverage

Singular Value Decomposition #

Linear AlgebraDifficulty: ★★★★☆Depth: 6Unlocks: 2

A = UΣV^T. Fundamental matrix factorization.

Interactive Visualization #

⏮◀◀▶▶STEP0.25x1xZOOM

t=0s

Core Concepts #

Key Symbols & Notation #

U and V (orthonormal matrices whose columns are left and right singular vectors)Sigma (rectangular diagonal matrix of nonnegative entries called singular values)

Essential Relationships #

Prerequisites (2) #

Eigenvalues and Eigenvectors6 atomsOrthogonality5 atoms

Unlocks (1) #

Principal Component Analysislvl 4

Advanced Learning Details

Graph Position #

84

Depth Cost

2

Fan-Out (ROI)

1

Bottleneck Score

6

Chain Length

Cognitive Load #

6

Atomic Elements

46

Total Elements

L3

Percentile Level

L4

Atomic Level

All Concepts (16) #

Teaching Strategy #

Multi-session curriculum - substantial prior knowledge and complex material. Use mastery gates and deliberate practice.

Most linear algebra tools tell you what a matrix does to special vectors. Eigenvectors work beautifully—when the matrix is square and behaves nicely. The Singular Value Decomposition (SVD) is the “works anyway” factorization: it describes how any real m×n matrix transforms space by rotating (or reflecting), scaling along orthogonal directions, then rotating again.

TL;DR:

For any real matrix A ∈ ℝ^{m×n}, there exist orthonormal matrices U ∈ ℝ^{m×m}, V ∈ ℝ^{n×n}, and a diagonal (rectangular) matrix Σ ∈ ℝ^{m×n} with nonnegative diagonal entries σ₁ ≥ σ₂ ≥ … ≥ 0 such that A = UΣVᵀ. Geometrically: Vᵀ rotates the input, Σ scales along orthogonal axes, U rotates the output. The σᵢ are singular values; columns of V are right singular vectors; columns of U are left singular vectors. SVD underlies best low-rank approximation and PCA.

What Is Singular Value Decomposition? #

Why we need something beyond eigenvalues #

Eigenvalues and eigenvectors answer: “Are there directions v that A maps onto themselves up to scaling?”

SVD fixes these issues by giving an orthonormal coordinate system in the input and output spaces.

Definition #

For any real matrix A ∈ ℝ^{m×n}, the Singular Value Decomposition is

A = UΣVᵀ

where:

Concretely, Σ looks like

Σ = [ diag(σ₁,…,σ_r) 0 ]

[ 0 0 ]

with the diagonal sitting in the top-left.

The “three-step” story #

SVD says A can be viewed as three simple transformations composed:

  1. Vᵀ: rotate/reflect input space (ℝⁿ) into a special orthonormal basis (the right singular vectors)

  2. Σ: scale along coordinate axes by nonnegative factors σᵢ

  3. U: rotate/reflect output space (ℝᵐ) into the standard basis

Because U and V are orthonormal, they preserve lengths and angles:

‖Vᵀx‖ = ‖x‖, ‖Uy‖ = ‖y‖.

So all stretching/squashing is completely captured by Σ.

Geometric intuition: sphere → ellipsoid #

Consider the unit sphere in ℝⁿ:

S = { x ∈ ℝⁿ : ‖x‖ = 1 }.

Under A, the set A(S) is generally an ellipsoid (possibly flattened into lower dimensions). SVD explains why:

The singular values σᵢ are the lengths of the ellipsoid’s semi-axes.

The cast of characters #

Let V = [v₁ … vₙ] and U = [u₁ … uₘ]. Then:

And the defining relationships are:

Avᵢ = σᵢ u

Aᵀuᵢ = σᵢ v

These say: A maps orthonormal directions vᵢ to orthonormal directions uᵢ, scaled by σᵢ.

Existence (why you can trust it) #

A key promise of SVD: every real matrix has an SVD.

This is not just a convenience—it is the reason SVD becomes a universal tool in numerical linear algebra, data analysis, and machine learning.

A common route to existence uses the fact that AᵀA is symmetric positive semidefinite, hence has an orthonormal eigenbasis. From that eigenbasis we build V and Σ, then define U accordingly. We’ll do this carefully in the next section.

Core Mechanic 1: Constructing SVD from AᵀA (and What Singular Values Really Are) #

Why start with AᵀA? #

We want orthonormal directions in the input space that A treats “nicely.” AᵀA is the right object because:

xᵀ(AᵀA)x = (Ax)ᵀ(Ax) = ‖Ax‖² ≥ 0.

So AᵀA has real eigenvalues and orthonormal eigenvectors.

Step 1: Eigen-decompose AᵀA #

Because AᵀA is symmetric, there exists an orthonormal matrix V and diagonal Λ such that:

AᵀA = VΛVᵀ

where Λ = diag(λ₁,…,λₙ) with λᵢ ≥ 0.

Let the eigenpairs be:

(AᵀA)vᵢ = λᵢ v

with ‖vᵢ‖ = 1 and vᵢ ⟂ vⱼ for i ≠ j.

Step 2: Define singular values #

Singular values are defined as

σᵢ = √λᵢ (nonnegative)

and arranged in decreasing order σ₁ ≥ σ₂ ≥ … ≥ 0.

So singular values are literally the square-roots of eigenvalues of AᵀA.

Step 3: Define left singular vectors #

For any λᵢ > 0 (equivalently σᵢ > 0), define

uᵢ = (1/σᵢ) Avᵢ.

Then uᵢ is a unit vector, and distinct uᵢ are orthogonal. Let’s show the unit-length claim carefully.

Compute ‖uᵢ‖²:

uᵢ‖² = uᵢᵀu

= ((1/σᵢ)Avᵢ)ᵀ ((1/σᵢ)Avᵢ)

= (1/σᵢ²) vᵢᵀ AᵀA v

= (1/σᵢ²) vᵢᵀ (λᵢ vᵢ)

= (1/σᵢ²) λᵢ (vᵢᵀvᵢ)

= (1/σᵢ²) λᵢ

= (1/(√λᵢ)²) λᵢ

= 1.

So ‖uᵢ‖ = 1.

Now check orthogonality for i ≠ j (assuming σᵢ,σⱼ > 0):

uᵢᵀu

= ((1/σᵢ)Avᵢ)ᵀ ((1/σⱼ)Avⱼ)

= (1/(σᵢσⱼ)) vᵢᵀ AᵀA v

= (1/(σᵢσⱼ)) vᵢᵀ (λⱼ vⱼ)

= (λⱼ/(σᵢσⱼ)) vᵢᵀv

= 0.

Thus {uᵢ} are orthonormal over the nonzero singular values.

For the remaining columns of U (if m > r), we choose any orthonormal vectors that complete a basis of ℝᵐ. This completion is not unique.

Step 4: Assemble Σ, U, V #

Let r = rank(A) = number of positive singular values.

Then we can show A = UΣVᵀ.

One clean way is to check what A does to each vᵢ:

Avᵢ = σᵢ uᵢ.

Now consider the matrix UΣVᵀ applied to vᵢ:

(UΣVᵀ)v

= UΣ (Vᵀvᵢ)

= UΣ e

= U (σᵢ eᵢ)

= σᵢ uᵢ.

So A and UΣVᵀ agree on the basis vectors vᵢ, hence they are the same linear map.

What singular values mean: operator norm and energy #

The largest singular value σ₁ tells you the maximum stretching factor of A:

σ₁ = max_{‖x‖=1} ‖Ax‖.

You can see this from

‖Ax‖² = xᵀAᵀAx.

For symmetric matrices, the maximum of xᵀMx over ‖x‖=1 is the largest eigenvalue. Here M = AᵀA, so the maximum is λ₁, and thus max ‖Ax‖ = √λ₁ = σ₁.

More generally, the sum of squared singular values equals the squared Frobenius norm:

‖A‖_F² = ∑_{i=1}^{r} σᵢ².

This becomes useful for measuring how much “energy” is captured by keeping only the top k singular values.

Relationship to eigenvalues (what carries over, what doesn’t) #

If A is symmetric positive semidefinite, then its eigen-decomposition A = QΛQᵀ looks SVD-like.

In that case:

But in general:

This is why SVD is the robust “geometric backbone” behind many algorithms.

Core Mechanic 2: Geometry, Rank, and the Outer-Product (Sum of Rank-1) View #

Why a rank-1 sum matters #

Matrices can be complicated globally, but rank-1 matrices are simple:

So if we can write A as a sum of rank-1 pieces, we get an interpretable and computable structure.

SVD gives exactly that.

The outer-product expansion of SVD #

Write the (thin) SVD using r = rank(A):

A = U_r Σ_r V_rᵀ

where:

Then:

A = ∑_{i=1}^{r} σᵢ uvᵢᵀ.

This is worth pausing on:

Derivation (showing the work) #

Start from thin SVD:

A = U_r Σ_r V_rᵀ.

Let Σ_r have diagonal entries σᵢ. We can write Σ_r as a sum of diagonal basis matrices:

Σ_r = ∑_{i=1}^{r} σᵢ (eeᵢᵀ).

Then

A = U_r (∑_{i=1}^{r} σᵢ eeᵢᵀ) V_rᵀ

= ∑_{i=1}^{r} σᵢ (U_r eᵢ) (V_r eᵢ)ᵀ

= ∑_{i=1}^{r} σᵢ uvᵢᵀ.

That’s the outer-product form.

Geometry: orthogonal directions map to orthogonal directions #

The defining relation Avᵢ = σᵢ uᵢ means:

If you take an arbitrary input vector x and express it in the V-basis:

x = ∑_{i=1}^{n} αᵢ v

where αᵢ = vᵢᵀx.

Then

Ax

= A(∑_{i} αᵢ vᵢ)

= ∑_{i} αᵢ Av

= ∑_{i} αᵢ σᵢ uᵢ.

So the components along vᵢ get scaled by σᵢ and re-expressed along uᵢ.

This is the ellipsoid story in coordinates.

Rank and null spaces through Σ #

Because Σ has σ₁,…,σ_r > 0 and the rest 0:

The right singular vectors associated with σᵢ = 0 span null(A):

If σᵢ = 0, then Avᵢ = 0.

Similarly, left singular vectors with σᵢ = 0 span null(Aᵀ).

This makes SVD a “coordinate system” where the fundamental subspaces (row space, column space, null space) become clean blocks.

Full SVD vs thin SVD (practical distinction) #

You will see two common SVD shapes:

NameFactorizationShapesWhen used
Full SVDA = UΣVᵀU: m×m, Σ: m×n, V: n×nTheory, explicit bases for all spaces
Thin/compact SVDA = U_r Σ_r V_rᵀU_r: m×r, Σ_r: r×r, V_r: n×rComputation, low-rank approximation

The thin SVD discards the “zero singular value” directions and keeps only what A actually uses.

Best rank-k approximation (why SVD dominates compression) #

A remarkable theorem (Eckart–Young–Mirsky) says:

If you want a rank-k matrix B that best approximates A, then truncating the SVD is optimal.

Define

A_k = ∑_{i=1}^{k} σᵢ uvᵢᵀ.

Then A_k solves:

min_{rank(B)=k} ‖A − B‖_F

and also

min_{rank(B)=k} ‖A − B‖₂.

Intuition:

Quantitatively, the truncation error is clean:

‖A − A_k‖_F² = ∑_{i=k+1}^{r} σᵢ²

‖A − A_k‖₂ = σ_{k+1}.

These formulas are incredibly useful in practice: they tell you exactly what you lose by compressing.

Application/Connection: PCA, Least Squares, and Why SVD Is a Workhorse #

PCA connection (what SVD unlocks) #

Principal Component Analysis (PCA) finds directions of maximal variance in data.

Suppose you have a centered data matrix X ∈ ℝ^{N×d}:

The sample covariance is

C = (1/(N−1)) XᵀX.

PCA traditionally says: compute eigenvectors of C.

Now notice:

XᵀX = VΣᵀUᵀUΣVᵀ = V(ΣᵀΣ)Vᵀ.

Since UᵀU = I, we get an eigen-decomposition:

C = (1/(N−1)) V(ΣᵀΣ)Vᵀ.

So:

This is why numerical PCA is often implemented via SVD of X directly.

Least squares and the pseudoinverse #

Given an overdetermined system Axb (m > n), the least squares solution minimizes

min_{x} ‖Axb‖².

SVD provides a stable way to solve it using the Moore–Penrose pseudoinverse A⁺.

If A = UΣVᵀ, then

A⁺ = V Σ⁺ Uᵀ

where Σ⁺ is formed by inverting nonzero singular values:

Σ⁺_{ii} = 1/σᵢ for i ≤ r, and 0 otherwise.

Then one least-squares solution is

= A⁺ b.

Why this is good:

Conditioning and numerical stability #

The condition number (in 2-norm) of a full-rank matrix is

κ₂(A) = σ₁ / σ_n

(or σ₁/σ_r if rank-deficient).

Interpretation:

SVD makes this sensitivity visible and measurable.

A practical mental model #

When you see A = UΣVᵀ, think:

And when you truncate at k:

A ≈ A_k = U_k Σ_k V_kᵀ

you are choosing to keep only the k most influential input-output modes.

Where SVD appears (a few common places) #

SVD is not just a theorem—it is a lens for understanding what a matrix does, and a toolkit for acting on that understanding.

Worked Examples (3) #

Compute an SVD of a simple diagonal matrix (and see the geometry) #

Let A = [[3, 0], [0, 1]]. Find an SVD and interpret σ₁, σ₂ geometrically.

  1. Recognize A is already diagonal with nonnegative diagonal entries.

    We suspect V = I and U = I might work.

  2. Compute AᵀA:

    AᵀA = A² = [[9, 0], [0, 1]].

  3. Find eigenvalues/eigenvectors of AᵀA.

    Eigenvalues: λ₁ = 9, λ₂ = 1.

    Eigenvectors are the standard basis:

    v₁ = [1,0]ᵀ, v₂ = [0,1]ᵀ.

    Thus V = I.

  4. Convert eigenvalues to singular values:

    σ₁ = √9 = 3

    σ₂ = √1 = 1

    So Σ = [[3,0],[0,1]].

  5. Compute left singular vectors:

    uᵢ = (1/σᵢ)Avᵢ.

    For i=1:

    Av₁ = [3,0]ᵀ ⇒ u₁ = (1/3)[3,0]ᵀ = [1,0]ᵀ.

    For i=2:

    Av₂ = [0,1]ᵀ ⇒ u₂ = (1/1)[0,1]ᵀ = [0,1]ᵀ.

    Thus U = I.

  6. Verify:

    UΣVᵀ = I · Σ · I = A.

Insight: This matrix scales the x-axis by 3 and the y-axis by 1. The unit circle maps to an axis-aligned ellipse with semi-axes 3 and 1. SVD reduces to “rotate (none), scale, rotate (none)”—a clean case that matches the general geometric story.

SVD of a rank-1 matrix via AᵀA #

Let A = [[1, 1], [1, 1]]. Compute its singular values and one valid SVD.

  1. Compute AᵀA:

    AᵀA = [[1,1],[1,1]]ᵀ[[1,1],[1,1]] = [[2,2],[2,2]].

  2. Find eigenvalues of AᵀA.

    Matrix [[2,2],[2,2]] has:

    • •eigenvector v₁ ∥ [1,1]ᵀ with eigenvalue λ₁ = 4
    • •eigenvector v₂ ∥ [1,−1]ᵀ with eigenvalue λ₂ = 0

    Normalize:

    v₁ = (1/√2)[1,1]ᵀ

    v₂ = (1/√2)[1,−1]ᵀ

    Set V = [vv₂].

  3. Singular values:

    σ₁ = √λ₁ = 2

    σ₂ = √λ₂ = 0

    So rank r = 1.

  4. Compute u₁ = (1/σ₁)Av₁:

    Av₁ = [[1,1],[1,1]] (1/√2)[1,1]ᵀ

    = (1/√2)[2,2]ᵀ

    = √2 [1,1]ᵀ

    Thus u₁ = (1/2) √2 [1,1]ᵀ = (1/√2)[1,1]ᵀ.

  5. Complete U to an orthonormal basis.

    Choose u₂ = (1/√2)[1,−1]ᵀ.

    Then U = [uu₂] is orthonormal.

  6. Assemble Σ = [[2,0],[0,0]].

    Then A = UΣVᵀ holds (one valid SVD).

Insight: A has rank 1: it only acts nontrivially along the direction v₁ = (1/√2)[1,1]. Everything orthogonal to that direction is sent to 0 (σ₂ = 0). The unit circle is squashed into a line segment (a degenerate ellipsoid).

Best rank-1 approximation error from singular values #

Suppose A has singular values σ₁ = 10, σ₂ = 3, σ₃ = 1 (and no others). Compute ‖A − A₁‖_F and ‖A − A₁‖₂.

  1. Recall truncation errors:

    ‖A − A_k‖_F² = ∑_{i=k+1}^{r} σᵢ²

    ‖A − A_k‖₂ = σ_{k+1}.

  2. Here r = 3 and k = 1.

    So:

    ‖A − A₁‖_F² = σ₂² + σ₃² = 3² + 1² = 9 + 1 = 10.

  3. Therefore:

    ‖A − A₁‖_F = √10.

  4. Spectral norm error:

    ‖A − A₁‖₂ = σ₂ = 3.

Insight: SVD doesn’t just give a method—it gives exact guarantees. The leftover singular values quantify precisely what information you discard when compressing.

Key Takeaways #

Common Mistakes #

Practice #

medium

Let A ∈ ℝ^{m×n} have SVD A = UΣVᵀ. Show that ‖Ax‖ ≤ σ₁‖x‖ for all x ∈ ℝⁿ.

Hint: Write x = Vz with z = Vᵀx. Use orthonormality to relate ‖z‖ and ‖x‖, then bound ‖Σz‖ by σ₁‖z‖.

Show solution

Let z = Vᵀx, so x = Vz and ‖z‖ = ‖x‖ since V is orthonormal.

Then:

Ax = UΣVᵀx = UΣz.

Taking norms and using orthonormality of U:

‖Ax‖ = ‖UΣz‖ = ‖Σz‖.

Since Σ scales coordinates by at most σ₁,

‖Σz‖² = ∑ σᵢ² zᵢ² ≤ σ₁² ∑ zᵢ² = σ₁²‖z‖².

Thus ‖Ax‖ ≤ σ₁‖z‖ = σ₁‖x‖.

easy

Compute the singular values of A = [[2, 0], [0, −1]] and give one valid SVD.

Hint: Compute AᵀA and take square-roots of its eigenvalues. Remember singular values are nonnegative; signs can be absorbed into U or V.

Show solution

AᵀA = [[4,0],[0,1]]. Its eigenvalues are 4 and 1, so singular values are σ₁=2, σ₂=1.

One SVD: take V = I, Σ = [[2,0],[0,1]].

We need U such that UΣ = A. Since A differs by a sign on the second axis, choose U = diag(1, −1).

Then UΣVᵀ = diag(1,−1) diag(2,1) I = diag(2, −1) = A.

medium

Suppose A has singular values 5, 2, 2, 0. What are rank(A), dim(null(A)), and the best rank-2 Frobenius error ‖A − A₂‖_F?

Hint: Rank is the number of positive singular values. Use ‖A − A_k‖_F² = ∑_{i>k} σᵢ².

Show solution

Positive singular values: 5,2,2 → rank(A)=3.

If A is m×n, then dim(null(A)) = n − rank(A) = n − 3.

For k=2:

‖A − A₂‖_F² = σ₃² + σ₄² = 2² + 0² = 4.

So ‖A − A₂‖_F = 2.

Connections #

Eigenvalues and Eigenvectors

Orthogonality

Principal Component Analysis

Quality: A (4.3/5)

← back to treebrowse all →