Matrix Decomposition

←Back to Tech Tree

inventorycoverage

Matrix Decomposition #

Linear AlgebraDifficulty: ★★★☆☆Depth: 4Unlocks: 0

LU, QR decomposition. Breaking matrices into factors.

Interactive Visualization #

⏮◀◀▶▶STEP0.25x1xZOOM

t=0s

Core Concepts #

Key Symbols & Notation #

L, U, Q, R, P - factor labels: L (lower-triangular), U (upper-triangular), Q (orthogonal/unitary), R (upper-triangular), P (permutation for pivoting)

Essential Relationships #

Prerequisites (2) #

Systems of Linear Equations6 atomsDeterminants6 atoms

Advanced Learning Details

Graph Position #

46

Depth Cost

0

Fan-Out (ROI)

0

Bottleneck Score

4

Chain Length

Cognitive Load #

6

Atomic Elements

68

Total Elements

L4

Percentile Level

L4

Atomic Level

All Concepts (29) #

Teaching Strategy #

Deep-dive lesson - accessible entry point but dense material. Use worked examples and spaced repetition.

Matrix decomposition is the idea that a “complicated” matrix can often be rewritten as a product of “structured” matrices (triangular, orthogonal, permutation). That one rewrite turns many hard tasks—solving systems, computing determinants, least squares, and inverses—into sequences of easy steps.

TL;DR:

Matrix decomposition (factorization) rewrites A as a product like P A = L U or A = Q R. LU connects directly to Gaussian elimination for fast solves; QR uses orthogonality for numerical stability and least squares.

What Is Matrix Decomposition? #

Why this concept exists (motivation) #

In linear algebra, many problems repeatedly reduce to operations with the same matrix A:

If you try to do each task “from scratch,” you typically re-run elimination steps or do expensive computations repeatedly. Matrix decomposition avoids that repetition by separating A into factors that are easier to work with.

Definition (factorization) #

A matrix decomposition (or factorization) is a representation

A = A₁ A₂ … Aₖ

where the factors Aᵢ have special structure (triangular, orthogonal, permutation, diagonal, etc.). The structure is the entire point: it turns expensive operations into cheap ones.

Two decompositions dominate introductory numerical linear algebra:

  1. LU decomposition (with pivoting):

P A = L U

  1. QR decomposition:

A = Q R

Why triangular and orthogonal factors are “easy” #

Triangular matrices make solving systems easy by substitution.

These cost about O(n²) operations, much cheaper than O(n³) elimination.

Orthogonal matrices preserve lengths and angles:

‖Q v‖ = ‖v‖, and Q⁻¹ = Qᵀ.

This is valuable for numerical stability: multiplying by Q doesn’t amplify errors the way arbitrary matrices can.

A quick comparison table #

DecompositionFormTypical useKey strengthKey limitation
LU (with pivoting)P A = L UMany solves A x = b, determinantFast repeated solvesNeeds pivoting for robustness; typically square A
QRA = Q RLeast squares, stable solvesNumerically stableUsually more work than LU for square solves

The rest of this lesson builds these ideas slowly: first LU as “organized Gaussian elimination,” then QR as “orthogonal elimination.”

Core Mechanic 1: LU Decomposition (Gaussian Elimination as a Factorization) #

Why LU is the natural next step after Gaussian elimination #

You already know Gaussian elimination: use row operations to turn A into an upper-triangular matrix, then solve by back substitution.

LU decomposition answers a natural question:

Can we save the elimination work so we can solve A x = b for many b without re-eliminating every time?

Yes. LU stores the elimination multipliers in L and the resulting triangular matrix in U.

The basic idea (no pivoting, conceptually) #

Suppose elimination transforms A into U.

Each elimination step subtracts a multiple of a pivot row from rows below.

Those multiples become the subdiagonal entries of L.

The result is:

A = L U

where:

How solving works once you have A = L U #

To solve A x = b:

A x = b

L U x = b

Let y = U x. Then:

  1. L y = b (forward substitution)

  2. U x = y (back substitution)

This is the payoff: after one O(n³) factorization, each new solve is ~O(n²).

Pivoting: why P is needed in practice #

In exact arithmetic, LU without pivoting can fail if a pivot is zero.

In floating-point arithmetic, it can also be disastrously inaccurate if a pivot is tiny.

Pivoting fixes this by swapping rows so that the pivot is “safe.” The most common form is partial pivoting.

We then write:

P A = L U

where P is a permutation matrix representing row swaps.

Interpreting P #

P is an identity matrix with rows permuted.

Multiplying by P on the left permutes rows:

(P A) has the same rows as A, just reordered.

In solving, it means you actually solve:

P A x = P b

L U x = P b

Determinants from LU (with pivoting) #

You already know det(A) = 0 iff A is singular. LU gives a fast route to det(A).

From P A = L U:

det(P A) = det(L U)

Use multiplicativity of determinant:

det(P) det(A) = det(L) det(U)

Now note:

So:

det(A) = det(P)⁻¹ det(L) det(U) = det(P) det(U)

because det(P)⁻¹ = det(P) (since det(P) = ±1).

Cost model intuition #

If you have many right-hand sides b₁, b₂, …, LU is a huge win.

Geometric / conceptual view #

Elimination is applying a sequence of elementary lower-triangular matrices E₁, E₂, … such that:

Eₖ … E₂ E₁ A = U

Those Eᵢ encode row operations. The inverse of a product of lower-triangular matrices is lower-triangular, so:

A = (E₁⁻¹ E₂⁻¹ … Eₖ⁻¹) U

and that product becomes L.

You don’t usually compute Eᵢ explicitly; LU is the compact record of what elimination did.

Core Mechanic 2: QR Decomposition (Orthogonality as a Tool) #

Why QR exists (a different problem than LU) #

LU shines for square systems A x = b.

But many real problems are rectangular:

So you solve a least squares problem:

minimize over x: ‖A xb‖²

QR is designed for this. It also provides a numerically stable way to solve square systems.

What QR says #

For an m×n matrix A (typically m ≥ n), QR decomposition gives:

A = Q R

where:

The key property is orthogonality:

Qᵀ Q = I

and for thin QR:

Q has columns q₁, …, qₙ with qᵢᵀ qⱼ = 0 for i ≠ j and ‖qᵢ‖ = 1.

Why orthogonality helps (stability and simplification) #

Orthogonal transformations preserve norms:

‖Q v‖ = ‖v

and they don’t magnify relative errors much. That makes QR a workhorse in numerical linear algebra.

Also, orthogonality lets you “remove” Q easily since Q⁻¹ = Qᵀ.

Least squares via QR (the key derivation) #

We want:

min x ‖A xb‖²

Substitute A = Q R:

‖A xb‖²

= ‖Q R xb‖²

Insert I = Q Qᵀ (since Q is orthogonal):

= ‖Q R x − Q Qᵀ b‖²

= ‖Q (R x − Qᵀ b)‖²

Now use norm preservation:

= ‖R x − Qᵀ b‖²

So the least squares problem reduces to minimizing a norm involving R, which is triangular.

In the common case m ≥ n with thin QR, R is n×n upper-triangular, and we solve:

R x = Qᵀ b

by back substitution (or solve the top n equations if using full QR).

How QR is computed (high-level) #

There are multiple algorithms; two important ones:

  1. Gram–Schmidt (conceptual, but can be numerically fragile)
  1. Householder reflections (standard in practice)

You don’t need to memorize the full Householder procedure yet, but you should understand the idea: QR uses orthogonal operations to make A triangular without the instability of naive elimination.

QR for square systems #

If A is square and nonsingular:

A x = b

Q R x = b

Multiply by Qᵀ:

R x = Qᵀ b

Then back substitution solves it.

Compared to LU, QR is often more stable (especially for ill-conditioned problems), but typically more expensive for square systems.

Application/Connection: Choosing LU vs QR (and What They Enable) #

The decision: what problem are you solving? #

Matrix decomposition is not one tool but a toolbox. The decomposition you choose matches the structure of the task.

TaskBest defaultWhy
Solve A x = b (square), many bLU with pivotingFast repeated solves after factorization
Compute det(A)LU with pivotingdet(A) = det(P) ∏ᵢ Uᵢᵢ
Least squares min ‖A xb‖²QROrthogonality gives stable reduction to triangular solve
Solve A x = b (square), high stability neededQROrthogonal transforms are numerically gentle

What decompositions unlock in broader CS/ML workflows #

  1. Efficient linear solves in optimization
  1. Least squares in data fitting
  1. Understanding singularity and rank behavior

A brief note on permutation matrices and practical implementations #

In code libraries, you may see variants:

The purpose is the same: reorder rows/columns to find good pivots.

A mental model you can reuse #

If you remember just that, you’ll choose correctly most of the time—and you’ll understand what the library function is doing when it returns L, U, P or Q, R.

Worked Examples (3) #

LU decomposition (no pivoting) and solving A **x** = **b** #

Let A = [[2, 1], [4, 3]] and b = [1, 2]ᵀ. Find L and U such that A = L U, then solve A x = b.

  1. We eliminate the (2,1) entry (the 4) using pivot 2.

    Multiplier m = 4/2 = 2.

  2. After elimination, row2 ← row2 − m·row1:

    row2: [4, 3] − 2·[2, 1] = [4, 3] − [4, 2] = [0, 1].

    So U = [[2, 1], [0, 1]].

  3. Form L using the multiplier m below the diagonal and 1s on the diagonal:

    L = [[1, 0], [m, 1]] = [[1, 0], [2, 1]].

  4. Check A = L U:

    L U = [[1, 0], [2, 1]] · [[2, 1], [0, 1]]

    = [

    [1·2 + 0·0, 1·1 + 0·1],

    [2·2 + 1·0, 2·1 + 1·1]

    ]

    = [[2, 1], [4, 3]] = A.

  5. Solve A x = b via L U x = b.

    Let y = U x. First solve L y = b (forward substitution):

    [[1, 0], [2, 1]] · [y₁, y₂]ᵀ = [1, 2]ᵀ.

    Equation 1: y₁ = 1.

    Equation 2: 2y₁ + y₂ = 2 ⇒ 2·1 + y₂ = 2 ⇒ y₂ = 0.

  6. Now solve U x = y (back substitution):

    [[2, 1], [0, 1]] · [x₁, x₂]ᵀ = [1, 0]ᵀ.

    Equation 2: x₂ = 0.

    Equation 1: 2x₁ + x₂ = 1 ⇒ 2x₁ = 1 ⇒ x₁ = 1/2.

Insight: LU turns one elimination into a reusable factorization: each new b only needs forward + back substitution.

LU with pivoting: when a zero pivot forces a permutation #

Let A = [[0, 1], [2, 3]]. Show that LU without pivoting fails, then find P, L, U such that P A = L U.

  1. Attempt LU without pivoting: the first pivot would be A₁₁ = 0.

    We cannot eliminate below using pivot 0 (division by zero), so LU (without pivoting) fails.

  2. Use a row swap to bring a nonzero pivot to the top.

    Swap row1 and row2. The permutation matrix is:

    P = [[0, 1], [1, 0]].

  3. Compute P A:

    P A = [[0, 1], [1, 0]] · [[0, 1], [2, 3]]

    = [[2, 3], [0, 1]].

  4. Now P A is already upper-triangular, so we can take:

    U = [[2, 3], [0, 1]].

  5. Because no elimination was needed (no subdiagonal to eliminate), L is just the identity:

    L = [[1, 0], [0, 1]].

  6. Verify:

    L U = I · U = U = P A.

Insight: Pivoting isn’t an optional detail: it makes LU exist (and behave well) by ensuring usable pivots.

QR decomposition (conceptual) and least squares reduction #

Suppose A is m×n with m ≥ n and A = Q R (thin QR, so Q is m×n with orthonormal columns and R is n×n upper-triangular). Show how min ‖A xb‖ reduces to a triangular system.

  1. Start with the least squares objective:

    min x ‖A xb‖².

  2. Substitute A = Q R:

    ‖A xb‖² = ‖Q R xb‖².

  3. Use the fact that Q has orthonormal columns. Decompose b into components in the column space of Q and orthogonal to it:

    b = Q(Qᵀ b) + (b − Q(Qᵀ b)).

    Here Q(Qᵀ b) is the projection onto Col(Q) = Col(A).

  4. Rewrite the residual:

    Q R xb

    = Q R x − Q(Qᵀ b) − (b − Q(Qᵀ b))

    = Q(R x − Qᵀ b) − (b − Q(Qᵀ b)).

  5. Now take norms. The two terms are orthogonal (one lies in Col(Q), the other in its orthogonal complement), so by the Pythagorean theorem:

    ‖Q(R x − Qᵀ b) − (b − Q(Qᵀ b))‖²

    = ‖Q(R x − Qᵀ b)‖² + ‖b − Q(Qᵀ b)‖².

  6. Use norm preservation on the first term (since Q is orthonormal on its columns):

    ‖Q(R x − Qᵀ b)‖ = ‖R x − Qᵀ b‖.

  7. The second term ‖b − Q(Qᵀ b)‖² does not depend on x, so minimizing the whole expression is equivalent to minimizing:

    ‖R x − Qᵀ b‖².

  8. Thus the least squares minimizer x solves the triangular least squares problem, and when R is nonsingular (full column rank):

    R x = Qᵀ b

    which is solved by back substitution.

Insight: QR converts “fit a vector in a subspace” into a stable triangular solve, avoiding the error amplification of normal equations.

Key Takeaways #

Common Mistakes #

Practice #

easy

Compute an LU decomposition (no pivoting) of A = [[1, 2], [3, 8]] into A = L U with L unit lower-triangular. Then solve A x = b for b = [5, 15]ᵀ using the factors.

Hint: Eliminate the 3 under the pivot 1. The multiplier is m = 3/1. Put m into L₂₁, and the resulting upper-triangular matrix is U.

Show solution

Eliminate: m = 3.

U comes from row2 ← row2 − 3·row1:

row2: [3, 8] − 3·[1, 2] = [0, 2].

So U = [[1, 2], [0, 2]].

L = [[1, 0], [3, 1]].

Solve L y = b:

y₁ = 5.

3y₁ + y₂ = 15 ⇒ 15 + y₂ = 15 ⇒ y₂ = 0.

Solve U x = y:

2x₂ = 0 ⇒ x₂ = 0.

x₁ + 2x₂ = 5 ⇒ x₁ = 5.

So x = [5, 0]ᵀ.

medium

Let P A = L U be an LU decomposition with L unit lower-triangular. Suppose U has diagonal entries (2, −1, 5) and P corresponds to exactly one row swap. Compute det(A).

Hint: Use det(P) det(A) = det(L) det(U). For unit lower-triangular L, det(L) = 1. One row swap means det(P) = −1.

Show solution

det(U) = 2 · (−1) · 5 = −10.

With one row swap, det(P) = −1.

From det(P) det(A) = det(L) det(U) = 1 · (−10) = −10,

(−1) det(A) = −10 ⇒ det(A) = 10.

hard

Assume A = Q R is a thin QR decomposition with Qᵀ Q = I and R upper-triangular and invertible. Show that the unique minimizer of min ‖A xb‖ is the solution to R x = Qᵀ b.

Hint: Start from ‖A xb‖ = ‖Q R xb‖. Add and subtract Q(Qᵀ b) and use orthogonality to split the norm into two parts, one depending on x and one not.

Show solution

We minimize f(x) = ‖A xb‖² = ‖Q R xb‖².

Write b = Q(Qᵀ b) + (b − Q(Qᵀ b)).

Then:

Q R xb = Q(R x − Qᵀ b) − (b − Q(Qᵀ b)).

The two terms are orthogonal, so:

‖Q(R x − Qᵀ b) − (b − Q(Qᵀ b))‖²

= ‖Q(R x − Qᵀ b)‖² + ‖b − Q(Qᵀ b)‖².

Because Q has orthonormal columns, ‖Q v‖ = ‖v‖ for v ∈ ℝⁿ, hence:

= ‖R x − Qᵀ b‖² + constant.

Therefore minimizing f(x) is equivalent to minimizing ‖R x − Qᵀ b‖².

Since R is invertible, the unique minimizer satisfies R x = Qᵀ b, giving x = R⁻¹ Qᵀ b.

Connections #

Systems of Linear Equations

Gaussian Elimination

Determinants

Orthogonal Matrices

Least Squares Regression

Condition Number and Numerical Stability

Quality: B (4.1/5)

← back to treebrowse all →