Measure Theory


Measure Theory #

Probability & Statistics | Difficulty: ★★★★★ | Depth: 4 | Unlocks: 0

Rigorous foundation for probability. Sigma-algebras, Lebesgue integration.


Core Concepts #

Key Symbols & Notation #

μ (the measure symbol, written 'mu' in ASCII)

Essential Relationships #

Prerequisites (2) #

Integrals (6 atoms), Sets (6 atoms)

Advanced Learning Details

Graph Position #

Depth Cost: 46

Fan-Out (ROI): 0

Bottleneck Score: 0

Chain Length: 4

Cognitive Load #

Atomic Elements: 6

Total Elements: 75

Percentile Level: L4

Atomic Level: L4

All Concepts (32) #

Teaching Strategy #

Deep-dive lesson - accessible entry point but dense material. Use worked examples and spaced repetition.

Probability looks like “assign a number to an event,” but to do that rigorously you must answer: which subsets count as events, and how do we assign probabilities consistently when there are infinitely many ways to combine events? Measure theory is the framework that makes those questions precise—and it also replaces “area under a curve” with a more flexible notion of integration that works for discontinuous functions, limits of random variables, and continuous distributions.

TL;DR:

Measure theory builds probability from three pieces: (1) a σ-algebra of measurable sets (the events you’re allowed to talk about), (2) a measure μ that assigns sizes to those sets and is countably additive, and (3) the Lebesgue integral ∫ f dμ, defined by approximating a function using simple functions and taking limits. This foundation explains why expectations behave well under limits (monotone/dominated convergence), and why densities are “Radon–Nikodym derivatives.”

What Is Measure Theory? #

Why you need more than “area” and “length” #

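In calculus you learned integrals as areas under curves, often via Riemann sums. That’s powerful, but it has friction points:

  1. Highly discontinuous functions, such as the indicator of the rationals on [0,1], have no Riemann integral.

  2. A pointwise limit of Riemann-integrable functions need not be Riemann integrable, so exchanging limits and integrals is delicate.

  3. The construction is tied to intervals in ℝⁿ, while probability needs integrals over an abstract sample space Ω.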

Measure theory addresses these by separating two roles that are blended in Riemann integration:

  1. Which subsets are “measurable”? (Which events can we assign sizes/probabilities to?)

  2. How do we assign a size μ to those subsets?

Then integration becomes: sum the values of a function weighted by μ, generalized via limits.

The central objects: (Ω, 𝔽, μ) #

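A measure-theoretic universe is a triple (Ω, 𝔽, μ):

  1. Ω is the underlying set (in probability, the sample space of outcomes).

  2. 𝔽 is a σ-algebra of subsets of Ω (the measurable sets, or events).

  3. μ: 𝔽 → [0, ∞] is a measure, assigning a size to each set in 𝔽.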

In probability, μ is usually written P and has P(Ω) = 1. In analysis, μ could be Lebesgue measure on ℝ (length), or counting measure, or many others.

What measure theory gives you conceptually #

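Measure theory is not just a bag of definitions. It gives a consistent logic for countable unions and intersections of events, for limits of increasing or decreasing sequences of sets, and for exchanging limits with integrals.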

These “countable” properties are exactly what probability needs, because sequences of random variables and sequences of events are the bread and butter of convergence and laws of large numbers.

A quick roadmap #

We’ll build up in layers:

  1. σ-algebras: what counts as a measurable set
  2. Measures: how to assign sizes to those sets
  3. Measurable functions: random variables as measurable maps
  4. Lebesgue integral: define ∫ f dμ via simple functions and limits
  5. Convergence theorems: why expectations commute with limits under conditions
  6. Probability connections: densities, Radon–Nikodym, and expectation as an integral

We’ll keep connecting back to probability intuition: “events,” “probabilities,” “expectations.”

Core Mechanic 1: σ-algebras (What counts as an event?) #

Why σ-algebras exist #

Suppose Ω is uncountably infinite (e.g., Ω = ℝ). You might hope to assign a probability to every subset of Ω.

But there is a deep obstruction: there exist subsets of ℝ (e.g., Vitali sets) for which no translation-invariant “length” can be consistently defined while preserving natural properties like countable additivity. So we compromise: we choose a rich, well-behaved family of subsets—enough to include all sets we care about in applications—on which a measure can live.

That family is a σ-algebra.

Definition #

A collection 𝔽 ⊂ 𝒫(Ω) (a set of subsets of Ω) is a σ-algebra if:

  1. Ω ∈ 𝔽

  2. If A ∈ 𝔽 then Aᶜ ∈ 𝔽

  3. If A₁, A₂, … ∈ 𝔽 then ⋃ₙ Aₙ ∈ 𝔽 (closure under countable unions)

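From these, you automatically also get ∅ = Ωᶜ ∈ 𝔽 and closure under finite unions, countable intersections (by De Morgan: ⋂ₙ Aₙ = (⋃ₙ Aₙᶜ)ᶜ), and set differences (A \ B = A ∩ Bᶜ).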
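On a finite Ω the axioms can be checked mechanically. The following is a minimal sketch, not part of the original lesson: the function name `is_sigma_algebra` and the example families are hypothetical, and the check relies on the fact that on a finite set closure under pairwise unions already gives closure under countable unions.

```python
from itertools import combinations

def is_sigma_algebra(omega, family):
    """Check the sigma-algebra axioms for a family of subsets of a finite set."""
    omega = frozenset(omega)
    family = {frozenset(s) for s in family}
    # Axiom 1: the whole space belongs to the family.
    if omega not in family:
        return False
    # Axiom 2: closure under complements.
    if any(omega - a not in family for a in family):
        return False
    # Axiom 3: on a finite set, pairwise unions are enough.
    return all(a | b in family for a, b in combinations(family, 2))

omega = {1, 2, 3, 4}
good = [set(), {1, 2}, {3, 4}, omega]
bad = [set(), {1}, {2, 3, 4}, {1, 2}, omega]  # {1, 2} has no complement here

print(is_sigma_algebra(omega, good))  # True
print(is_sigma_algebra(omega, bad))   # False
```

The second family fails because {3, 4}, the complement of {1, 2}, is missing.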

Why “countable” matters #

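Finite unions are not enough. Many limit operations produce countable unions/intersections. For example, {supₙ Xₙ > a} = ⋃ₙ {Xₙ > a}, and the event that infinitely many of the Aₙ occur is lim supₙ Aₙ = ⋂ₙ ⋃_{k≥n} A_k.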

If your “events” weren’t closed under these, you couldn’t even state many convergence results.

Examples of σ-algebras #

1) Trivial σ-algebra #

𝔽 = {∅, Ω}. Too small for most purposes.

2) Power set #

𝔽 = 𝒫(Ω). Always a σ-algebra. For uncountable Ω, many interesting measures cannot be defined on all subsets while keeping desired properties.

3) Borel σ-algebra on ℝ #

The Borel σ-algebra 𝔹(ℝ) is generated by open sets (equivalently intervals). Intuitively, it contains sets you can build from intervals using countable unions/intersections/complements.

Notation: 𝔹(ℝ) = σ(open sets).

This is the standard measurable structure for real-valued random variables.

Generated σ-algebras (σ(𝒢)) #

Often you don’t want to specify 𝔽 directly. Instead, you start with “basic” sets 𝒢 (like intervals) and take the smallest σ-algebra containing them.

Formally:

σ(𝒢) = ⋂ { 𝔽 : 𝔽 is a σ-algebra on Ω and 𝒢 ⊂ 𝔽 }

This intersection is again a σ-algebra (intersection of σ-algebras preserves closure properties).
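For a finite Ω, the smallest σ-algebra containing a family can also be computed from the inside, by repeatedly adding complements and unions until nothing new appears. This is a sketch under that finiteness assumption; the function name `generate_sigma_algebra` is hypothetical, and the approach does not carry over to infinite Ω.

```python
def generate_sigma_algebra(omega, generators):
    """Smallest sigma-algebra on a finite omega containing the generators."""
    omega = frozenset(omega)
    family = {frozenset(), omega} | {frozenset(g) for g in generators}
    while True:
        # Candidates forced by the axioms: complements and pairwise unions.
        new = {omega - a for a in family}
        new |= {a | b for a in family for b in family}
        if new <= family:          # nothing new: the family is closed
            return family
        family |= new

print(len(generate_sigma_algebra({1, 2, 3, 4}, [{1, 2}])))    # 4
print(len(generate_sigma_algebra({1, 2, 3, 4}, [{1}, {2}])))  # 8
```

In the second call the generators split Ω into the three atoms {1}, {2}, {3, 4}, and a σ-algebra with three atoms has 2³ = 8 members.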

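Why this matters in probability: the σ-algebra σ(X) generated by a random variable X, that is, by the sets {X ∈ B} for Borel B, is exactly the collection of events whose occurrence is determined by observing X.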

A useful comparison table #

| Object | What it is | Closed under | Why it’s used |
| --- | --- | --- | --- |
| Algebra (field) of sets | collection of subsets | complements + finite unions | too weak for limit operations |
| σ-algebra | collection of subsets | complements + countable unions | supports measure + convergence |
| Topology | collection of “open” sets | arbitrary unions + finite intersections | continuity, not size |
| Borel σ-algebra 𝔹(ℝ) | σ-algebra generated by open sets | countable unions/intersections/complements | standard measurable sets on ℝ |

Breathing room: what σ-algebras do not do #

A σ-algebra is not a measure. It does not assign sizes. It only defines which questions (events) are legal.

Think of 𝔽 as the “language” of measurable events; μ will be the “semantics” that assigns numbers.

Core Mechanic 2: Measures (Assigning size μ to measurable sets) #

Why measures are defined on σ-algebras #

Once you commit to a σ-algebra 𝔽, you want μ(A) to behave like “size.” The critical property is that disjoint pieces add up—even if you have countably many of them.

Definition #

A measure is a function μ: 𝔽 → [0, ∞] such that:

  1. μ(∅) = 0

  2. (Countable additivity) If A₁, A₂, … are disjoint sets in 𝔽, then

μ(⋃ₙ Aₙ) = ∑ₙ μ(Aₙ)

A triple (Ω, 𝔽, μ) is a measure space.

Immediate consequences you’ll use constantly #

Let A, B ∈ 𝔽.

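  1. Monotonicity: If A ⊂ B then μ(A) ≤ μ(B)

Proof idea: B = A ∪ (B \ A) is a disjoint union, so μ(B) = μ(A) + μ(B \ A) ≥ μ(A).

  2. Finite additivity (a special case of countable additivity): for disjoint A₁, …, Aₙ ∈ 𝔽, μ(A₁ ∪ … ∪ Aₙ) = μ(A₁) + … + μ(Aₙ). Pad the list with empty sets to see this.

  3. Subadditivity: μ(⋃ₙ Aₙ) ≤ ∑ₙ μ(Aₙ)

This holds even without disjointness; you can prove it by converting to disjoint pieces Bₙ = Aₙ \ (A₁ ∪ … ∪ Aₙ₋₁), which satisfy μ(Bₙ) ≤ μ(Aₙ) and have the same union.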
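These three properties are easy to check on a small example. The sketch below assumes a fair six-sided die as the measure space (an illustrative choice, not from the text) and uses exact fractions; the names `weights` and `mu` are hypothetical.

```python
from fractions import Fraction

# Assumed example: uniform probability on the faces of a die.
weights = {w: Fraction(1, 6) for w in range(1, 7)}

def mu(a):
    """Measure of a set: the sum of its point masses."""
    return sum((weights[w] for w in a), Fraction(0))

A, B, C = {1, 2}, {1, 2, 3}, {3, 4}

print(mu(A) <= mu(B))               # monotonicity, since A is a subset of B
print(mu(A | C) == mu(A) + mu(C))   # additivity, since A and C are disjoint
print(mu(B | C), mu(B) + mu(C))     # subadditivity: 2/3 versus 5/6
```

The last line is a strict inequality because B and C overlap in the outcome 3, which the right-hand side counts twice.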

Continuity of measure (limits of sets) #

This is one of the big reasons countable additivity matters.

Continuity from below #

If A₁ ⊂ A₂ ⊂ … (increasing sequence) and A = ⋃ₙ Aₙ, then

μ(A) = limₙ→∞ μ(Aₙ)

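Sketch: write A as the disjoint union of the increments B₁ = A₁ and Bₙ = Aₙ \ Aₙ₋₁ (n ≥ 2), apply countable additivity, and note that μ(Aₙ) is the n-th partial sum of ∑ₙ μ(Bₙ).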

Continuity from above #

If A₁ ⊃ A₂ ⊃ … (decreasing) and μ(A₁) < ∞ and A = ⋂ₙ Aₙ, then

μ(A) = limₙ→∞ μ(Aₙ)

The finiteness condition matters; without it the statement can fail (∞ − ∞ ambiguities). For example, with Lebesgue measure and Aₙ = [n, ∞), every μ(Aₙ) = ∞, yet ⋂ₙ Aₙ = ∅ has measure 0.
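Continuity from below can be watched numerically. As a sketch, assume the probability measure on {1, 2, 3, …} with point masses P({k}) = 2⁻ᵏ (an illustrative choice) and the increasing sets Aₙ = {1, …, n}, whose union is the whole space; the function names are hypothetical.

```python
from fractions import Fraction

def p_point(k):
    """Assumed point mass P({k}) = 2^-k on the positive integers."""
    return Fraction(1, 2**k)

def p_initial_segment(n):
    """P(A_n) for the increasing sets A_n = {1, ..., n}."""
    return sum((p_point(k) for k in range(1, n + 1)), Fraction(0))

for n in (1, 2, 5, 10):
    # The gap to P(union of all A_n) = 1 shrinks like 2^-n.
    print(n, p_initial_segment(n), 1 - p_initial_segment(n))
```

Each P(Aₙ) equals 1 − 2⁻ⁿ, so the values increase to 1, the measure of the union.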

Examples of measures #

1) Counting measure #

Let Ω be any set, 𝔽 = 𝒫(Ω). Define

μ(A) = number of elements in A (possibly ∞).

This makes ∫ f dμ become a sum.

2) Lebesgue measure on ℝ #

This is the rigorous notion of “length.” For intervals,

μ((a, b)) = b − a

and it extends to a huge σ-algebra (Lebesgue measurable sets) containing all Borel sets.

Lebesgue measure is the backbone of continuous probability and analysis.

3) Probability measures #

A probability space is a measure space (Ω, 𝔽, P) with P(Ω) = 1.

So probability theory is measure theory plus the normalization P(Ω)=1.

Complete measures and why null sets matter #

A set N ∈ 𝔽 is a null set if μ(N) = 0.

A measure space is complete if every subset of a null set is measurable (i.e., is in 𝔽). This is desirable because “events of probability zero” should not create measurability paradoxes when you take subsets.

Lebesgue measure is complete on the Lebesgue σ-algebra, while its restriction to the Borel σ-algebra 𝔹(ℝ) is not complete (there are subsets of Borel null sets that are not Borel).

Breathing room: how measures differ from intuitions #

It’s tempting to think every set has a “size.” Measure theory explicitly rejects that on ℝ.

Instead you pick a σ-algebra 𝔽 of sets you are allowed to measure, and a measure μ defined only on 𝔽.

This is the trade: not every subset is measurable, but everything measurable behaves beautifully.

Core Mechanic 3: Measurable functions and the Lebesgue integral #

Why integration needs a new definition #

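Riemann integration partitions the x-axis, then samples f(x). Lebesgue integration flips the perspective: it partitions the range of f (the y-axis), then measures the set of points where f takes values in each slice.

This is a major advantage when f is highly discontinuous, when the domain is an abstract space Ω with no notion of interval, or when you need to pass limits through the integral.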

Measurable functions #

Given measurable spaces (Ω, 𝔽) and (S, 𝒮), a function f: Ω → S is measurable if

∀B ∈ 𝒮, f⁻¹(B) ∈ 𝔽.

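In the common case S = ℝ with 𝒮 = 𝔹(ℝ), it suffices to check that {ω : f(ω) ≤ a} ∈ 𝔽 for every a ∈ ℝ, because the half-lines (−∞, a] generate 𝔹(ℝ).

In probability, a random variable is a measurable function X: Ω → ℝ, so that sets such as {X ≤ a} are events and P(X ≤ a) is defined.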
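On finite spaces the preimage condition can be tested directly. The sketch below uses hypothetical names (`preimage`, `is_measurable`, `coarse`, `parity`) and assumes Ω = {1, 2, 3, 4} with the σ-algebra that only distinguishes {1, 2} from {3, 4}.

```python
def preimage(f, omega, b):
    """The set of points of omega that f maps into b."""
    return frozenset(w for w in omega if f(w) in b)

def is_measurable(f, omega, sigma_f, sigma_s):
    """Check that the preimage of every set in sigma_s lies in sigma_f."""
    sigma_f = {frozenset(a) for a in sigma_f}
    return all(preimage(f, omega, b) in sigma_f for b in sigma_s)

omega = {1, 2, 3, 4}
F = [set(), {1, 2}, {3, 4}, omega]   # observer only sees "low" versus "high"
S = [set(), {0}, {1}, {0, 1}]        # power set of the target space {0, 1}

def coarse(w):
    return 0 if w <= 2 else 1        # constant on {1, 2} and on {3, 4}

def parity(w):
    return w % 2                     # needs to tell 1 apart from 2

print(is_measurable(coarse, omega, F, S))  # True
print(is_measurable(parity, omega, F, S))  # False
```

The second function fails because its preimage of {0} is {2, 4}, which is not an event in 𝔽.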

Simple functions (the building blocks) #

A simple function φ is a measurable function that takes only finitely many values:

φ = ∑_{k=1}^m a_k 1_{A_k}

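where the A_k ∈ 𝔽 are measurable and, in the nonnegative case we start with, a_k ≥ 0.

Intuition: φ is a step function whose steps sit on arbitrary measurable sets rather than intervals; when the A_k are disjoint, φ takes the value a_k on A_k.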

Defining the Lebesgue integral for nonnegative functions #

We proceed in layers.

Step 1: integral of an indicator #

For A ∈ 𝔽,

∫ 1_A dμ = μ(A)

This aligns perfectly with probability: E[1_A] = P(A).

Step 2: integral of a nonnegative simple function #

If φ = ∑_{k=1}^m a_k 1_{A_k}, define

∫ φ dμ = ∑_{k=1}^m a_k μ(A_k)

You can check this is well-defined (independent of representation) by refining partitions.

Step 3: integral of a general nonnegative measurable function #

For f: Ω → [0, ∞] measurable,

∫ f dμ = sup{ ∫ φ dμ : 0 ≤ φ ≤ f, φ simple }

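This definition is motivated by approximation: every nonnegative measurable f is the pointwise limit of an increasing sequence of nonnegative simple functions φ_n ↑ f.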

Concretely, you can build φ_n by quantizing values of f into dyadic bins.
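As a sketch of the dyadic construction, take f(x) = x² on [0, 1] with Lebesgue measure (an illustrative choice) and φ_n = ⌊2ⁿ f⌋ / 2ⁿ. The usual cap min(n, ·) is inactive because f ≤ 1. The function name is hypothetical, and the level sets are measured by hand using the fact that f is increasing on [0, 1].

```python
import math

def lower_dyadic_integral(n):
    """Integral of phi_n = floor(2^n f) / 2^n for f(x) = x^2 on [0, 1]."""
    total = 0.0
    for k in range(2**n):
        lo, hi = k / 2**n, (k + 1) / 2**n
        # Level set {x in [0, 1] : lo <= x^2 < hi} = [sqrt(lo), sqrt(hi)),
        # so its Lebesgue measure is a difference of square roots.
        size = math.sqrt(hi) - math.sqrt(lo)
        # phi_n equals lo on this level set: value times size.
        total += lo * size
    # The single point x = 1 (where f = 1) has measure zero and is skipped.
    return total

for n in (1, 2, 4, 8, 12):
    print(n, lower_dyadic_integral(n))
```

The printed values increase toward 1/3, the value of ∫ x² dx on [0, 1], and the gap is at most 2⁻ⁿ because 0 ≤ f − φ_n ≤ 2⁻ⁿ.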

Extending to integrable real-valued functions #

For measurable f that can be positive or negative, define positive/negative parts:

f⁺ = max(f, 0),

f⁻ = max(−f, 0),

so f = f⁺ − f⁻ and |f| = f⁺ + f⁻.

Then define:

∫ f dμ = ∫ f⁺ dμ − ∫ f⁻ dμ

provided at least one of ∫ f⁺ dμ, ∫ f⁻ dμ is finite (and for integrable functions we require ∫ |f| dμ < ∞).

Why this definition is powerful #

It bakes in limit behavior.

If 0 ≤ f_n ↑ f (nonnegative measurable functions increasing pointwise), then the integrals converge:

∫ f_n dμ ↑ ∫ f dμ

This is the Monotone Convergence Theorem stated below. It does need a proof, but the proof leans directly on the “sup of simple functions” construction.

Key convergence theorems (stated carefully) #

These are the tools you will constantly use in probability and ML theory.

Monotone Convergence Theorem (MCT) #

If 0 ≤ f₁ ≤ f₂ ≤ … are measurable and f_n → f pointwise, then

limₙ ∫ f_n dμ = ∫ f dμ.

Fatou’s Lemma #

For nonnegative measurable f_n,

∫ (lim infₙ f_n) dμ ≤ lim infₙ ∫ f_n dμ.

Dominated Convergence Theorem (DCT) #

If the f_n are measurable, f_n → f pointwise, and there exists an integrable g with |f_n| ≤ g for all n, then f is integrable and

limₙ ∫ f_n dμ = ∫ f dμ.

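Why DCT is a big deal in probability: it justifies E[Xₙ] → E[X] whenever Xₙ → X and |Xₙ| ≤ Y for some Y with E[Y] < ∞. In particular, any uniformly bounded sequence of random variables qualifies.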
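The domination hypothesis cannot simply be dropped. A standard counterexample, sketched here with hypothetical function names, is f_n = n·1_(0,1/n) on [0, 1] with Lebesgue measure: the mass escapes into an ever thinner spike.

```python
from fractions import Fraction

def f_n(n, x):
    """f_n = n on the open interval (0, 1/n), and 0 elsewhere."""
    return n if 0 < x < Fraction(1, n) else 0

def integral_f_n(n):
    """Lebesgue integral of f_n: value n times interval length 1/n."""
    return n * Fraction(1, n)

x = Fraction(1, 10)
# At a fixed x > 0 the spike eventually moves past x, so f_n(x) -> 0.
print([f_n(n, x) for n in (5, 9, 10, 50)])
# Yet every integral equals 1, so the integrals do not converge to 0.
print([integral_f_n(n) for n in (5, 9, 10, 50)])
```

Here lim ∫ f_n dμ = 1 while ∫ lim f_n dμ = 0. No integrable g can dominate every f_n, since supₙ f_n grows like 1/x near 0. The outcome is still consistent with Fatou’s Lemma, which only asserts 0 ≤ 1.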

Breathing room: the philosophical shift #

Riemann integration asks: “How do we slice the domain into intervals?”

Lebesgue integration asks: “How large is the set of points where the function takes certain values?”

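That shift is exactly what you want for probability: the distribution of X is built from the probabilities P(X ∈ B) = P(X⁻¹(B)) of the sets where X takes certain values.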

Application/Connection: Probability, densities, and Radon–Nikodym #

Probability spaces as measure spaces #

A probability space is (Ω, 𝔽, P) with P(Ω)=1. Then expectation is just a Lebesgue integral:

E[X] = ∫_Ω X(ω) dP(ω)

and for events A ∈ 𝔽,

P(A) = ∫ 1_A dP.

This unifies discrete and continuous cases.

Discrete vs continuous: same definition, different μ #

Discrete #

If Ω is countable and P({ω}) = p(ω), then for X: Ω → ℝ,

E[X] = ∑_{ω∈Ω} X(ω) p(ω)

This is exactly ∫ X dP where P is a measure on 𝒫(Ω).
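In the discrete case the integral is literally a weighted sum. A minimal sketch, assuming a fair die as the probability space and using hypothetical names:

```python
from fractions import Fraction

def expectation(x, p):
    """E[X] as the integral of X dP on a countable space: sum of X(w) p(w)."""
    return sum((x(w) * pw for w, pw in p.items()), Fraction(0))

die = {w: Fraction(1, 6) for w in range(1, 7)}  # assumed fair die

def face_value(w):
    return w

def is_odd(w):
    return 1 if w % 2 else 0   # indicator of the event {1, 3, 5}

print(expectation(face_value, die))  # 7/2
print(expectation(is_odd, die))      # 1/2
```

The second line illustrates E[1_A] = P(A) for A = {1, 3, 5}.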

Continuous #

If Ω = ℝ and P has a density f with respect to Lebesgue measure μ (length), then

P(A) = ∫ 1_A(x) f(x) dμ(x)

and

E[X] = ∫ x f(x) dμ(x), where X(x) = x is the identity map on ℝ.

But the key phrase is “with respect to.” That is measure-theoretic.

Absolute continuity and why some distributions have no density #

Given two measures ν and μ on the same measurable space, we say

ν ≪ μ (ν is absolutely continuous w.r.t. μ)

if μ(A)=0 ⇒ ν(A)=0.

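In probability: a distribution on ℝ has a density (pdf) with respect to Lebesgue measure μ exactly when it is absolutely continuous with respect to μ. A point mass δ₀ is not: μ({0}) = 0 but δ₀({0}) = 1, so discrete distributions have no pdf in this sense.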

Radon–Nikodym theorem (the measure-theoretic meaning of “density”) #

If ν and μ are σ-finite measures and ν ≪ μ, then there exists a nonnegative measurable function f, unique up to μ-null sets, such that for all A ∈ 𝔽,

ν(A) = ∫_A f dμ.

We write f = dν/dμ, the Radon–Nikodym derivative.

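In probability: a pdf is the Radon–Nikodym derivative of a distribution with respect to Lebesgue measure, and a pmf is the Radon–Nikodym derivative with respect to counting measure.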

This is the rigorous replacement for “P has a pdf.”

Change of measure (a frequent ML/probability maneuver) #

If you can write

dν = f dμ,

then integrals transform as:

∫ h dν = ∫ h f dμ.

This is the backbone of importance sampling and likelihood ratios.
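The identity can be verified exactly on a finite space, where the Radon–Nikodym derivative is just a ratio of point masses. This is a sketch with made-up measures `mu` and `nu` and a made-up integrand `h`; it shows the identity behind importance sampling, not a sampling procedure.

```python
from fractions import Fraction

omega = [0, 1, 2]
# Assumed example measures. mu charges every point, so nu << mu holds.
mu = {0: Fraction(1, 2), 1: Fraction(1, 4), 2: Fraction(1, 4)}
nu = {0: Fraction(1, 5), 1: Fraction(2, 5), 2: Fraction(2, 5)}

# On a finite space, f = d(nu)/d(mu) is the ratio of point masses.
density = {w: nu[w] / mu[w] for w in omega}

def integrate(g, measure):
    """Integral of g with respect to a measure given by point masses."""
    return sum((g(w) * measure[w] for w in omega), Fraction(0))

def h(w):
    return w * w

lhs = integrate(h, nu)                              # integral of h d(nu)
rhs = integrate(lambda w: h(w) * density[w], mu)    # integral of h f d(mu)
print(lhs, rhs)  # 2 2
```

In importance sampling, `mu` plays the role of the distribution you can sample from and `density` is the weight applied to each sample.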

Conditional expectation (preview) #

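Measure theory also defines conditional expectation with respect to a sub-σ-algebra. For X ∈ L² it is the orthogonal projection of X onto the 𝔾-measurable square-integrable random variables, and the definition extends to X ∈ L¹: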

E[X | 𝔾]

where 𝔾 ⊂ 𝔽 is a σ-algebra representing partial information.

While a full treatment is another node, the measure-theoretic framing explains why conditioning is about σ-algebras (information) rather than only about random variables.
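In the special case where 𝔾 is generated by a finite partition into blocks of positive probability, E[X | 𝔾] is the probability-weighted average of X over each block. The sketch below covers only that finite case; the function name and the die example are assumptions made for illustration.

```python
from fractions import Fraction

def conditional_expectation(x, p, partition):
    """E[X | G] as a function on omega, for G generated by a finite partition."""
    result = {}
    for block in partition:
        mass = sum(p[w] for w in block)               # P(block), assumed > 0
        avg = sum(x(w) * p[w] for w in block) / mass  # average of X on the block
        for w in block:
            result[w] = avg                           # constant on each block
    return result

die = {w: Fraction(1, 6) for w in range(1, 7)}        # assumed fair die
partition = [{1, 3, 5}, {2, 4, 6}]                    # information: parity only

ce = conditional_expectation(lambda w: w, die, partition)
print(ce[1], ce[2])                      # 3 4
print(sum(ce[w] * die[w] for w in die))  # 7/2
```

The last line checks the tower property E[E[X | 𝔾]] = E[X] for this example.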

How this enables major probability results #

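With σ-algebras, measures, and Lebesgue integration, you can state and prove results such as the laws of large numbers, the central limit theorem, and martingale convergence theorems.

And in ML theory, the change-of-measure identity above underlies importance sampling and likelihood ratios, and the convergence theorems justify exchanging limits with expectations.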

Worked Examples (3) #

Building a σ-algebra from a partition (finite information → measurable events) #

Let Ω = {1,2,3,4}. Suppose you only observe whether the outcome is in A = {1,2} or in Aᶜ = {3,4}. Build the σ-algebra 𝔽 = σ({A}).

  1. Start with the generating family 𝒢 = {A}. We need the smallest σ-algebra containing A and Ω.

    So we must include Ω and complements.

  2. Include Ω and ∅:

    Ω ∈ 𝔽 by definition of σ-algebra.

    Then ∅ = Ωᶜ ∈ 𝔽.

  3. Include A and its complement:

    A = {1,2} ∈ 𝔽.

    Aᶜ = {3,4} ∈ 𝔽.

  4. Close under countable unions/intersections.

    But since Ω is finite, countable unions reduce to finite unions.

    The only unions you can form from {∅, A, Aᶜ, Ω} are again one of these four sets.

  5. Therefore:

    𝔽 = {∅, {1,2}, {3,4}, Ω}.

Insight: A σ-algebra can be seen as the set of all events that are decidable given a limited observation. Here, observing “in A or not” induces exactly four measurable events: the impossible event ∅, the certain event Ω, A, and Aᶜ.

Measure properties: continuity from below via disjoint increments #

Let (Ω, 𝔽, μ) be a measure space. Let A₁ ⊂ A₂ ⊂ … and define A = ⋃ₙ Aₙ. Prove μ(A) = limₙ μ(Aₙ).

  1. Define disjoint increments:

    B₁ = A₁

    Bₙ = Aₙ \ Aₙ₋₁ for n ≥ 2

    Then Bₙ are disjoint, and each Bₙ ∈ 𝔽 because σ-algebras are closed under differences.

  2. Show the union is A:

    ⋃ₙ Bₙ = A₁ ∪ ⋃_{n≥2} (Aₙ \ Aₙ₋₁) = ⋃ₙ Aₙ = A.

    (Each new piece adds exactly what was missing before.)

  3. Apply countable additivity:

    μ(A) = μ(⋃ₙ Bₙ) = ∑ₙ μ(Bₙ).

  4. Express μ(Aₙ) using the same increments:

    Aₙ = ⋃_{k=1}^n B_k (disjoint union)

    So μ(Aₙ) = ∑_{k=1}^n μ(B_k).

  5. Take the limit:

    limₙ μ(Aₙ) = limₙ ∑_{k=1}^n μ(B_k) = ∑ₙ μ(Bₙ) = μ(A).

    (The partial sums converge to the full series by definition.)

Insight: Continuity from below is really “measure respects growing approximations.” It’s the set-level analog of monotone convergence for integrals.

Lebesgue integral of a simple function (indicator decomposition) #

On Ω = [0,1] with Lebesgue measure μ, define φ(x) = 2·1_[0,1/4](x) + 5·1_(1/4,1](x). Compute ∫ φ dμ.

  1. Identify the measurable pieces:

    A₁ = [0, 1/4], A₂ = (1/4, 1]

    These are Borel (hence Lebesgue measurable) subsets of [0,1].

  2. Compute μ(A₁) and μ(A₂):

    μ(A₁) = 1/4 − 0 = 1/4

    μ(A₂) = 1 − 1/4 = 3/4

    (Endpoints do not affect Lebesgue measure.)

  3. Use the simple function integral rule:

    ∫ φ dμ = 2·μ(A₁) + 5·μ(A₂).

  4. Plug in values:

    ∫ φ dμ = 2·(1/4) + 5·(3/4)

    = 2/4 + 15/4

    = 17/4.

Insight: For step-like functions, Lebesgue integration is literally “value × size of region.” This is the prototype for expectation: E[X] is the average value weighted by probability mass.
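The same computation as a short sketch, using exact fractions. It assumes each A_k is an interval so that its Lebesgue measure is its length; the function name `integrate_simple` is hypothetical.

```python
from fractions import Fraction

def integrate_simple(pieces):
    """Integrate sum_k a_k 1_{A_k}, each A_k an interval given by its endpoints."""
    # Endpoints may be open or closed: single points have Lebesgue measure zero.
    return sum((a * (right - left) for a, (left, right) in pieces), Fraction(0))

phi = [
    (2, (Fraction(0), Fraction(1, 4))),   # value 2 on [0, 1/4]
    (5, (Fraction(1, 4), Fraction(1))),   # value 5 on (1/4, 1]
]
print(integrate_simple(phi))  # 17/4
```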

Key Takeaways #

Common Mistakes #

Practice #

easy

Let Ω = {1,2,3,4,5,6} (a die). Let A = {1,3,5} (odd outcomes). Write out the σ-algebra σ({A}) explicitly and compute P(B) for each B in that σ-algebra assuming a fair die.

Hint: A single set A generates {∅, Ω, A, Aᶜ}. Then use P(B) = |B|/6 for a fair die.

Show solution

σ({A}) = {∅, Ω, A, Aᶜ} with A = {1,3,5}, Aᶜ = {2,4,6}.

P(∅)=0, P(Ω)=1, P(A)=3/6=1/2, P(Aᶜ)=3/6=1/2.

medium

Let (Ω, 𝔽, μ) be a measure space and let A, B ∈ 𝔽 with A ⊂ B and μ(B) < ∞. Show that μ(B\A) = μ(B) − μ(A).

Hint: Write B as a disjoint union of A and (B\A), then use countable additivity (finite case).

Show solution

Since A ⊂ B, we can write B = A ∪ (B\A) and the union is disjoint.

By additivity: μ(B) = μ(A) + μ(B\A).

By monotonicity μ(A) ≤ μ(B) < ∞, so subtracting μ(A) from both sides is well-defined, giving μ(B\A) = μ(B) − μ(A).

hard

Define f_n(x) = 1_[0,n](x) on ℝ with Lebesgue measure μ. Let f(x) = 1_[0,∞)(x). Use monotone convergence to compute limₙ ∫ f_n dμ and compare it to ∫ f dμ.

Hint: The sets [0,n] increase to [0,∞). Compute each integral as a measure of an interval, noting it may be infinite.

Show solution

We have 0 ≤ f₁ ≤ f₂ ≤ … and f_n(x) ↑ f(x) pointwise. By MCT,

limₙ ∫ f_n dμ = ∫ f dμ.

Compute ∫ f_n dμ = μ([0,n]) = n.

Thus limₙ ∫ f_n dμ = limₙ n = ∞.

Also ∫ f dμ = μ([0,∞)) = ∞.

So both sides match (both infinite), illustrating that MCT allows infinite integrals naturally.

Connections #

Next nodes you can unlock or connect:

Quality: B (4.0/5)
