Monte Carlo Methods

←Back to Tech Tree

inventorycoverage

Monte Carlo Methods #

Probability & StatisticsDifficulty: ★★★★☆Depth: 6Unlocks: 4

Using random sampling to estimate quantities. Integration, simulation.

Interactive Visualization #

⏮◀◀▶▶STEP0.25x1xZOOM

t=0s

Core Concepts #

Key Symbols & Notation #

I_hat_N = (1/N) sum_{i=1}^N f(X_i)

Essential Relationships #

Prerequisites (2) #

Law of Large Numbers5 atomsCommon Distributions6 atoms

Unlocks (1) #

MCMClvl 4

Referenced by (2) #

Where this concept shows up in the operating-finance and personal-finance graphs.

From Business (2) #

[FIREBusiness

Monte Carlo simulation is the standard FIRE planning methodology - modeling sequence of returns risk and withdrawal success probability across thousands of simulated paths](/business/fire/)[Option PricingBusiness

Monte Carlo integration is the unifying computational method across all three perspectives - pricing path-dependent options via simulated risk-neutral paths, estimating marginal likelihoods by sampling the parameter space, and evaluating high-dimensional integrals where analytic solutions do not exist](/business/option-pricing/)

Advanced Learning Details

Graph Position #

62

Depth Cost

4

Fan-Out (ROI)

1

Bottleneck Score

6

Chain Length

Cognitive Load #

6

Atomic Elements

57

Total Elements

L4

Percentile Level

L4

Atomic Level

All Concepts (25) #

Teaching Strategy #

Multi-session curriculum - substantial prior knowledge and complex material. Use mastery gates and deliberate practice.

When an integral is too messy to compute and a system is too complex to solve, you can still estimate the answer—by simulating randomness and averaging. That’s the core promise of Monte Carlo methods: convert hard math into repeated sampling.

TL;DR:

Monte Carlo methods estimate a quantity by writing it as an expectation I = E_p[f(X)] and approximating it with a sample average Î_N = (1/N)∑_{i=1}^N f(X_i). The estimator is unbiased under mild conditions, converges by the Law of Large Numbers, and has root-mean-square error ≈ √(Var(f(X))/N). Most practical skill comes from controlling variance (and thus error) rather than “just sampling more.”

What Is Monte Carlo? (Expectation as the target) #

Monte Carlo methods are a family of techniques that use random sampling to estimate numerical quantities. They are especially useful when:

Why Monte Carlo exists (motivation) #

Many problems can be phrased as: “What is the average value of some function under some distribution?”

Examples:

The key idea is to turn the numerical goal into an expectation.

The central representation #

Let X be a random variable with distribution p (density or pmf). Let f be a function. The target quantity is:

I = E_p[f(X)]

If X is continuous with density p(x):

I = ∫ f(x) p(x) dx

If X is discrete with pmf p(x):

I = ∑_x f(x) p(x)

This seems like we replaced “compute an integral/sum” with “compute an expectation,” but expectations suggest a strategy: sample X from p and average f(X).

The Monte Carlo estimator #

Draw independent samples X₁, X₂, …, X_N ∼ p, then compute:

Î_N = (1/N) ∑_{i=1}^N f(X_i)

This is the Monte Carlo estimator (also called the sample mean estimator).

What you get “for free” #

You already know the Law of Large Numbers (LLN), so you already have the convergence story:

Î_N → I as N → ∞ (under mild conditions)

Monte Carlo is powerful because:

What you pay #

Monte Carlo is not magic; it trades algebraic difficulty for computational effort:

A useful mental model:

Core Mechanic 1: Building the estimator (sampling and averaging) #

Monte Carlo begins with a simple thought: if I = E[f(X)], then repeatedly observing f(X) and averaging should approximate I.

Why the sample average makes sense #

If X₁, …, X_N are i.i.d. from p, then each f(X_i) is an i.i.d. sample from the distribution of f(X). The expectation of f(X) is I.

So the sample mean:

Î_N = (1/N) ∑_{i=1}^N f(X_i)

is a natural plug-in estimator.

Unbiasedness (a key sanity check) #

Under the condition that E[|f(X)|] exists (finite), we can compute:

E[Î_N]

= E[(1/N) ∑_{i=1}^N f(X_i)]

= (1/N) ∑_{i=1}^N E[f(X_i)]

= (1/N) ∑_{i=1}^N E[f(X)]

= (1/N) · N · I

= I

So Î_N is unbiased.

Unbiasedness does not guarantee small error for finite N, but it does mean we are not systematically “off” in one direction.

Consistency via LLN #

By the (strong) Law of Large Numbers, if E[|f(X)|] < ∞:

Î_N → I almost surely as N → ∞

This is the convergence guarantee: with enough samples, the estimate stabilizes.

Interpreting Monte Carlo as “integration by sampling” #

Many integrals can be rewritten in expectation form. Suppose you want:

J = ∫_a^b g(x) dx

Pick a distribution over [a, b]. The most common is uniform:

X ∼ Uniform(a, b), p(x) = 1/(b−a)

Then:

E[g(X)]

= ∫_a^b g(x) · (1/(b−a)) dx

Multiply both sides by (b−a):

∫_a^b g(x) dx = (b−a) E[g(X)]

So a Monte Carlo estimator for J is:

Ĵ_N = (b−a) (1/N) ∑_{i=1}^N g(X_i), X_i ∼ Uniform(a,b)

This is the basic Monte Carlo integration recipe.

A geometric view: estimating areas/volumes #

A classic Monte Carlo use case is estimating the area of a shape S inside a bounding box B. If X is uniform over B:

P(X ∈ S) = Area(S)/Area(B)

So Area(S) = Area(B) · P(X ∈ S)

Estimate P(X ∈ S) with an average of indicator values:

Î_N = (1/N) ∑_{i=1}^N 1{X_i ∈ S}

This viewpoint is important because it shows:

Practical detail: randomness as a resource #

Computers produce pseudo-random numbers. For Monte Carlo you typically need:

Many Monte Carlo algorithms are really about constructing samples from p efficiently. In this node, we assume we can sample from p directly (later, MCMC removes this assumption).

Implementation skeleton #

A minimal Monte Carlo estimator looks like:

  1. 1)Choose p (a distribution you can sample from).
  2. 2)Define f(x) so that E_p[f(X)] is your quantity of interest.
  3. 3)Draw samples X₁, …, X_N.
  4. 4)Compute Î_N = (1/N)∑ f(X_i).
  5. 5)Quantify uncertainty (next section).

Core Mechanic 2: Error is governed by variance (1/√N and why variance reduction matters) #

Monte Carlo is simple to write down, but to use it well you must understand how the error behaves.

Why error analysis matters #

If Monte Carlo error decreased like 1/N, doubling samples would halve your error. But the typical scaling is slower: 1/√N.

That means:

Variance of the Monte Carlo estimator #

Assume X₁, …, X_N are i.i.d. from p and Var(f(X)) is finite.

Let μ = E[f(X)] = I and σ² = Var(f(X)).

Compute Var(Î_N):

Î_N = (1/N) ∑_{i=1}^N f(X_i)

Because the samples are independent:

Var(Î_N)

= Var((1/N) ∑ f(X_i))

= (1/N²) ∑ Var(f(X_i))

= (1/N²) · N · σ²

= σ² / N

So the standard deviation (standard error) is:

SD(Î_N) = σ / √N

This directly yields the typical Monte Carlo error scale.

Root-mean-square error (RMSE) #

RMSE is a clean “average size of error” measure:

RMSE(Î_N) = √(E[(Î_N − I)²])

For an unbiased estimator, RMSE equals the standard deviation:

E[Î_N] = I ⇒ RMSE(Î_N) = √(Var(Î_N)) = σ/√N

So, in the unbiased Monte Carlo setting:

RMSE ≈ √(Var(f(X))/N)

This is the key performance law.

The Central Limit Theorem (practical error bars) #

LLN tells you convergence eventually, but CLT tells you what happens at large finite N.

Under mild conditions:

√N (Î_N − I) ⇒ Normal(0, σ²)

Equivalently:

Î_N ≈ Normal(I, σ²/N)

So an approximate 95% confidence interval is:

Î_N ± 1.96 · (σ/√N)

In practice σ is unknown, so you estimate it from the samples:

s² = (1/(N−1)) ∑_{i=1}^N (f(X_i) − Î_N)²

Then use s/√N as the estimated standard error.

Why variance is the real enemy #

Notice what the formula depends on:

You can always increase N, but if Var(f(X)) is huge, you may need an impractical amount of computation.

So Monte Carlo practice is often: change how you sample or how you write the expectation so that the variance drops.

Variance reduction (preview-level) #

Here are common variance reduction ideas (you don’t need to master all here, but you should recognize them):

TechniqueCore ideaWhen it helpsTypical tradeoff
Control variatesUse a correlated quantity with known expectation to cancel noiseWhen you can find a strong correlated controlRequires analytic expectation of control
Antithetic variatesSample negatively correlated pairsSymmetric problems, monotone fNeeds special sampling design
Stratified samplingForce coverage across regionsWhen distribution has important subregionsMore bookkeeping
Importance samplingSample more from “important” regions and reweightRare events, heavy tails, sharp peaksCan explode variance if proposal is bad

The unifying idea: variance reduction improves accuracy at fixed N.

Independence assumption (and why MCMC will matter) #

The clean variance result Var(Î_N)=σ²/N relies on independence.

If samples are dependent (as in Markov chains), the estimator still often converges, but the effective sample size is lower:

Var(Î_N) ≈ (σ²_eff)/N

where σ²_eff > σ² when there is positive autocorrelation.

This is the bridge to MCMC: when direct sampling is hard, you accept dependence and then analyze it carefully.

A pacing check: what you should feel confident about now #

At this point, you should be able to:

Application/Connection: Monte Carlo for integration and simulation (and the road to MCMC) #

Monte Carlo is both a numerical integration tool and a simulation framework. The same estimator supports many workflows.

1) Monte Carlo integration in higher dimensions #

A major reason Monte Carlo is famous: high-dimensional integrals are painful for grid methods.

Suppose you want:

I = ∫_{ℝ^d} h(x) dx

If you can choose a distribution p(x) that covers the important region, rewrite:

I = ∫ h(x) dx

= ∫ (h(x)/p(x)) p(x) dx

= E_p[ h(X)/p(X) ]

Then estimate:

Î_N = (1/N) ∑_{i=1}^N h(X_i)/p(X_i), X_i ∼ p

This is the general “change of measure” trick that leads directly to importance sampling.

2) Simulation to estimate system performance #

Often you don’t have a closed-form model; you have a simulator. Monte Carlo says:

Example template:

The estimator is still Î_N.

3) Estimating probabilities as expectations #

Probabilities are expectations of indicators:

P(A) = E[1{A}]

So if you can simulate the experiment producing event A, you can estimate the probability by:

P̂_N(A) = (1/N) ∑ 1{A_i}

This becomes crucial for:

But note: for rare events, 1{A} has very small mean and may have problematic variance relative to the mean, motivating specialized methods (often importance sampling).

4) Choosing N based on desired accuracy #

Since SD(Î_N) ≈ σ/√N, to target a standard error ≤ ε:

σ/√N ≤ ε

Solve for N:

√N ≥ σ/ε

N ≥ (σ/ε)²

This is the core scaling law.

Two important practical notes:

  1. σ is unknown; estimate it from a pilot run.

  2. If f has heavy tails (variance very large or infinite), the CLT-based planning can fail; you may need robust methods or different parameterizations.

5) Connection to MCMC #

Everything so far assumed you can sample X ∼ p easily.

In Bayesian inference, p might be a posterior distribution:

p(θ | data) ∝ p(data | θ) p(θ)

Often you can evaluate p up to a constant but cannot sample from it directly.

MCMC methods construct a Markov chain whose stationary distribution is p and then approximate:

E_p[f(θ)] ≈ (1/N) ∑ f(θ_i)

So MCMC is best understood as:

If Monte Carlo is the “averaging engine,” MCMC is one of the main ways to generate the inputs to that engine.

6) A compact mental checklist #

When applying Monte Carlo, ask:

  1. What exactly is I? (Write it as E_p[f(X)].)

  2. Can I sample from p? If not, do I need a different p or MCMC?

  3. What is Var(f(X)) likely to be? Where does it get large?

  4. Do I need variance reduction to make this feasible?

  5. How will I report uncertainty (standard error / confidence interval)?

That workflow is the practical “shape” of Monte Carlo methods.

Worked Examples (3) #

Estimating an integral via Uniform sampling #

Estimate I = ∫_0^1 x² dx using Monte Carlo, and derive the estimator’s variance and RMSE in closed form.

  1. Rewrite as an expectation under Uniform(0,1):

    Let X ∼ Uniform(0,1).

    Then p(x)=1 on [0,1], so

    E[X²] = ∫_0^1 x² · 1 dx = ∫_0^1 x² dx = I.

  2. So the Monte Carlo estimator is:

    Î_N = (1/N) ∑_{i=1}^N X_i²,

    with X_i i.i.d. ∼ Uniform(0,1).

  3. Compute the true value (for reference):

    I = ∫_0^1 x² dx

    = [x³/3]_0^1

    = 1/3.

  4. Compute variance of f(X)=X².

    First compute E[X²] and E[X⁴]:

    E[X²] = 1/3.

    E[X⁴] = ∫_0^1 x⁴ dx = [x⁵/5]_0^1 = 1/5.

    So

    Var(X²) = E[X⁴] − (E[X²])²

    = 1/5 − (1/3)²

    = 1/5 − 1/9

    = (9−5)/45

    = 4/45.

  5. Compute variance of the estimator:

    Var(Î_N) = Var(X²)/N = (4/45)/N.

  6. Compute RMSE (unbiased case):

    RMSE(Î_N) = √(Var(Î_N))

    = √((4/45)/N)

    = (2/√45) · 1/√N

    = (2/(3√5)) · 1/√N.

Insight: Even in this friendly 1D integral, Monte Carlo converges like 1/√N. The constant depends on Var(f(X)); here it’s modest, so the estimator is fairly well-behaved.

Estimating π via area probability (indicator Monte Carlo) #

Use Monte Carlo to estimate π by sampling points uniformly in the square [−1,1]×[−1,1] and counting how many fall inside the unit circle.

  1. Define the geometry:

    Let B be the square [−1,1]×[−1,1].

    Area(B)=4.

    Let S be the unit disk x² + y² ≤ 1.

    Area(S)=π·1²=π.

  2. Convert area to probability:

    If (X,Y) is uniform over B, then

    P((X,Y) ∈ S) = Area(S)/Area(B) = π/4.

  3. Express probability as an expectation:

    Define f(X,Y) = 1{X²+Y² ≤ 1}.

    Then

    E[f(X,Y)] = P((X,Y) ∈ S) = π/4.

  4. Monte Carlo estimator:

    Draw i.i.d. samples (X_i, Y_i) uniform over B.

    Compute

    p̂_N = (1/N) ∑_{i=1}^N 1{X_i²+Y_i² ≤ 1}.

    Then estimate

    π̂_N = 4 p̂_N.

  5. Quantify variance:

    Each indicator is Bernoulli with success probability p = π/4.

    So

    Var(f) = p(1−p).

    Thus

    Var(p̂_N) = p(1−p)/N.

    And

    Var(π̂_N) = 16 Var(p̂_N)

    = 16 p(1−p)/N.

  6. Interpret scaling:

    Since p≈0.785, p(1−p)≈0.168.

    So SD(π̂_N) ≈ √(16·0.168/N) = √(2.688/N) ≈ 1.64/√N.

    To get ≈0.01 typical error you’d need on the order of (1.64/0.01)² ≈ 26,896 samples.

Insight: This iconic example shows both the universality and the slowness of Monte Carlo: the estimator is easy, but high precision requires many samples because error shrinks only like 1/√N.

Confidence interval for a simulation output (unknown variance) #

You simulate a random system and observe outputs f(X_i). Show how to compute a Monte Carlo estimate and an approximate 95% confidence interval using the sample variance.

  1. Run the simulation:

    Generate i.i.d. inputs X₁,…,X_N ∼ p.

    Compute outputs Y_i = f(X_i).

  2. Compute the Monte Carlo estimate:

    Î_N = (1/N) ∑_{i=1}^N Y_i.

  3. Estimate variance from data:

    Compute sample variance

    s² = (1/(N−1)) ∑_{i=1}^N (Y_i − Î_N)².

  4. Estimate standard error:

    SE ≈ s/√N.

  5. Form an approximate 95% confidence interval:

    Using CLT,

    Î_N ≈ Normal(I, σ²/N).

    Replace σ with s:

    CI₉₅% ≈ Î_N ± 1.96 · (s/√N).

    (For small N, a t-interval with N−1 degrees of freedom is often used.)

  6. Decide if you need more samples:

    If the half-width 1.96·s/√N is too large, increase N.

    Since half-width scales like 1/√N, reducing it by a factor k requires ≈k² more samples.

Insight: Monte Carlo is rarely just a point estimate. In practice you almost always want uncertainty reporting, and the sample variance gives you an operational way to do that.

Key Takeaways #

Common Mistakes #

Practice #

easy

(Integration as expectation) Use Monte Carlo to estimate J = ∫_0^2 e^{−x} dx by sampling X ∼ Uniform(0,2). Write J as (b−a)E[g(X)] and give the estimator Ĵ_N.

Hint: For X ∼ Uniform(0,2), p(x)=1/2 on [0,2]. Relate ∫ g(x) dx to E[g(X)].

Show solution

Let g(x)=e^{−x} and X ∼ Uniform(0,2).

Then

E[g(X)] = ∫_0^2 e^{−x} · (1/2) dx.

So

∫_0^2 e^{−x} dx = 2 E[e^{−X}].

Monte Carlo estimator:

Ĵ_N = 2 · (1/N)∑_{i=1}^N e^{−X_i}, with X_i i.i.d. ∼ Uniform(0,2).

easy

(Error scaling) Suppose Var(f(X)) = 9. How many i.i.d. samples N are needed so that SD(Î_N) ≤ 0.1?

Hint: Use SD(Î_N)=σ/√N with σ=√Var(f(X)).

Show solution

σ=√9=3.

We need 3/√N ≤ 0.1 ⇒ √N ≥ 30 ⇒ N ≥ 900.

medium

(Indicator variance) You estimate a probability p = P(A) using p̂_N = (1/N)∑1{A_i}. Derive Var(p̂_N) and find the worst-case (largest) variance over p ∈ [0,1].

Hint: 1{A} is Bernoulli(p). The variance of Bernoulli(p) is p(1−p). Maximize this quadratic.

Show solution

Each indicator Z_i=1{A_i} is Bernoulli(p), so Var(Z_i)=p(1−p).

Since p̂_N is the mean of i.i.d. Z_i,

Var(p̂_N)=p(1−p)/N.

The function p(1−p)=p−p² is maximized at p=1/2, giving maximum variance (1/4)/N.

Connections #

Next, extend Monte Carlo to cases where direct sampling is difficult: MCMC.

Related foundations you may want to review or connect:

Quality: A (4.4/5)

← back to treebrowse all →