stochastic-gradient-descent #

Shows SGD as repeated parameter updates on a 2D loss surface using a noisy mini-batch gradient estimate g^_t. The true (full-data) gradient direction is shown alongside the stochastic estimate; mini-batch size cycles to demonstrate variance reduction and the unbiased expectation E[g^_t] = ∇f(θ_t).

canvasclick to interact

⏮◀◀▶▶STEP0.25x1xZOOM

t=0s

practical uses #

01.Training neural networks efficiently on large datasets
02.Online/streaming learning where data arrives continuously
03.Optimization when full gradients are expensive (large n), using mini-batches for a speed/variance trade-off

technical notes #

Implements a simple convex quadratic loss with an unbiased per-example gradient = true gradient + noise. g^_t is computed by averaging batchSize samples (variance ~ 1/sqrt(batchSize)), then θ is updated once per stepTime. Rendering uses snapped 4px grid alignment for a retro blocky look; animation interpolates θ between discrete updates using the provided ease(t) function.

← cooperative-games diffusion-models →