layer-normalization #
Shows a batch of activation vectors (rows) normalized per sample across the feature dimension. For each row, compute the mean μ and variance σ², normalize with z = (x − μ)/√(σ² + ε), then apply the learned per-feature affine parameters γ and β to produce the output y = γ·z + β. The animation highlights one sample at a time to emphasize that the statistics are never computed across the batch.
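The steps described above can be sketched in a few lines of plain Python. This is a minimal illustration rather than the code behind the animation: the function name `layer_norm`, the default `eps=1e-5`, and the example values are assumptions chosen for the sketch.

```python
import math

def layer_norm(batch, gamma, beta, eps=1e-5):
    """Normalize each row of `batch` over its features, then apply gamma and beta.

    `batch` is a list of rows (samples); `gamma` and `beta` hold one value per feature.
    `eps` is an assumed default, not a value taken from the visualization.
    """
    out = []
    for x in batch:  # one sample (row) at a time; no statistics cross the batch
        n = len(x)
        mu = sum(x) / n                              # per-sample mean
        var = sum((v - mu) ** 2 for v in x) / n      # per-sample (population) variance
        inv_std = 1.0 / math.sqrt(var + eps)
        z = [(v - mu) * inv_std for v in x]          # normalized features
        out.append([g * zi + b for g, zi, b in zip(gamma, z, beta)])  # affine step
    return out

# Hypothetical example data: two samples, four features each.
batch = [[1.0, 2.0, 3.0, 4.0], [10.0, 10.0, 10.0, 30.0]]
gamma = [1.0, 1.0, 1.0, 1.0]   # identity scale
beta = [0.0, 0.0, 0.0, 0.0]    # zero shift
y = layer_norm(batch, gamma, beta)
```

With identity γ and zero β, each output row has mean close to 0 and variance close to 1 (slightly below 1 because of ε), regardless of how different the rows are from one another.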
practical uses #
- 01. Stabilizing training in Transformers and RNNs, where batch statistics are inconvenient
- 02. Improving optimization by keeping activations well-scaled per example
- 03. Retaining representational flexibility via learned γ (scale) and β (shift) after normalization
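One way to see why per-sample statistics help when batch statistics are inconvenient is to normalize the same sample inside two different batches. The sketch below is illustrative only: `per_sample_norm` and `per_feature_batch_norm` are hypothetical helper names, the batch-style function is a simplified contrast case (no running averages, no affine step), and all data values are made up.

```python
import math

def per_sample_norm(x, eps=1e-5):
    """Layer-norm style: mean and variance over the features of one sample."""
    mu = sum(x) / len(x)
    var = sum((v - mu) ** 2 for v in x) / len(x)
    return [(v - mu) / math.sqrt(var + eps) for v in x]

def per_feature_batch_norm(batch, eps=1e-5):
    """Batch-style contrast: one mean and variance per feature column."""
    cols = list(zip(*batch))
    mus = [sum(c) / len(c) for c in cols]
    variances = [sum((v - m) ** 2 for v in c) / len(c) for c, m in zip(cols, mus)]
    return [
        [(v - m) / math.sqrt(s + eps) for v, m, s in zip(row, mus, variances)]
        for row in batch
    ]

sample = [0.5, -1.0, 2.0, 4.5]
batch_a = [sample, [1.0, 1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0]]
batch_b = [sample, [9.0, -3.0, 0.0, 2.0], [-4.0, 5.0, 7.0, 1.0]]

# Per-sample normalization ignores the other rows entirely.
layer_a = per_sample_norm(batch_a[0])
layer_b = per_sample_norm(batch_b[0])

# Batch-style normalization of the same sample depends on its batch mates.
batch_style_a = per_feature_batch_norm(batch_a)[0]
batch_style_b = per_feature_batch_norm(batch_b)[0]

# Learned affine parameters (hypothetical values) then rescale each feature.
gamma = [2.0, 1.0, 0.5, 1.0]
beta = [0.0, 1.0, 0.0, -1.0]
y = [g * z + b for g, z, b in zip(gamma, layer_a, beta)]
```

Here `layer_a` and `layer_b` are identical, while `batch_style_a` and `batch_style_b` differ, which is the property that makes the per-sample approach convenient for variable batch sizes and sequence models.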
technical notes #
Pure Canvas2D; blocky grid cells are drawn at snapped coordinates. A 4-stage loop (input → stats → normalize → affine) runs over ~4.2s, and the active sample and feature indices cycle independently. Per-sample μ/σ² bars and a per-feature γ/β panel reinforce the distinction between the axis of normalization and the axis of the learned affine parameters.
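The timing logic can be sketched independently of the drawing code. The version below is written in Python for illustration, not in the page's own canvas code, and rests on several assumptions that the notes do not confirm: the four stages are equally long, the sample index advances once per loop, the feature index advances on a separate hypothetical 0.6s period, and grid cells are 8 pixels wide.

```python
LOOP_SECONDS = 4.2  # loop length from the notes (approximate)
STAGES = ("input", "stats", "normalize", "affine")

def stage_at(t, loop_seconds=LOOP_SECONDS):
    """Return (stage name, approximate progress in [0, 1)) at time t in seconds."""
    stage_len = loop_seconds / len(STAGES)   # assumption: equal-length stages
    phase = t % loop_seconds
    index = min(int(phase // stage_len), len(STAGES) - 1)  # guard float rounding
    return STAGES[index], (phase - index * stage_len) / stage_len

def active_sample(t, n_samples, loop_seconds=LOOP_SECONDS):
    """Assumption: the highlighted sample advances once per full loop."""
    return int(t // loop_seconds) % n_samples

def active_feature(t, n_features, period=0.6):
    """Assumption: the highlighted feature advances on its own period (hypothetical 0.6s)."""
    return int(t // period) % n_features

def snap(value, cell=8):
    """Snap a coordinate down to the nearest multiple of `cell` (hypothetical 8px grid)."""
    return int(value // cell) * cell
```

Because the sample and feature indices are derived from different periods, they drift relative to each other, which is one simple way to get the independent cycling the notes describe.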