positional-encoding


positional-encoding #

Shows how a position index p is mapped to a positional encoding vector PE(p) and injected into the token representation, by addition or concatenation, so that a model processing tokens in parallel can still use their order. Contrasts this absolute scheme with relative schemes, which use the offsets p_i − p_j as attention biases over an i×j attention grid.
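As a rough illustration of the mapping from p to PE(p) and the two injection modes, here is a minimal sketch. It assumes the sinusoidal scheme from the original Transformer, which this page does not specify; the function names and the example embedding are hypothetical.

```python
import math

def sinusoidal_pe(p: int, d_model: int) -> list[float]:
    """PE(p) for one position, assuming the sinusoidal scheme (an assumption)."""
    pe = []
    for k in range(d_model):
        # Dimensions 2i and 2i+1 share the frequency 1 / 10000^(2i / d_model).
        freq = 1.0 / (10000 ** ((k // 2) * 2 / d_model))
        angle = p * freq
        # Even dimensions use sine, odd dimensions use cosine.
        pe.append(math.sin(angle) if k % 2 == 0 else math.cos(angle))
    return pe

def inject_add(e: list[float], pe: list[float]) -> list[float]:
    """Add mode: element-wise sum, so the width stays d_model."""
    return [a + b for a, b in zip(e, pe)]

def inject_concat(e: list[float], pe: list[float]) -> list[float]:
    """Concat mode: the position vector is appended, so the width grows."""
    return list(e) + list(pe)

token_embedding = [0.5, -0.2, 0.1, 0.9]  # hypothetical E(token)
pe0 = sinusoidal_pe(0, 4)                # position 0: sines are 0, cosines are 1
added = inject_add(token_embedding, pe0)
concatenated = inject_concat(token_embedding, pe0)
```

Addition keeps the representation at its original width, while concatenation keeps content and position in separate dimensions at the cost of a wider vector.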


practical uses #

technical notes #

Three-panel loop (absolute → relative → integration) lasting about 3.6 s. The left column renders token positions and vector bars for E(token), PE(p), and their combination. The right column renders an attention matrix whose cell intensity depends on |i−j|, with a highlighted cell (i,j) showing Δ=p_i−p_j. All geometry is snapped to a grid for a blocky look; animation is time-based, driven by ease() and cycling indices.
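The pieces described here could be sketched as follows. This is not the page's actual code: the linear intensity falloff, the smoothstep form of ease(), the 8-unit grid cell, and the equal split of the loop across the three panels are all assumptions made for illustration, and every name other than ease() is hypothetical.

```python
LOOP_SECONDS = 3.6
PANELS = ("absolute", "relative", "integration")

def offset_grid(n: int) -> list[list[int]]:
    """Relative offsets: cell (i, j) holds p_i - p_j with p_k = k."""
    return [[i - j for j in range(n)] for i in range(n)]

def cell_intensity(i: int, j: int, n: int) -> float:
    """Assumed falloff: 1.0 on the diagonal, fading linearly with |i - j|."""
    return 1.0 - abs(i - j) / max(n - 1, 1)

def ease(t: float) -> float:
    """Smoothstep easing (an assumption; the page's ease() is not shown)."""
    t = min(max(t, 0.0), 1.0)
    return t * t * (3.0 - 2.0 * t)

def snap(value: float, cell: int = 8) -> int:
    """Snap a coordinate to the nearest multiple of the grid cell size."""
    return round(value / cell) * cell

def panel_at(t_seconds: float) -> tuple[str, float]:
    """Return the active panel and eased progress within it, assuming equal-length panels."""
    phase = (t_seconds % LOOP_SECONDS) / LOOP_SECONDS
    scaled = phase * len(PANELS)
    index = min(int(scaled), len(PANELS) - 1)
    return PANELS[index], ease(scaled - index)

grid = offset_grid(3)
panel, progress = panel_at(1.8)
```

Driving everything from elapsed time rather than a frame counter keeps the loop length the same regardless of frame rate, which matches the time-based animation described above.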
