positional-encoding
Shows how a position index p is mapped to a positional-encoding vector PE(p) and injected into each token's representation (by addition or concatenation), so that a model that processes all tokens in parallel can still use their order. Contrasts this with relative schemes, which use the offset p_i − p_j between a query at position i and a key at position j as a bias on the corresponding cell of the i×j attention grid.
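A minimal Python sketch of both ideas, assuming positions are plain token indices (p_i = i). `sinusoidal_pe` follows the sinusoidal formula from the original Transformer paper; the linear distance penalty in `relative_bias_grid` is one ALiBi-style choice picked for illustration. The function names, the `slope` value, and the toy embedding are hypothetical, not taken from the visualization's source.

```python
import math


def sinusoidal_pe(p: int, d_model: int) -> list[float]:
    """Absolute encoding PE(p) using the sinusoidal formula:
    PE(p, 2k) = sin(p / 10000^(2k/d_model)), PE(p, 2k+1) = cos(same angle)."""
    pe = []
    for dim in range(d_model):
        k = dim // 2  # each sin/cos pair shares one frequency
        angle = p / (10000 ** (2 * k / d_model))
        pe.append(math.sin(angle) if dim % 2 == 0 else math.cos(angle))
    return pe


def inject_add(token_emb: list[float], pe: list[float]) -> list[float]:
    # Addition: E(token) + PE(p); assumes equal lengths, keeps the same width.
    return [e + q for e, q in zip(token_emb, pe)]


def inject_concat(token_emb: list[float], pe: list[float]) -> list[float]:
    # Concatenation: [E(token); PE(p)], width grows to len(E) + len(PE).
    return token_emb + pe


def relative_bias_grid(n: int, slope: float = 0.5) -> list[list[float]]:
    """Hypothetical ALiBi-style bias for an n x n attention grid:
    B[i][j] = -slope * |p_i - p_j| with p_i = i. It would be added to the raw
    score of query i and key j before the softmax."""
    return [[-slope * abs(i - j) for j in range(n)] for i in range(n)]


# Toy usage (made-up embedding for a single token at position 3).
emb = [0.1, 0.2, 0.3, 0.4]
pe3 = sinusoidal_pe(3, len(emb))
x_add = inject_add(emb, pe3)     # 4 values
x_cat = inject_concat(emb, pe3)  # 8 values
bias = relative_bias_grid(4)     # bias[0][3] == -1.5
```

Addition keeps the model width fixed and is the choice made in the original Transformer; concatenation keeps E(token) and PE(p) in separate dimensions at the cost of a wider vector. The relative bias, by contrast, never touches the token vectors: it only changes attention scores, and every cell with the same offset gets the same bias.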
practical uses
- 01. Enable Transformers to represent word order without recurrence or convolutions
- 02. Improve long-context generalization (e.g., with sinusoidal, rotary, or other structured encodings)
- 03. Model relative relationships, such as a distance-based attention bias, for tasks like retrieval, code, and time series
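Rotary encodings, mentioned in the list above, connect the absolute and relative views. A minimal sketch, assuming the common RoPE formulation (each pair of dimensions rotated by pos·θ_k with θ_k = 10000^(−2k/d)), shows that the query-key score depends only on the offset pos_q − pos_k. The function names and toy vectors are hypothetical.

```python
import math


def rotate(vec: list[float], pos: int, base: float = 10000.0) -> list[float]:
    """Rotary position encoding (RoPE): rotate each pair (x[2k], x[2k+1])
    by the angle pos * theta_k, where theta_k = base^(-2k/d). Assumes even d."""
    d = len(vec)
    out: list[float] = []
    for k in range(d // 2):
        theta = pos * base ** (-2 * k / d)
        x, y = vec[2 * k], vec[2 * k + 1]
        c, s = math.cos(theta), math.sin(theta)
        out += [x * c - y * s, x * s + y * c]  # 2-D rotation of the pair
    return out


def rope_score(q: list[float], k: list[float], pos_q: int, pos_k: int) -> float:
    # Attention score: dot product of the rotated query and rotated key.
    return sum(a * b for a, b in zip(rotate(q, pos_q), rotate(k, pos_k)))


# Toy query/key vectors (made up). Both position pairs have offset 3.
q = [0.3, -1.2, 0.8, 0.5]
k = [1.0, 0.4, -0.7, 0.2]
s_near = rope_score(q, k, 5, 2)
s_far = rope_score(q, k, 13, 10)
# s_near and s_far agree up to floating-point rounding.
```

The position is applied to q and k individually (an absolute operation), yet because the transpose of one rotation times another is a rotation by the angle difference, the score only sees the offset, which is what makes rotary encodings behave relatively inside attention.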
technical notes
The animation is a three-panel loop (absolute → relative → integration) lasting roughly 3.6 s. The left column renders token positions and vector bars for E(token), PE(p), and their combination; the right column renders an attention matrix whose cell intensity depends on |i − j|, with a highlighted cell (i, j) labelled with its offset Δ = p_i − p_j. All geometry is snapped to a grid for a blocky look, and the animation is time-based, driven by an ease() function and cycling indices.
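A sketch of the timing and shading logic the notes describe. Only the ~3.6 s loop, the three phase names, the |i − j| dependence, and Δ = p_i − p_j come from the notes; the equal phase lengths, the smoothstep `ease()`, the 0.3 s highlight step, the linear fade, and the 8-pixel grid are assumptions, and every name is hypothetical rather than the page's actual code.

```python
LOOP_SECONDS = 3.6  # loop length from the notes (~3.6 s)
PHASES = ["absolute", "relative", "integration"]


def ease(u: float) -> float:
    # Assumed smoothstep ease-in-out; the page's own ease() is not specified.
    u = min(max(u, 0.0), 1.0)
    return u * u * (3 - 2 * u)


def phase_at(t: float) -> tuple[str, float]:
    """Map elapsed time t (s) to (phase name, eased progress in [0, 1]),
    assuming the three phases split the loop equally (1.2 s each)."""
    phase_len = LOOP_SECONDS / len(PHASES)
    local = t % LOOP_SECONDS
    idx = min(int(local // phase_len), len(PHASES) - 1)  # guard float edge
    return PHASES[idx], ease((local - idx * phase_len) / phase_len)


def cell_intensity(i: int, j: int, n: int) -> float:
    # Hypothetical shading: 1.0 on the diagonal, fading linearly with |i - j|.
    return 1.0 - abs(i - j) / max(n - 1, 1)


def highlighted_cell(t: float, n: int, step: float = 0.3) -> tuple[int, int, int]:
    """Cycle the highlighted (i, j) row-major through the n x n grid every
    `step` seconds; return it with its offset delta = p_i - p_j (p_i = i)."""
    idx = int(t // step) % (n * n)
    i, j = divmod(idx, n)
    return i, j, i - j


def snap(x: float, cell: float = 8.0) -> float:
    # Grid snapping for the blocky look: nearest multiple of `cell` pixels.
    return round(x / cell) * cell
```

In this sketch, rendering a frame would amount to calling `phase_at(t)` to pick the panel and its transition progress, shading each matrix cell with `cell_intensity`, labelling the cell from `highlighted_cell(t, n)`, and passing every coordinate through `snap`.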