Convolution Operation


Convolution Operation #

Linear Algebra · Difficulty: ★★★☆☆ · Depth: 0 · Unlocks: 4

The sliding-window dot-product between a kernel and input (1D/2D) used to extract local patterns and build translation-equivariant representations in convolutional networks. Grasping how stride, padding, and kernel size affect output shape and receptive field is key for CNN design.


Core Concepts #

Key Symbols & Notation #

asterisk symbol for convolution ("*")

Essential Relationships #

Unlocks (1) #

Deep Learning (lvl 5)

Advanced Learning Details

Graph Position #

Depth Cost: 6

Fan-Out (ROI): 4

Bottleneck Score: 2

Chain Length: 0

Cognitive Load #

Atomic Elements: 6

Total Elements: 46

Percentile Level: L3

Atomic Level: L4

All Concepts (18) #

Teaching Strategy #

Deep-dive lesson - accessible entry point but dense material. Use worked examples and spaced repetition.

Convolution is the workhorse operation behind CNNs: a small set of weights (a kernel) slides across an input and produces a new signal/image whose values tell you “how much this local pattern appears here.” The power comes from doing the same computation everywhere (weight sharing), which naturally builds translation-equivariant representations.

TL;DR:

A (discrete) convolution layer computes a sliding-window dot product between a kernel and local patches of the input. Stride, padding, and dilation control where the kernel is applied and therefore determine output shape and receptive field. In deep learning libraries, what’s called “convolution” is usually cross-correlation (no kernel flip), but the shape math and intuition are the same.

Prerequisites and Conventions (read this to avoid implementation confusion) #

This node is designed to be foundational, but a few micro-prereqs and conventions will save you from common shape bugs.

Micro-prerequisites #

You should be comfortable with:

  1. Indices and summations
  2. Dot product

     $$\mathbf{x} \cdot \mathbf{w} = \sum_{i=0}^{k-1} x_i w_i$$

  3. Arrays / tensors and shapes
  4. Basic CNN layer conventions

Critical convention: convolution vs cross-correlation #

In classical signal processing, discrete convolution flips the kernel. In most deep learning libraries, the operation named conv is actually cross-correlation (no flip). For 1D, cross-correlation (the deep learning “conv”) is:

$$y[n] = \sum_{i=0}^{k-1} w[i]\,x[n + i]$$

while classical convolution (kernel flipped) is:

$$y[n] = \sum_{i=0}^{k-1} w[i]\,x[n - i]$$

For learning CNNs, the “pattern detector sliding” intuition works either way; the network can learn flipped weights if needed. What matters most in practice is shape math and which positions are included.
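The relationship between the two conventions can be checked numerically. The sketch below uses NumPy's `np.correlate` and `np.convolve` on small made-up arrays (the values of `x` and `w` are illustrative assumptions, not taken from the lesson) and shows that convolving with the reversed kernel reproduces cross-correlation.

```python
import numpy as np

# Illustrative values only.
x = np.array([1.0, 2.0, 3.0, 4.0])
w = np.array([1.0, 0.0, -1.0])

# Cross-correlation: slide w over x without flipping it.
xcorr = np.correlate(x, w, mode="valid")

# Classical convolution: the kernel is flipped before sliding.
conv = np.convolve(x, w, mode="valid")

# Convolving with the reversed kernel undoes the flip.
conv_reversed_kernel = np.convolve(x, w[::-1], mode="valid")

print(xcorr)                 # [-2. -2.]
print(conv)                  # [2. 2.]
print(conv_reversed_kernel)  # [-2. -2.]
```

For a symmetric kernel the two operations coincide, which is one reason the distinction is easy to overlook.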

Channel order: NCHW vs NHWC #

Different frameworks store tensors differently:

| Name | Meaning | Common in |
| --- | --- | --- |
| NCHW | [batch, channels, height, width] | PyTorch (default), many CUDA kernels |
| NHWC | [batch, height, width, channels] | TensorFlow (often), some accelerators |

The convolution math is identical; only the dimension ordering changes.

“Same” padding is not one universal rule #

Libraries define padding="same" to mean “output spatial size is approximately input size when stride=1,” but when stride>1 there are multiple choices (rounding up vs down, distributing extra pad left/right). When you care about exact sizes, use the explicit formula with (p_left, p_right) rather than relying on a name.

Dilation exists (we’ll define it later) #

Dilation spaces out kernel taps. It changes the effective kernel size and receptive field without increasing parameter count.

Keep these in mind; they directly address the most common beginner failure mode: implementing a conv layer and getting an output whose shape doesn’t match your expectation.

What Is Convolution Operation? #

Why this operation exists #

Many data types have a strong notion of locality:

A fully connected layer mixes everything with everything, which is expensive and ignores locality. Convolution instead says:

  1. Look at a small local patch.

  2. Compute a dot product with a kernel (a learned pattern).

  3. Slide that kernel across positions, reusing the same weights.

This reuse is called weight sharing. It gives you an inductive bias: if a feature (like a vertical edge) matters at one location, it likely matters at other locations too.

Core definition (1D intuition first) #

Let x be a 1D input of length L and w be a kernel of length k. We form an output y where each y[n] is a weighted sum of a window of x.

A common deep-learning definition (cross-correlation) is:

$$y[n] = \sum_{i=0}^{k-1} w[i]\,x[n + i]$$

Interpretation: n is the position where the window starts, i is the offset inside the window, and w[i] weights the input sample x[n+i].

So each y[n] is:

$$y[n] = \mathbf{w} \cdot \mathbf{x}_{\text{patch at } n}$$
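A minimal sketch of this definition as an explicit loop, assuming stride 1, no padding, and no dilation. The function name `cross_correlate_1d` is a hypothetical helper introduced here for illustration.

```python
import numpy as np

def cross_correlate_1d(x, w):
    """Valid-mode 1D cross-correlation: y[n] = sum_i w[i] * x[n + i]."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    k = len(w)
    n_out = len(x) - k + 1  # number of positions where the kernel fully fits
    if n_out <= 0:
        raise ValueError("kernel is longer than the input")
    y = np.empty(n_out)
    for n in range(n_out):
        patch = x[n:n + k]       # local window starting at n
        y[n] = np.dot(w, patch)  # dot product between kernel and patch
    return y

y = cross_correlate_1d([1, 2, 0, -1, 3, 1], [2, -1, 1])
print(y)  # [ 0.  3.  4. -4.]
```

The loop is deliberately unoptimized; its purpose is to make the "one dot product per window" structure visible.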

Extending to 2D (images) #

For an image x with height H and width W, and a 2D kernel w of size k_h × k_w, the output at (u, v) is:

$$y[u,v] = \sum_{i=0}^{k_h-1} \sum_{j=0}^{k_w-1} w[i,j]\,x[u+i, v+j]$$

Again, it’s a dot product between a flattened patch and the flattened kernel.
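The 2D case can be sketched the same way. The example below assumes a single channel, stride 1, and no padding; `cross_correlate_2d`, the toy image, and the edge kernel are illustrative assumptions rather than anything defined by the lesson.

```python
import numpy as np

def cross_correlate_2d(x, w):
    """Valid-mode 2D cross-correlation for a single-channel image."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    H, W = x.shape
    kh, kw = w.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for u in range(out.shape[0]):
        for v in range(out.shape[1]):
            patch = x[u:u + kh, v:v + kw]
            # Elementwise product then sum equals the dot product
            # of the flattened patch and the flattened kernel.
            out[u, v] = np.sum(w * patch)
    return out

# Toy image: dark on the left, bright on the right.
image = np.array([[0, 0, 1, 1]] * 3)
# Kernel that responds where intensity rises from left to right.
kernel = np.array([[-1, 1]] * 3)

response = cross_correlate_2d(image, kernel)
print(response)  # [[0. 3. 0.]]
```

The response peaks exactly where the window straddles the dark-to-bright boundary and is zero over the flat regions.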

Multi-channel convolution (what CNNs actually use) #

Images typically have channels (RGB), and intermediate CNN layers have many channels. A convolution kernel spans all input channels.

If the input x has C_in channels, each kernel also has C_in channels, and one output channel is:

$$y[u,v] = \sum_{c=0}^{C_{in}-1} \sum_{i=0}^{k_h-1} \sum_{j=0}^{k_w-1} w[c,i,j]\,x[c, u+i, v+j]$$

And if you want C_out output channels, you learn C_out different kernels, one per output channel. The full weight tensor is shaped [C_out, C_in, k_h, k_w] (the exact axis order varies by library).
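A minimal multi-channel sketch, assuming a weight layout of (C_out, C_in, k_h, k_w), a single unbatched input of shape (C_in, H, W), stride 1, and no padding. The function name `conv2d_multichannel` is hypothetical.

```python
import numpy as np

def conv2d_multichannel(x, weight):
    """x: (C_in, H, W); weight: (C_out, C_in, k_h, k_w) -> (C_out, H_out, W_out)."""
    c_in, H, W = x.shape
    c_out, c_in_w, kh, kw = weight.shape
    if c_in != c_in_w:
        raise ValueError("each kernel must span all input channels")
    H_out, W_out = H - kh + 1, W - kw + 1
    y = np.zeros((c_out, H_out, W_out))
    for u in range(H_out):
        for v in range(W_out):
            patch = x[:, u:u + kh, v:v + kw]  # shape (C_in, k_h, k_w)
            # Sum over channel and both kernel axes, once per output channel.
            y[:, u, v] = np.einsum("ocij,cij->o", weight, patch)
    return y

rng = np.random.default_rng(0)
x = rng.standard_normal((3, 8, 8))           # e.g. a small RGB image
weight = rng.standard_normal((16, 3, 3, 3))  # 16 kernels, each 3x3 over 3 channels
y = conv2d_multichannel(x, weight)
print(y.shape)  # (16, 6, 6)
```

Each spatial location ends up with a vector of C_out numbers, one per kernel.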

Two big ideas embedded in the definition #

  1. Local receptive field: y[u,v] depends only on a local neighborhood of x.

  2. Translation equivariance: if you shift the input, the output shifts correspondingly (ignoring boundary effects from padding). This comes from applying the same kernel at every location.

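Equivariance is not the same as invariance:

  • Equivariance: shifting the input shifts the output by the same amount.
  • Invariance: shifting the input leaves the output unchanged.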

Convolution gives you the equivariant building block that deeper architectures can compose into higher-level behavior.

Core Mechanic 1: Sliding-Window Dot Product (stride, padding, dilation, output shape) #

This section is about the “physics” of convolution in code: where the kernel lands, how many outputs you get, and what stride/padding/dilation actually do.

1) Stride: how far you move the window #

With stride s (1D), you compute outputs at positions n = 0, s, 2s, … rather than every position.

For 2D, you have stride (s_h, s_w). The kernel top-left corner moves by s_h rows and s_w columns.

Effect: larger stride → smaller output spatial size → more downsampling.

2) Padding: what happens at the boundaries #

Without padding, you can only place the kernel where it fully fits in the input. That shrinks the output.

Padding adds extra values around the border, commonly zeros.

Effect: more padding → larger output spatial size and more border coverage.

Boundary note: padding breaks perfect translation equivariance at the edges because the border now “sees” artificial values.

3) Dilation: spacing out kernel taps #

Dilation d inserts gaps between kernel elements. In 1D, a kernel with k elements and dilation d covers an effective size:

$$k_{eff} = d\,(k-1) + 1$$

Example: k=3, d=2 covers positions like [0, 2, 4] → k_eff=5.

In 2D, apply this per dimension: k_{h,eff} = d_h(k_h − 1) + 1 and k_{w,eff} = d_w(k_w − 1) + 1.

Effect: increases receptive field without increasing parameter count, but can “skip over” fine details.
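To make the spacing concrete, here is a small sketch of dilated taps in 1D, assuming stride 1 and no padding. The helper names (`effective_kernel_size`, `tap_offsets`, `dilated_cross_correlate_1d`) are hypothetical.

```python
import numpy as np

def effective_kernel_size(k, d):
    """Span covered by k taps spaced d apart."""
    return d * (k - 1) + 1

def tap_offsets(k, d):
    """Input offsets (relative to the window start) that the kernel samples."""
    return [i * d for i in range(k)]

def dilated_cross_correlate_1d(x, w, d):
    """y[n] = sum_i w[i] * x[n + i*d], valid placements only."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    k_eff = effective_kernel_size(len(w), d)
    n_out = len(x) - k_eff + 1
    if n_out <= 0:
        raise ValueError("dilated kernel does not fit in the input")
    return np.array([
        sum(w[i] * x[n + i * d] for i in range(len(w)))
        for n in range(n_out)
    ])

print(effective_kernel_size(3, 2))  # 5
print(tap_offsets(3, 2))            # [0, 2, 4]
print(dilated_cross_correlate_1d(np.arange(7), [1, 1, 1], d=2))  # [ 6.  9. 12.]
```

Note that the kernel still has only three weights; only the positions it reads from have spread out.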

4) Output shape formula (1D) #

Let:

  • L = input length
  • k = kernel size
  • s = stride
  • d = dilation
  • p = total padding (p_left + p_right)

Compute effective kernel size:

$$k_{eff} = d\,(k-1) + 1$$

Then output length L_out is:

$$L_{out} = \left\lfloor \frac{L + p - k_{eff}}{s} \right\rfloor + 1$$

Why the floor? Because you only count kernel placements that fit.

Quick derivation (showing the logic) #

The last valid output position corresponds to the last kernel placement whose rightmost sampled index is within the padded input.

A placement starting at index n of the padded input (length L + p) samples up to index n + k_eff − 1, which must not exceed L + p − 1. So:

$$n \le L + p - k_{eff}$$

With stride s, starts are n = 0, s, 2s, …, ms. The largest m such that ms ≤ L + p − k_eff is:

$$m = \left\lfloor \frac{L + p - k_{eff}}{s} \right\rfloor$$

And the number of outputs is m+1:

$$L_{out} = m+1 = \left\lfloor \frac{L + p - k_{eff}}{s} \right\rfloor + 1$$
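The formula and the counting argument can be compared directly. The sketch below assumes p is the total padding and s ≥ 1; both function names are hypothetical. One function applies the closed form, the other literally counts placements that fit.

```python
def conv1d_output_length(L, k, s=1, p=0, d=1):
    """Closed-form output length; p is the TOTAL padding (left + right)."""
    k_eff = d * (k - 1) + 1
    if L + p < k_eff:
        return 0  # the kernel never fits
    return (L + p - k_eff) // s + 1

def count_placements(L, k, s=1, p=0, d=1):
    """Count stride-spaced window starts whose last tap stays inside the padded input."""
    k_eff = d * (k - 1) + 1
    padded_length = L + p
    count, n = 0, 0
    while n + k_eff - 1 <= padded_length - 1:
        count += 1
        n += s
    return count

print(conv1d_output_length(6, 3))                 # 4
print(conv1d_output_length(20, 5, s=2, p=4))      # 10
print(count_placements(20, 5, s=2, p=4))          # 10
```

Checking a new layer configuration against the brute-force counter is a cheap way to catch off-by-one errors before they show up as shape mismatches.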

5) Output shape formula (2D) #

For input H×W, kernel k_h×k_w, dilation d_h,d_w, stride s_h,s_w, and padding totals p_h = p_top+p_bottom, p_w = p_left+p_right:

$$H_{out} = \left\lfloor \frac{H + p_h - k_{h,eff}}{s_h} \right\rfloor + 1$$

$$W_{out} = \left\lfloor \frac{W + p_w - k_{w,eff}}{s_w} \right\rfloor + 1$$

where

$$k_{h,eff} = d_h (k_h-1)+1, \quad k_{w,eff} = d_w (k_w-1)+1$$

6) “Valid” vs “Same” vs explicit padding #

It helps to name common padding schemes, but always ground them in the formula.

| Scheme | Typical meaning | Outcome (stride=1, dilation=1) |
| --- | --- | --- |
| valid | p=0 | output shrinks: L_out = L-k+1 |
| same | choose p so L_out = L | preserves length/size |

For stride=1, dilation=1, to get L_out = L you need:

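$$L = \frac{L + p - k}{1} + 1 \;\Rightarrow\; p = k-1$$

That means total padding p = k-1. Usually you split it as evenly as possible: for odd k, pad (k−1)/2 on each side; for even k, one side gets one more element than the other, and which side depends on the library.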

For 2D, do this per dimension.

When stride>1, the output is downsampled, so no amount of padding gives H_out = H; libraries that support “same” in this case typically target H_out = ceil(H/stride) and differ in where they place any extra padding. If you need exactness, compute padding explicitly.
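As one concrete way to compute padding explicitly, the sketch below targets L_out = ceil(L/s) and puts any odd leftover on the right. This mirrors, to my understanding, the rule TensorFlow documents for "SAME" padding, but treat the left/right split as an assumption: other libraries may choose differently. The function name `same_padding_1d` is hypothetical.

```python
import math

def same_padding_1d(L, k, s=1, d=1):
    """Return (p_left, p_right) so that the output length equals ceil(L / s)."""
    k_eff = d * (k - 1) + 1
    L_out = math.ceil(L / s)
    # Total padding needed so the last window ends exactly at the padded edge.
    total = max((L_out - 1) * s + k_eff - L, 0)
    p_left = total // 2
    p_right = total - p_left  # odd leftover goes right (a convention, not a law)
    return p_left, p_right

def output_length(L, k, s, p_total, d=1):
    k_eff = d * (k - 1) + 1
    return (L + p_total - k_eff) // s + 1

print(same_padding_1d(7, 3, s=1))  # (1, 1)
print(same_padding_1d(8, 3, s=2))  # (0, 1)
```

Feeding the returned totals back into the output-length formula is a quick check that the target size is actually reached.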

7) Parameter count and compute cost (sanity checks) #

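For a standard 2D conv with C_in input channels and C_out output channels:

  • Weights: C_out · C_in · k_h · k_w
  • Biases (if used): C_out

Compute is roughly proportional to the number of multiply-accumulates per input sample:

$$H_{out} \cdot W_{out} \cdot C_{out} \cdot C_{in} \cdot k_h \cdot k_w$$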

These simple counts help you reason about model size and performance.
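A small calculator for these sanity checks, assuming the usual count of C_out·C_in·k_h·k_w weights, one bias per output channel, and one multiply-accumulate per weight per output position. The function name and the returned dictionary keys are hypothetical.

```python
def conv2d_cost(c_in, c_out, k_h, k_w, h_out, w_out, bias=True):
    """Parameter and multiply-accumulate counts for a standard (ungrouped) 2D conv."""
    weights = c_out * c_in * k_h * k_w
    biases = c_out if bias else 0
    # Every output element needs one dot product of length c_in * k_h * k_w.
    macs_per_sample = h_out * w_out * c_out * c_in * k_h * k_w
    return {"weights": weights, "biases": biases, "macs_per_sample": macs_per_sample}

cost = conv2d_cost(c_in=3, c_out=16, k_h=3, k_w=3, h_out=32, w_out=32)
print(cost)  # {'weights': 432, 'biases': 16, 'macs_per_sample': 442368}
```

Note that the parameter count does not depend on the image size, while the compute does.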

Core Mechanic 2: Weight Sharing, Translation Equivariance, and Receptive Field Growth #

Now that you can compute shapes, the next step is understanding why convolutions are such a good building block.

1) Kernel as a local pattern detector #

Think of a kernel as encoding a template.

At each position, the dot product is high when the local patch aligns with the kernel’s pattern.

Dot product = similarity (with caveats) #

If you normalize vectors, the dot product relates to cosine similarity. CNNs don’t usually normalize patches, so magnitude also matters. But the intuition remains: convolution measures how much a pattern is present locally.

2) Weight sharing → translation equivariance #

Weight sharing means the same weights w are used for every location. This creates a structured linear map.

Let T_Δ be a shift operator that shifts the input by Δ (in 1D, Δ is an integer). For interior positions (ignoring boundary padding effects), convolution satisfies:

$$\text{Conv}(T_\Delta x) = T_\Delta(\text{Conv}(x))$$

That is equivariance.
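The identity can be checked numerically on interior positions. The sketch below assumes no padding (valid mode) and models the shift by dropping the first few samples; the random signal, kernel, and shift amount are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(20)
w = rng.standard_normal(3)
shift = 4

def conv(signal):
    # Valid-mode cross-correlation with the shared kernel w.
    return np.correlate(signal, w, mode="valid")

# Shift the input (drop its first `shift` samples), then convolve...
y_of_shifted = conv(x[shift:])
# ...versus convolve first, then shift the output by the same amount.
shifted_y = conv(x)[shift:]

print(np.allclose(y_of_shifted, shifted_y))  # True
```

Both routes give the same values because every output is computed with the same weights, regardless of where the window sits.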

Why you care:

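Padding caveat: near the borders the kernel overlaps padded values, so the identity above holds exactly only for interior positions.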

3) Stacking convolutions expands the receptive field #

The receptive field of an output unit is the region of the input that can affect it.

For a single conv layer with kernel size k and dilation 1, the receptive field is exactly k input positions (k_h × k_w in 2D).

But when you stack layers, receptive fields grow.

1D receptive field with stride 1 (simple case) #

Suppose you apply two conv layers, both with kernel size k=3, stride=1, dilation=1.

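Layer 1: y₁[n] depends on x[n], x[n+1], x[n+2].

Layer 2: y₂[n] depends on y₁[n], y₁[n+1], y₁[n+2].

Substitute dependencies: y₁[n] reaches x[n..n+2], and y₁[n+2] reaches x[n+2..n+4].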

So y₂[n] depends on x[n..n+4], i.e. 5 input positions.

This shows a pattern: with stride=1 and dilation=1, stacking k=3 layers increases receptive field by 2 each layer.

General receptive field growth (useful rule of thumb) #

For 1D layers ℓ=1..L, with kernel sizes k_ℓ and strides s_ℓ (ignore dilation for a moment), define:

  • r_ℓ = receptive field size (in input positions) of one unit after layer ℓ, with r₀ = 1
  • j_ℓ = jump: the distance in input positions between adjacent units after layer ℓ, with j₀ = 1

Update per layer:

$$j_\ell = j_{\ell-1} \cdot s_\ell$$

$$r_\ell = r_{\ell-1} + (k_\ell - 1) \cdot j_{\ell-1}$$

This is a standard CNN design tool: stride increases the jump (downsampling), and kernel size increases the field.

If you include dilation d_ℓ, replace (k_ℓ − 1) with (k_{eff,ℓ} − 1) where k_eff = d(k−1)+1.
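The update rules translate into a few lines of code. This sketch assumes each layer is described by a (kernel_size, stride, dilation) tuple and starts from r₀ = 1, j₀ = 1; the function name `receptive_field` is hypothetical.

```python
def receptive_field(layers):
    """layers: iterable of (kernel_size, stride, dilation) tuples, first layer first.

    Returns (r, j): receptive field size and jump after the last layer.
    """
    r, j = 1, 1
    for k, s, d in layers:
        k_eff = d * (k - 1) + 1
        r = r + (k_eff - 1) * j  # uses the jump from BEFORE this layer's stride
        j = j * s                # stride only affects later layers
    return r, j

# Two stacked k=3, stride-1 layers.
print(receptive_field([(3, 1, 1), (3, 1, 1)]))             # (5, 1)
# Three k=3 layers with strides 1, 2, 1.
print(receptive_field([(3, 1, 1), (3, 2, 1), (3, 1, 1)]))  # (9, 2)
```

Updating r before j inside the loop is what enforces the "r uses the previous jump" ordering.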

4) Multiple kernels = multiple learned features #

A conv layer usually has many output channels. Each output channel corresponds to a different kernel (feature detector).

So at each spatial location, you don’t just output one number; you output a vector of C_out numbers. You can think of that vector as a learned local descriptor.

5) Convolution is linear (per channel) but CNNs are not #

Convolution itself is a linear operation in x:

$$\text{Conv}(a x_1 + b x_2) = a\,\text{Conv}(x_1) + b\,\text{Conv}(x_2)$$

CNNs become powerful because you compose:

The conv operation is the structured linear piece that respects locality and translation structure.

Application / Connection: How Convolution Is Used in CNN Design (and how libraries parameterize it) #

This section connects the operation to practical CNN design decisions and implementation details.

1) Designing for shapes: a workflow #

When you design a CNN block, you typically decide:

  1. Desired change in spatial size (keep, half, quarter)

  2. Number of channels (C_in → C_out)

  3. Kernel size (locality) and whether to use dilation

  4. Padding to control boundaries

A common pattern:

Example: “same-size” 3×3 conv (stride 1) #

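For k=3, dilation=1, want H_out = H. The total padding must be p = k − 1 = 2, i.e. 1 pixel on each side, which gives H_out = ⌊(H + 2 − 3)/1⌋ + 1 = H.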

2) Dilation vs larger kernels #

If you want a larger receptive field, you can use a larger kernel, stack several small (e.g. 3×3) convolutions, or use dilation.

Trade-offs:

| Method | Pros | Cons |
| --- | --- | --- |
| Larger kernel | Direct, dense coverage | More parameters and compute |
| Stack 3×3 | Often efficient; more nonlinearities | Deeper network, may be harder to optimize |
| Dilation | Big receptive field, same params | Can miss fine detail; gridding artifacts |

3) Grouped and depthwise convolution (heads-up) #

Even though this node focuses on standard convolution, you’ll see these variants:

These change parameter count and mixing across channels.

4) Library parameters you must map correctly #

When you implement Conv2d / tf.nn.conv2d, you must align:

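  1. Tensor layout: NCHW vs NHWC for the input and output.
  2. Weight layout: PyTorch's Conv2d stores weights as [C_out, C_in, k_h, k_w] (for ungrouped convs), while tf.nn.conv2d expects filters as [k_h, k_w, C_in, C_out].
  3. Padding semantics: an integer padding in PyTorch pads that many elements on both sides of each spatial dimension, whereas TensorFlow's "SAME" derives the total padding from the input size, kernel, and stride and pads accordingly.
  4. Dilation parameter: named dilation in PyTorch and dilations in tf.nn.conv2d.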

5) The “im2col” mental model (optional but clarifying) #

A convolution can be implemented by:

  1. Extracting all patches into a big matrix (each row = one patch) → im2col

  2. Flattening kernels into columns

  3. Doing a matrix multiply

This shows convolution is a structured linear operator. It also explains why:

You don’t need to implement im2col, but knowing it helps debug shapes: patches correspond to output positions.
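The three steps above can be sketched with NumPy's `sliding_window_view` (available in NumPy 1.20 and later). The sketch assumes stride 1, no padding, an unbatched input of shape (C_in, H, W), and a weight layout of (C_out, C_in, k_h, k_w); `im2col_conv2d` is a hypothetical name, and real libraries use far more optimized routines.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def im2col_conv2d(x, weight):
    """x: (C_in, H, W); weight: (C_out, C_in, k_h, k_w) -> (C_out, H_out, W_out)."""
    c_out, c_in, kh, kw = weight.shape
    # windows has shape (C_in, H_out, W_out, k_h, k_w).
    windows = sliding_window_view(x, (kh, kw), axis=(1, 2))
    H_out, W_out = windows.shape[1:3]
    # Step 1: each row is one flattened patch of length C_in * k_h * k_w.
    cols = windows.transpose(1, 2, 0, 3, 4).reshape(H_out * W_out, c_in * kh * kw)
    # Step 2: each column is one flattened kernel.
    w_mat = weight.reshape(c_out, -1).T
    # Step 3: one matrix multiply computes every output position and channel.
    out = cols @ w_mat  # (H_out * W_out, C_out)
    return out.T.reshape(c_out, H_out, W_out)

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 5, 6))
weight = rng.standard_normal((4, 2, 3, 3))
y = im2col_conv2d(x, weight)
print(y.shape)  # (4, 3, 4)
```

The number of rows in `cols` equals the number of output positions, which is the shape-debugging fact worth remembering.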

6) Where this connects next #

Once you’re fluent with convolution mechanics, you can understand:

This node is the gateway from basic linear algebra (dot products) to deep learning feature extraction.

Worked Examples (3) #

Worked Example 1 (1D): Compute convolution output values and output length #

Input x (length L=6): x = [1, 2, 0, -1, 3, 1]

Kernel w (length k=3): w = [2, -1, 1]

Use cross-correlation convention (no flip), stride s=1, padding p=0, dilation d=1.

  1. List the patches of length 3 (since k=3) that fit with p=0:

    • patch at n=0: [x0, x1, x2] = [1, 2, 0]
    • patch at n=1: [x1, x2, x3] = [2, 0, -1]
    • patch at n=2: [x2, x3, x4] = [0, -1, 3]
    • patch at n=3: [x3, x4, x5] = [-1, 3, 1]
  2. Compute output length using the formula:

    Effective kernel size: k_eff = d(k-1)+1 = 1·(3-1)+1 = 3

    Total padding p = 0

    $$L_{out} = \left\lfloor \frac{L + p - k_{eff}}{s} \right\rfloor + 1 = \left\lfloor \frac{6 + 0 - 3}{1} \right\rfloor + 1 = 3 + 1 = 4$$

    So y has length 4 (indices 0..3).

  3. Compute each y[n] as a dot product y[n] = ∑_{i=0}^{2} w[i] x[n+i].

    n=0:

    $$y[0] = 2 \cdot 1 + (-1) \cdot 2 + 1 \cdot 0 = 2 - 2 + 0 = 0$$

    n=1:

    $$y[1] = 2 \cdot 2 + (-1) \cdot 0 + 1 \cdot (-1) = 4 + 0 - 1 = 3$$

    n=2:

    $$y[2] = 2 \cdot 0 + (-1) \cdot (-1) + 1 \cdot 3 = 0 + 1 + 3 = 4$$

    n=3:

    $$y[3] = 2 \cdot (-1) + (-1) \cdot 3 + 1 \cdot 1 = -2 - 3 + 1 = -4$$

  4. Final output:

    y = [0, 3, 4, -4]

Insight: Each output is just a dot product between the kernel and a local patch. The output length comes from counting how many valid kernel placements fit; the formula matches the patch list exactly.

Worked Example 2 (2D): Output shape with stride, padding, and dilation #

Input image x has shape H×W = 7×7 (ignore channels for shape math).

Kernel size k_h×k_w = 3×3.

Stride s_h=s_w=2.

Dilation d_h=d_w=1.

Padding: p_top=p_bottom=p_left=p_right=1 (so totals p_h=2, p_w=2).

  1. Compute effective kernel sizes:

    $$k_{h,eff} = d_h (k_h-1)+1 = 1 \cdot (3-1)+1 = 3$$

    $$k_{w,eff} = d_w (k_w-1)+1 = 3$$

  2. Compute output height:

    $$H_{out} = \left\lfloor \frac{H + p_h - k_{h,eff}}{s_h} \right\rfloor + 1 = \left\lfloor \frac{7 + 2 - 3}{2} \right\rfloor + 1$$

    Simplify:

    7+2-3 = 6

    $$H_{out} = \left\lfloor \frac{6}{2} \right\rfloor + 1 = 3 + 1 = 4$$

  3. Compute output width (same numbers):

    $$W_{out} = \left\lfloor \frac{W + p_w - k_{w,eff}}{s_w} \right\rfloor + 1 = \left\lfloor \frac{7 + 2 - 3}{2} \right\rfloor + 1 = 4$$

  4. So the output spatial shape is 4×4.

    Sanity check by thinking in placements:

    • With padding 1, the “padded input” is 9×9.
    • A 3×3 kernel with stride 2 has top-left corners at rows 0,2,4,6 (4 positions) and same for columns → 4×4 outputs.

Insight: The shape formula is just counting how many stride-spaced kernel placements fit inside the padded input. Padding made the padded input 9×9, enabling 4 placements along each dimension with stride 2.
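The placement check in this example can be reproduced in a few lines. The variable names are illustrative; the numbers are the ones given in the example (H=7, k=3, stride 2, dilation 1, padding 1 on each side).

```python
H, k, s, d = 7, 3, 2, 1
p_top, p_bottom = 1, 1

k_eff = d * (k - 1) + 1
padded_height = H + p_top + p_bottom  # 9

# Top-left corner rows (in padded coordinates) where the kernel still fits.
corner_rows = list(range(0, padded_height - k_eff + 1, s))

# Closed-form output height for comparison.
H_out = (H + p_top + p_bottom - k_eff) // s + 1

print(corner_rows)  # [0, 2, 4, 6]
print(H_out)        # 4
```

The same enumeration applies to columns, so the output is 4×4.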

Worked Example 3 (Receptive field): Two stacked 3×3 conv layers #

Consider a 2D CNN block with two convolution layers, both with kernel 3×3, stride 1, dilation 1, and padding 1 (so spatial sizes are preserved). We track the receptive field size (in one dimension; it’s symmetric in H and W here).

  1. Initialize receptive field and jump:

    • r₀ = 1 (a single input pixel)
    • j₀ = 1 (neighboring outputs correspond to neighboring input centers)
  2. Layer 1: k=3, s=1

    $$j_1 = j_0 \cdot s = 1 \cdot 1 = 1$$

    $$r_1 = r_0 + (k-1) \cdot j_0 = 1 + 2 \cdot 1 = 3$$

    After the first conv, each output pixel sees a 3-pixel-wide region (in 1D).

  3. Layer 2: k=3, s=1

    $$j_2 = j_1 \cdot s = 1$$

    $$r_2 = r_1 + (k-1) \cdot j_1 = 3 + 2 \cdot 1 = 5$$

  4. Conclusion:

    Two stacked 3×3 stride-1 conv layers yield a receptive field of 5×5 in 2D (since 5 in height and 5 in width).

Insight: Stacking small kernels grows receptive field gradually while adding nonlinearities between them—one reason repeated 3×3 convs are so common in CNNs.

Key Takeaways #

Common Mistakes #

Practice #

easy

1D shape practice: An input of length L=20 is convolved with kernel size k=5, stride s=2, dilation d=1, and total padding p=4 (e.g., 2 left + 2 right). What is L_out?

Hint: Use k_eff = d(k−1)+1 and L_out = floor((L+p−k_eff)/s)+1.

Show solution

Compute k_eff:

$$k_{eff} = 1 \cdot (5-1)+1 = 5$$

Then:

$$L_{out} = \left\lfloor \frac{20 + 4 - 5}{2} \right\rfloor + 1 = \left\lfloor \frac{19}{2} \right\rfloor + 1 = 9 + 1 = 10$$

So L_out = 10.

medium

2D shape + channels: You have an input tensor with shape (N=8, C_in=3, H=32, W=32) in NCHW. You apply a Conv2d with C_out=16, kernel 3×3, stride 1, dilation 1, padding 1. What is the output shape and how many weight parameters (ignore bias)?

Hint: Padding 1 with kernel 3 and stride 1 preserves H and W. Parameters are C_out·C_in·k_h·k_w.

Show solution

Spatial shape:

$$H_{out} = \left\lfloor \frac{32 + 2 - 3}{1} \right\rfloor + 1 = 32$$

Similarly W_out=32.

So output shape is (8, 16, 32, 32).

Parameter count:

$$16 \cdot 3 \cdot 3 \cdot 3 = 432$$

So there are 432 weights (plus 16 biases if bias were included).

hard

Receptive field reasoning: In 1D, stack three conv layers with kernel sizes [3, 3, 3], strides [1, 2, 1], dilations all 1. Compute the receptive field size r₃ using the update rules j_ℓ = j_{ℓ−1}s_ℓ and r_ℓ = r_{ℓ−1} + (k_ℓ−1)j_{ℓ−1}.

Hint: Start with r₀=1, j₀=1 and apply the updates layer by layer. Be careful: stride affects j first, but r uses j_{ℓ−1}.

Show solution

Initialize: r₀ = 1, j₀ = 1.

Layer 1: k₁=3, s₁=1

$$j_1 = j_0 s_1 = 1 \cdot 1 = 1$$

$$r_1 = r_0 + (3-1) j_0 = 1 + 2 \cdot 1 = 3$$

Layer 2: k₂=3, s₂=2

$$j_2 = j_1 s_2 = 1 \cdot 2 = 2$$

$$r_2 = r_1 + (3-1) j_1 = 3 + 2 \cdot 1 = 5$$

Layer 3: k₃=3, s₃=1

$$j_3 = j_2 s_3 = 2 \cdot 1 = 2$$

$$r_3 = r_2 + (3-1) j_2 = 5 + 2 \cdot 2 = 9$$

So the final receptive field is 9 input positions wide.

Connections #

Unlocks and next steps:

Related nodes you may want next (if available in your tree):

Quality: B (4.1/5)
