Jacobian


Jacobian #

Calculus · Difficulty: ★★★☆☆ · Depth: 6 · Unlocks: 14

Matrix of partial derivatives. Change of variables in integrals.


Core Concepts #

Key Symbols & Notation #

Df(x) or J_f(x) for the Jacobian matrix; det(Df(x)) or |J_f(x)| for its determinant

Essential Relationships #

Prerequisites (2) #

Gradients (5 atoms)

Matrix Operations (6 atoms)

Unlocks (1) #

Matrix Calculus (lvl 4)

Advanced Learning Details

Graph Position #

Depth Cost: 62

Fan-Out (ROI): 14

Bottleneck Score: 6

Chain Length: 6

Cognitive Load #

Atomic Elements: 6

Total Elements: 23

Percentile Level: L0

Atomic Level: L4

All Concepts (9) #

Teaching Strategy #

Multi-session curriculum - substantial prior knowledge and complex material. Use mastery gates and deliberate practice.

You already know the gradient tells you how a scalar function changes as you nudge inputs. The Jacobian is the next step: it tells you how an entire vector of outputs changes—capturing the local linear behavior of a multivariable transformation and the volume-scaling you need for change of variables in integrals.

TL;DR:

The Jacobian J_f(x) = Df(x) is the matrix of first-order partial derivatives of a vector-valued map f. It is the best linear approximation to f near a point. When f maps ℝⁿ → ℝⁿ, det(J_f) measures local oriented volume scaling; |det(J_f)| is the factor used in change-of-variables formulas in integrals.

What Is the Jacobian? #

Why you need a new object beyond the gradient #

For a scalar function g: ℝⁿ → ℝ, the gradient ∇g(x) summarizes first-order change: it’s the vector that best predicts how g changes for a small input step h.

But many important maps are vector-valued. When the output is a vector, the “rate of change” can’t be captured by a single vector: each output component depends on each input component. The Jacobian packages all those partial derivatives into one matrix.

Definition (matrix of partial derivatives) #

Let f: ℝⁿ → ℝᵐ be written in components:

f(x) = (f₁(x), f₂(x), …, f_m(x))

where x = (x₁, x₂, …, x_n).

The Jacobian matrix of f at x is

J_f(x) = Df(x) = [ ∂f_i / ∂x_j ]

It is an m×n matrix whose (i, j) entry is:

(J_f(x))_{ij} = ∂f_i(x) / ∂x_j.

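So rows index outputs and columns index inputs: row i holds the partial derivatives of f_i, and column j holds the partial derivatives with respect to x_j.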

Relationship to the gradient (a comforting special case) #

If m = 1 (scalar output), then J_f is 1×n:

J_f(x) = [ ∂f/∂x₁ ∂f/∂x₂ … ∂f/∂x_n ]

This is the gradient written as a row vector. By the usual convention ∇f(x) is a column vector, so a transpose connects them:

J_f(x) = (∇f(x))ᵀ.

So the Jacobian generalizes the gradient.
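The entry-by-entry definition can be checked numerically. The sketch below is a minimal illustration, not a library routine: `numerical_jacobian` is a hypothetical helper that approximates each ∂f_i/∂x_j with a central difference, and the two example functions are chosen here only because their Jacobians are easy to write down by hand.

```python
def numerical_jacobian(f, x, eps=1e-6):
    """Approximate the m×n Jacobian of f at x with central differences."""
    n = len(x)
    m = len(f(x))
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[j] += eps
        x_minus[j] -= eps
        f_plus, f_minus = f(x_plus), f(x_minus)
        for i in range(m):
            # Entry (i, j): sensitivity of output i to input j.
            J[i][j] = (f_plus[i] - f_minus[i]) / (2 * eps)
    return J


def f(p):
    """Vector-valued example: f(x, y) = (x*y, x + y^2)."""
    x, y = p
    return [x * y, x + y * y]


def g(p):
    """Scalar-valued example returned as a length-1 list: g(x, y) = x^2 + 3xy."""
    x, y = p
    return [x * x + 3 * x * y]


J_f = numerical_jacobian(f, [2.0, 3.0])  # analytic value: [[3, 2], [1, 6]]
J_g = numerical_jacobian(g, [1.0, 2.0])  # analytic value: [[8, 3]], the gradient as a row
print(J_f)
print(J_g)
```

For the scalar-valued `g`, the result has a single row, which matches the m = 1 case: the Jacobian is the gradient laid out as a row.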

A geometric preview: local linear map #

The most important intuition is not “array of partials”, but “best linear approximation.” Near a point a, f behaves like:

f(a + h) ≈ f(a) + J_f(a) h

for small h.

That expression is the multivariable analogue of the 1D approximation:

f(a + h) ≈ f(a) + f′(a) h.

Here, f′(a) is a number; in many dimensions, the derivative becomes a matrix.

Square case and determinant #

If m = n, then J_f(x) is n×n and you can take its determinant:

det(J_f(x))

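This scalar has a deep meaning: |det(J_f(x))| is the factor by which f scales small volumes near x, and the sign of det(J_f(x)) records whether f preserves or reverses orientation there.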

That volume-scaling is exactly what appears in change-of-variables for integrals.

Core Mechanic 1: Jacobian as the Derivative (Linearization) #

Why linearization matters #

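Most nonlinear functions are hard to analyze globally. But if you zoom in enough, smooth functions look linear. This is the core strategy behind techniques that appear later on this page: small-error propagation, Gauss–Newton-style least squares, and local stability analysis of dynamical systems.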

The Jacobian is the device that turns “zooming in” into a concrete computation.

The best linear approximation statement #

Let f: ℝⁿ → ℝᵐ be differentiable at a. Then there exists a linear map L(h) such that

f(a + h) = f(a) + L(h) + r(h)

where the remainder satisfies

‖r(h)‖ / ‖h‖ → 0 as ‖h‖ → 0.

That linear map L is the derivative Df(a). When we choose coordinates, L is represented by the Jacobian matrix J_f(a), and we write:

f(a + h) ≈ f(a) + J_f(a) h.
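The remainder condition ‖r(h)‖ / ‖h‖ → 0 can be observed directly. The following sketch assumes an example map chosen for illustration, f(x, y) = (eˣ cos y, eˣ sin y), with its Jacobian written out by hand, and prints the ratio for shrinking steps h.

```python
import math


def f(x, y):
    """Example map f(x, y) = (e^x cos y, e^x sin y)."""
    return (math.exp(x) * math.cos(y), math.exp(x) * math.sin(y))


def jacobian(x, y):
    """Analytic Jacobian of f."""
    ex = math.exp(x)
    return [[ex * math.cos(y), -ex * math.sin(y)],
            [ex * math.sin(y), ex * math.cos(y)]]


a = (0.0, 0.0)
J = jacobian(*a)
fa = f(*a)

ratios = []
for t in (1e-1, 1e-2, 1e-3):
    h = (t, t)
    exact = f(a[0] + h[0], a[1] + h[1])
    # Linear prediction f(a) + J h.
    linear = (fa[0] + J[0][0] * h[0] + J[0][1] * h[1],
              fa[1] + J[1][0] * h[0] + J[1][1] * h[1])
    remainder = math.hypot(exact[0] - linear[0], exact[1] - linear[1])
    ratios.append(remainder / math.hypot(*h))
    print(f"|h| = {math.hypot(*h):.4f}   |r(h)| / |h| = {ratios[-1]:.6f}")
```

For this smooth map the ratio shrinks roughly in proportion to ‖h‖, which is the behavior the definition asks for.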

Interpreting columns and rows #

Write eⱼ for the j-th standard basis vector in ℝⁿ (a 1 in position j, else 0). Then:

J_f(a) eⱼ = column j of J_f(a).

But eⱼ corresponds to “nudge only x_j”. So:

column j ≈ how the output vector changes when you increase x_j a tiny bit.

Equivalently, each row i is:

row i = [ ∂f_i/∂x₁ … ∂f_i/∂x_n ]

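which is the gradient of the i-th output component (as a row). So rows describe outputs (one gradient per output component), and columns describe inputs (one sensitivity vector per input coordinate).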

Chain rule in Jacobian form (the practical payoff) #

If f: ℝⁿ → ℝᵐ and g: ℝᵐ → ℝᵏ, then the composition g ∘ f: ℝⁿ → ℝᵏ has Jacobian:

J_{g∘f}(x) = J_g(f(x)) · J_f(x).

This is the multivariable chain rule, and it looks exactly like matrix multiplication.

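It’s worth pausing to connect this to “linear approximation of a composition”: a small input step h changes f(x) by approximately J_f(x) h, and g turns that intermediate change into approximately J_g(f(x)) · (J_f(x) h) at the output. Composing the two linear approximations means multiplying their matrices.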

That is why the chain rule becomes matrix multiplication.

Directional derivatives via the Jacobian #

For a direction u ∈ ℝⁿ, the first-order change in f at a in direction u is:

Df(a) u = J_f(a) u.

This is the vector-valued directional derivative.

In the scalar-output case (m = 1), this reduces to:

J_f(a) u = (∇f(a))ᵀ u = ∇f(a) · u

which is the familiar directional derivative formula.
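As a rough numerical check of the directional-derivative formula, the sketch below compares J_f(a) u with a symmetric difference quotient along u. The map f(x, y) = (x² + 3xy, xy²), the point a, and the direction u are all assumptions picked for illustration.

```python
def f(x, y):
    """Example map f(x, y) = (x^2 + 3xy, x*y^2)."""
    return (x * x + 3 * x * y, x * y * y)


def jacobian(x, y):
    """Analytic Jacobian of f."""
    return [[2 * x + 3 * y, 3 * x],
            [y * y, 2 * x * y]]


a = (1.0, 2.0)
u = (0.6, 0.8)  # unit-length direction
J = jacobian(*a)

# Directional derivative as the matrix-vector product J_f(a) u.
Ju = [J[0][0] * u[0] + J[0][1] * u[1],
      J[1][0] * u[0] + J[1][1] * u[1]]

# The same quantity from a symmetric difference quotient along u.
t = 1e-5
fp = f(a[0] + t * u[0], a[1] + t * u[1])
fm = f(a[0] - t * u[0], a[1] - t * u[1])
quotient = [(fp[0] - fm[0]) / (2 * t), (fp[1] - fm[1]) / (2 * t)]

print(Ju)
print(quotient)
```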

Small-error propagation (a common use) #

Suppose your input x has a small perturbation δx (measurement noise). Then the induced output perturbation is approximately:

δy ≈ J_f(x) δx.

So the Jacobian acts like a “local gain matrix.” This is the mathematical foundation for sensitivity analysis and for linearizing nonlinear systems in control and estimation.
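A small sketch of this “local gain” idea, under assumed values: the measurement model f(x, y) = (xy, x/y), the operating point (4, 2), and the perturbation δx are hypothetical and serve only to compare the predicted δy = J δx with the actual change in output.

```python
def f(x, y):
    """Hypothetical measurement model: f(x, y) = (x*y, x/y)."""
    return (x * y, x / y)


def jacobian(x, y):
    """Analytic Jacobian of f."""
    return [[y, x],
            [1.0 / y, -x / (y * y)]]


x0 = (4.0, 2.0)
dx = (0.01, -0.02)  # small input perturbation

J = jacobian(*x0)
predicted = (J[0][0] * dx[0] + J[0][1] * dx[1],
             J[1][0] * dx[0] + J[1][1] * dx[1])

y0 = f(*x0)
y1 = f(x0[0] + dx[0], x0[1] + dx[1])
actual = (y1[0] - y0[0], y1[1] - y0[1])

print("predicted:", predicted)
print("actual:   ", actual)
```

The two results agree up to terms that are quadratic in the perturbation, which is why the approximation is trusted only for small δx.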

Core Mechanic 2: Jacobian Determinant and Change of Variables #

Why determinants show up in integrals #

Integration measures “total accumulation.” In multiple dimensions, it’s accumulation over area (2D) or volume (3D and beyond). If you change coordinates, a small patch in the new coordinates may correspond to a differently sized patch in the old coordinates.

So you need a conversion factor between tiny volume elements:

(d volume in x-space) = (scale factor) · (d volume in u-space)

That scale factor is the absolute value of the Jacobian determinant.

Local volume scaling intuition #

Assume f: ℝⁿ → ℝⁿ is differentiable and J_f(a) is invertible.

Near a, f behaves like:

f(a + h) ≈ f(a) + J_f(a) h.

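So near a, f is approximately the linear map h ↦ J_f(a) h. For a linear map A, the determinant det(A) gives the oriented volume scaling: a region of volume V maps to a region of volume |det(A)| · V, and the sign of det(A) says whether orientation is preserved (positive) or reversed (negative).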

Therefore, for the nonlinear map f, the local volume scaling near a is approximately |det(J_f(a))|.
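The local scaling claim can be tested by pushing a tiny square through a nonlinear map and measuring the image. In this sketch the map f(x, y) = (x + y²/2, y + x²/2) is an assumption chosen so that det(J_f) = 1 − xy is easy to verify by hand, and the image region is approximated by the quadrilateral through the four mapped corners.

```python
def f(x, y):
    """Example nonlinear map f(x, y) = (x + y^2/2, y + x^2/2)."""
    return (x + 0.5 * y * y, y + 0.5 * x * x)


def det_jacobian(x, y):
    # J_f = [[1, y], [x, 1]], so det(J_f) = 1 - x*y.
    return 1.0 - x * y


def signed_area(points):
    """Shoelace formula; positive for counterclockwise vertex order."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        total += x1 * y2 - x2 * y1
    return 0.5 * total


a = (0.5, 1.0)
s = 1e-3
# Corners of a small square based at a, listed counterclockwise.
square = [(a[0], a[1]), (a[0] + s, a[1]), (a[0] + s, a[1] + s), (a[0], a[1] + s)]
image = [f(x, y) for x, y in square]

ratio = signed_area(image) / signed_area(square)
print("area ratio:", ratio, " det(J_f(a)):", det_jacobian(*a))
```

Because the signed area is used, a map with negative determinant would produce a negative ratio, reflecting the orientation flip.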

The change-of-variables formula (multivariable substitution) #

Let f: U ⊂ ℝⁿ → V ⊂ ℝⁿ be a bijective, continuously differentiable map with a continuously differentiable inverse (a C¹ diffeomorphism), and let g be integrable on V. Then:

∫∫…∫_V g(x) dx = ∫∫…∫_U g(f(u)) · |det(J_f(u))| du.

Here u ranges over U, x = f(u) ranges over V, and J_f(u) is the n×n Jacobian of f at u.

The key idea is:

dx = |det(J_f(u))| du.

2D special case: area scaling #

In 2D, if f(u, v) = (x(u, v), y(u, v)), then

J_f(u, v) = [ ∂x/∂u,  ∂x/∂v ]
            [ ∂y/∂u,  ∂y/∂v ]

and the area element transforms as:

dx dy = |det(J_f(u, v))| du dv.

Common coordinate transforms #

Polar coordinates #

x = r cos θ

y = r sin θ

Compute J_f(r, θ):

∂x/∂r = cos θ,  ∂x/∂θ = −r sin θ

∂y/∂r = sin θ,  ∂y/∂θ = r cos θ

So

J_f = [ cos θ,  −r sin θ ]
      [ sin θ,   r cos θ ]

and

det(J_f) = (cos θ)(r cos θ) − (−r sin θ)(sin θ)

= r cos²θ + r sin²θ

= r.

Thus:

dx dy = r dr dθ.

That single factor r is exactly the Jacobian determinant’s magnitude.
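To see the factor r at work in an actual integral, the sketch below integrates an assumed example integrand, g(x, y) = exp(−(x² + y²)), over a disk of radius 2 with a midpoint rule in (r, θ). The helper `integrate_over_disk` and the grid sizes are illustrative choices; the closed form π(1 − e⁻⁴) holds for this particular g.

```python
import math


def g(x, y):
    """Example integrand g(x, y) = exp(-(x^2 + y^2))."""
    return math.exp(-(x * x + y * y))


def integrate_over_disk(g, radius, n_r=400, n_theta=64):
    """Midpoint rule in (r, theta) for the integral of g over a disk centered at 0."""
    dr = radius / n_r
    dtheta = 2 * math.pi / n_theta
    total = 0.0
    for i in range(n_r):
        r = (i + 0.5) * dr
        for k in range(n_theta):
            theta = (k + 0.5) * dtheta
            x, y = r * math.cos(theta), r * math.sin(theta)
            # The factor r is |det(J_f)| for f(r, theta) = (r cos theta, r sin theta).
            total += g(x, y) * r * dr * dtheta
    return total


approx = integrate_over_disk(g, radius=2.0)
exact = math.pi * (1 - math.exp(-4.0))  # closed form for this particular g
print(approx, exact)
```

Dropping the factor `r` from the sum would give a different number, because it would weight patches near the origin as heavily as patches near the rim.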

General lesson #

When you see an “extra factor” like r in polar coordinates (or r² sin φ in spherical coordinates), it is not arbitrary—it is the local volume-scaling |det(J_f)|.

Orientation vs absolute value #

The determinant can be negative. Integrals measure (unsigned) volume/area, so the change-of-variables formula uses:

|det(J_f)|.

If you’re doing differential geometry or oriented integrals, the sign can matter; for standard multivariable calculus integrals over regions, absolute value is the rule.

Application/Connection: Jacobians in Optimization, ML, and Matrix Calculus #

Why Jacobians show up constantly in ML #

Many ML models are compositions of vector-valued functions:

x → f₁(x) → f₂(f₁(x)) → … → y

Training relies on derivatives of a loss with respect to parameters, and those derivatives are built from Jacobians (and their transposes) via the chain rule.

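Even when you mostly hear “gradients,” under the hood backpropagation is the chain rule applied layer by layer: the gradient of the loss is pushed backward by multiplying with each layer’s Jacobian transpose, usually without ever forming the full Jacobian.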

Jacobian vs gradient vs Hessian (positioning) #

You already know gradients. The Jacobian sits between gradient and Hessian in complexity:

| Object | Typical function type | Shape | Captures | Notes |
| ∇g(x) | g: ℝⁿ → ℝ | n×1 | first-order change of scalar | steepest ascent direction |
| J_f(x) | f: ℝⁿ → ℝᵐ | m×n | first-order change of vector | linearization matrix |
| H_g(x) | g: ℝⁿ → ℝ | n×n | second-order change | curvature |

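A helpful mental model: the gradient is the Jacobian of a scalar function (transposed into a column), and the Hessian is the Jacobian of the gradient map x ↦ ∇g(x).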

Jacobian-transpose trick (common in least squares) #

Suppose you have residuals r(x) ∈ ℝᵐ and a scalar loss:

L(x) = ½ ‖r(x)‖².

Then the gradient of L can be written using the Jacobian of r:

Let J = J_r(x) (an m×n matrix). Then:

L(x) = ½ ∑_{i=1}^m r_i(x)²

Differentiate component-wise. For j-th component:

∂L/∂x_j = ½ ∑_{i=1}^m 2 r_i(x) · ∂r_i/∂x_j

= ∑_{i=1}^m r_i(x) · J_{ij}

In matrix form:

∇L(x) = J_r(x)ᵀ r(x).

This identity appears in Gauss–Newton, Levenberg–Marquardt, and many optimization routines.
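The identity ∇L(x) = J_r(x)ᵀ r(x) can be sanity-checked on a toy problem. In the sketch below the residual function, its hand-written Jacobian, and the evaluation point are hypothetical; the gradient from the identity is compared with central differences on the scalar loss.

```python
def residuals(x):
    """Hypothetical residual vector r(x) in R^3 for x in R^2."""
    x1, x2 = x
    return [x1 - 1.0, x1 * x2 - 2.0, x2 * x2 - 3.0]


def jacobian(x):
    """Analytic 3×2 Jacobian of the residuals."""
    x1, x2 = x
    return [[1.0, 0.0],
            [x2, x1],
            [0.0, 2.0 * x2]]


def loss(x):
    """L(x) = 1/2 * ||r(x)||^2."""
    return 0.5 * sum(r * r for r in residuals(x))


x = [1.5, 0.5]
r = residuals(x)
J = jacobian(x)

# Gradient via the identity grad L = J^T r.
grad = [sum(J[i][j] * r[i] for i in range(3)) for j in range(2)]

# Independent check: central differences on the scalar loss.
eps = 1e-6
grad_fd = []
for j in range(2):
    xp, xm = list(x), list(x)
    xp[j] += eps
    xm[j] -= eps
    grad_fd.append((loss(xp) - loss(xm)) / (2 * eps))

print(grad)
print(grad_fd)
```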

Jacobian in dynamics and stability #

For a dynamical system:

x_{t+1} = f(x_t)

the Jacobian J_f at a fixed point x⋆ characterizes local stability:

x_{t+1} − x⋆ ≈ J_f(x⋆) ( x_t − x⋆ )

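Eigenvalues of J_f(x⋆) determine whether perturbations shrink or grow: if every eigenvalue has magnitude less than 1, small perturbations decay and x⋆ is locally stable; if any eigenvalue has magnitude greater than 1, some perturbations grow.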
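A minimal sketch of this stability test, assuming a hypothetical two-dimensional map with a fixed point at the origin. The eigenvalues of the 2×2 Jacobian are computed from its trace and determinant, and the map is then iterated from a small perturbation to see whether it decays.

```python
import cmath
import math


def f(x, y):
    """Hypothetical map with a fixed point at the origin."""
    return (0.5 * x + 0.3 * y, -0.2 * x + 0.6 * y + 0.1 * x * x)


# Jacobian of f at the fixed point (0, 0); the 0.1*x^2 term contributes nothing there.
J = [[0.5, 0.3],
     [-0.2, 0.6]]

# Eigenvalues of a 2×2 matrix from its trace and determinant.
trace = J[0][0] + J[1][1]
det = J[0][0] * J[1][1] - J[0][1] * J[1][0]
disc = cmath.sqrt(trace * trace - 4 * det)
eigenvalues = [(trace + disc) / 2, (trace - disc) / 2]
spectral_radius = max(abs(lam) for lam in eigenvalues)

# Iterate from a small perturbation and measure what is left of it.
state = (0.1, -0.1)
for _ in range(50):
    state = f(*state)
final_norm = math.hypot(*state)

print("largest |eigenvalue|:", spectral_radius)
print("perturbation size after 50 steps:", final_norm)
```

Here both eigenvalues are complex with magnitude below 1, so the perturbation spirals in toward the fixed point.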

Bridge to Matrix Calculus #

Matrix calculus generalizes these ideas when variables and outputs are vectors/matrices and you want systematic rules.

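Key bridge concepts you’ll use next: the Jacobian as the matrix of the derivative, the chain rule as a product of Jacobians, and gradient identities built from Jacobian transposes such as ∇L(x) = J_r(x)ᵀ r(x).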

This node sets the foundation: once “derivative = linear map = Jacobian matrix” feels natural, matrix calculus becomes mostly careful bookkeeping plus chain rule.

Worked Examples (3) #

Compute a Jacobian and use it to linearize a vector-valued function #

Let f: ℝ² → ℝ² be f(x, y) = (f₁(x, y), f₂(x, y)) where f₁(x, y) = x²y and f₂(x, y) = sin(x + y). Compute J_f(x, y). Then linearize at (1, 0) to approximate f(1.02, −0.01).

  1. Step 1: Compute partial derivatives for f₁(x, y) = x²y.

    ∂f₁/∂x = 2xy

    ∂f₁/∂y = x²

  2. Step 2: Compute partial derivatives for f₂(x, y) = sin(x + y).

    ∂f₂/∂x = cos(x + y)

    ∂f₂/∂y = cos(x + y)

  3. Step 3: Assemble the Jacobian matrix.

    J_f(x, y) = [ 2xy,       x²       ]
                [ cos(x+y),  cos(x+y) ]

  4. Step 4: Evaluate the Jacobian at (1, 0).

    J_f(1, 0) = [ 2·1·0,     1²       ]
                [ cos(1+0),  cos(1+0) ]

              = [ 0,      1     ]
                [ cos 1,  cos 1 ]

  5. Step 5: Compute f(1, 0).

    f(1, 0) = (1²·0, sin(1+0)) = (0, sin 1)

  6. Step 6: Form the small displacement h from (1, 0) to (1.02, −0.01).

    h = (Δx, Δy) = (0.02, −0.01)

  7. Step 7: Apply the linearization f(a+h) ≈ f(a) + J_f(a) h.

    J_f(1, 0) h = [ 0,      1     ] [  0.02 ]
                  [ cos 1,  cos 1 ] [ −0.01 ]

    First component: 0·0.02 + 1·(−0.01) = −0.01

    Second component: cos1·0.02 + cos1·(−0.01) = cos1·(0.01) = 0.01 cos1

  8. Step 8: Combine.

    f(1.02, −0.01) ≈ (0, sin1) + (−0.01, 0.01 cos1)

    = (−0.01, sin1 + 0.01 cos1)

Insight: The Jacobian turns “small input change” into “approximate output change” via matrix multiplication. Notice that at (1,0) the first output f₁ responds, to first order, only to y (since ∂f₁/∂x = 0 and ∂f₁/∂y = 1 there), which is immediately visible in the first row of J_f(1,0).
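The quality of this linearization can be checked against the true value of f(1.02, −0.01). The sketch below reuses the function and the Jacobian from the steps above; only the variable names are new.

```python
import math


def f(x, y):
    """The worked example's map f(x, y) = (x^2 * y, sin(x + y))."""
    return (x * x * y, math.sin(x + y))


a = (1.0, 0.0)
h = (0.02, -0.01)

# Jacobian at (1, 0) from Step 4: [[0, 1], [cos 1, cos 1]].
J = [[0.0, 1.0],
     [math.cos(1.0), math.cos(1.0)]]

fa = f(*a)
linear = (fa[0] + J[0][0] * h[0] + J[0][1] * h[1],
          fa[1] + J[1][0] * h[0] + J[1][1] * h[1])
exact = f(a[0] + h[0], a[1] + h[1])

errors = (abs(exact[0] - linear[0]), abs(exact[1] - linear[1]))
print("linearized:", linear)
print("exact:     ", exact)
print("abs errors:", errors)
```

Both errors are well below the size of the step h, consistent with a remainder that is second order in h.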

Derive the polar-coordinate area factor using det(J) #

Use the transformation f(r, θ) = (x, y) = (r cos θ, r sin θ). Compute det(J_f) and show that dx dy = r dr dθ.

  1. Step 1: Write the Jacobian matrix.

    J_f(r, θ) = [ ∂x/∂r,  ∂x/∂θ ]
                [ ∂y/∂r,  ∂y/∂θ ]

  2. Step 2: Compute the partial derivatives.

    ∂x/∂r = cos θ

    ∂x/∂θ = −r sin θ

    ∂y/∂r = sin θ

    ∂y/∂θ = r cos θ

  3. Step 3: Substitute into the matrix.

    J_f(r, θ) = [ cos θ,  −r sin θ ]
                [ sin θ,   r cos θ ]

  4. Step 4: Compute the determinant.

    det(J_f) = (cos θ)(r cos θ) − (−r sin θ)(sin θ)

    = r cos²θ + r sin²θ

    = r( cos²θ + sin²θ )

    = r

  5. Step 5: Convert the area element.

    dx dy = |det(J_f(r, θ))| dr dθ = |r| dr dθ

    In standard polar coordinates, r ≥ 0, so |r| = r.

    Therefore dx dy = r dr dθ.

Insight: The mysterious “extra r” in polar integrals is exactly local area scaling. A tiny rectangle of size dr×dθ in (r,θ)-space maps to a curved wedge-like region in (x,y)-space whose area is approximately r·dr·dθ.
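The wedge picture can be made quantitative, since the exact area of an annular sector is known in closed form: ½((r + dr)² − r²) dθ. The sketch below, with an arbitrarily chosen r = 2, compares that exact area with r·dr·dθ as the patch shrinks; `wedge_area` is a helper introduced here for illustration.

```python
def wedge_area(r, dr, dtheta):
    """Exact area of the annular sector between radii r and r + dr, with angle dtheta."""
    return 0.5 * ((r + dr) ** 2 - r ** 2) * dtheta


r = 2.0
for step in (1e-1, 1e-2, 1e-3):
    exact = wedge_area(r, step, step)
    approx = r * step * step  # r dr dθ
    print(f"dr = dθ = {step:g}   exact / approx = {exact / approx:.6f}")

ratio = wedge_area(r, 1e-3, 1e-3) / (r * 1e-3 * 1e-3)
```

Expanding the exact formula gives r dr dθ + ½ dr² dθ, so the ratio is 1 + dr/(2r) and tends to 1 as dr shrinks.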

Use the Jacobian chain rule to differentiate a composition #

Let f: ℝ² → ℝ² be f(x, y) = (u, v) = (x + y, x − y). Let g: ℝ² → ℝ² be g(u, v) = (u², uv). Compute J_{g∘f}(x, y) using the chain rule.

  1. Step 1: Compute J_f(x, y).

    u = x + y ⇒ ∂u/∂x = 1, ∂u/∂y = 1

    v = x − y ⇒ ∂v/∂x = 1, ∂v/∂y = −1

    So J_f(x, y) = [ 1,   1 ]
                   [ 1,  −1 ]

  2. Step 2: Compute J_g(u, v).

    First component: g₁(u, v) = u²

    ∂g₁/∂u = 2u, ∂g₁/∂v = 0

    Second component: g₂(u, v) = uv

    ∂g₂/∂u = v, ∂g₂/∂v = u

    So J_g(u, v) = [ 2u,  0 ]
                   [ v,   u ]

  3. Step 3: Apply the chain rule.

    J_{g∘f}(x, y) = J_g(f(x, y)) · J_f(x, y)

    Substitute u = x + y and v = x − y:

    J_g(f(x, y)) = [ 2(x+y),  0     ]
                   [ (x−y),   (x+y) ]

  4. Step 4: Multiply the matrices.

    J_{g∘f} = [ 2(x+y),  0     ] [ 1,   1 ]
              [ (x−y),   (x+y) ] [ 1,  −1 ]

    Compute entry-by-entry:

    Top row:

    (1,1): 2(x+y)·1 + 0·1 = 2(x+y)

    (1,2): 2(x+y)·1 + 0·(−1) = 2(x+y)

    Bottom row:

    (2,1): (x−y)·1 + (x+y)·1 = (x−y)+(x+y)=2x

    (2,2): (x−y)·1 + (x+y)·(−1) = (x−y)−(x+y)=−2y

    So J_{g∘f}(x, y) = [ 2(x+y),  2(x+y) ]
                       [ 2x,      −2y    ]

Insight: The Jacobian chain rule is “just” matrix multiplication because derivatives are linear maps. Computing J_g at (u,v) and then substituting (u,v)=f(x,y) keeps the structure clean and scales to long compositions.
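The result of this example can be cross-checked numerically at a sample point. The sketch below multiplies the two hand-computed Jacobians and compares the product with central differences on the composition g(f(x, y)); the point (1.5, −0.5) and the helper `matmul` are illustrative choices.

```python
def f(x, y):
    return (x + y, x - y)


def g(u, v):
    return (u * u, u * v)


def matmul(A, B):
    """Product of two small matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


x, y = 1.5, -0.5
u, v = f(x, y)

J_f = [[1.0, 1.0], [1.0, -1.0]]
J_g = [[2 * u, 0.0], [v, u]]  # J_g evaluated at (u, v) = f(x, y)
J_chain = matmul(J_g, J_f)

# Central differences on the composition g(f(x, y)) as an independent check.
eps = 1e-6
J_fd = [[0.0, 0.0], [0.0, 0.0]]
for j, (dx, dy) in enumerate([(eps, 0.0), (0.0, eps)]):
    plus = g(*f(x + dx, y + dy))
    minus = g(*f(x - dx, y - dy))
    for i in range(2):
        J_fd[i][j] = (plus[i] - minus[i]) / (2 * eps)

print(J_chain)
print(J_fd)
```

At this point the closed form from Step 4 gives 2(x+y) = 2 in the top row and (2x, −2y) = (3, 1) in the bottom row, which is what the product reproduces.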

Key Takeaways #

Common Mistakes #

Practice #

easy

Let f(x, y, z) = (xy, yz). Compute J_f(x, y, z). What is J_f(1, 2, 3)?

Hint: There are m=2 outputs and n=3 inputs, so J is 2×3. Differentiate each output with respect to x, y, z.

Solution:

f₁=xy ⇒ ∂f₁/∂x=y, ∂f₁/∂y=x, ∂f₁/∂z=0.

f₂=yz ⇒ ∂f₂/∂x=0, ∂f₂/∂y=z, ∂f₂/∂z=y.

So J_f(x,y,z) = [ y,  x,  0 ]
                [ 0,  z,  y ].

At (1,2,3): J_f(1,2,3) = [ 2,  1,  0 ]
                         [ 0,  3,  2 ].

medium

Let f: ℝ² → ℝ² be f(u, v) = (x, y) = (u² − v², 2uv). (This is complex squaring: (u + iv)² = (u² − v²) + i·2uv.) Compute det(J_f(u, v)).

Hint: Compute ∂x/∂u, ∂x/∂v, ∂y/∂u, ∂y/∂v, then take a 2×2 determinant.

Solution:

x=u²−v² ⇒ ∂x/∂u=2u, ∂x/∂v=−2v.

y=2uv ⇒ ∂y/∂u=2v, ∂y/∂v=2u.

J_f(u,v) = [ 2u,  −2v ]
           [ 2v,   2u ].

det(J_f) = (2u)(2u) − (−2v)(2v)

= 4u² + 4v²

= 4(u²+v²).

hard

Use a Jacobian to perform the substitution in the integral ∬_R (x + y) dx dy where R is the parallelogram defined by x = u + v, y = u − v with (u, v) ∈ [0,1]×[0,1]. Compute the value.

Hint: Compute det(J_f) for f(u,v)=(x,y). Rewrite x+y in terms of u,v. Then integrate over the unit square and multiply by |det(J_f)|.

Solution:

Define f(u,v)=(x,y) with x=u+v, y=u−v.

Jacobian:

J_f = [ ∂x/∂u,  ∂x/∂v ]
      [ ∂y/∂u,  ∂y/∂v ]

    = [ 1,   1 ]
      [ 1,  −1 ].

det(J_f) = (1)(−1) − (1)(1) = −2, so |det(J_f)|=2.

Rewrite integrand:

x+y = (u+v)+(u−v)=2u.

Change variables:

∬_R (x+y) dx dy = ∬_{[0,1]²} (2u) · 2 du dv = ∬_{[0,1]²} 4u du dv.

Compute:

∫_0^1 ∫_0^1 4u dv du = ∫_0^1 (4u·1) du = 4 ∫_0^1 u du = 4·(1/2)=2.
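As a rough cross-check of this answer, the integral can also be estimated directly in (x, y) without any substitution. The sketch below sums the integrand over a grid covering the bounding box of R, keeping only midpoints inside R, and compares that with a midpoint sum in (u, v) that includes the factor |det(J_f)| = 2. The grid sizes and the helper `in_region` are arbitrary choices made for illustration.

```python
def in_region(x, y):
    """(x, y) lies in R exactly when u = (x + y)/2 and v = (x - y)/2 are in [0, 1]."""
    u, v = (x + y) / 2, (x - y) / 2
    return 0.0 <= u <= 1.0 and 0.0 <= v <= 1.0


# Direct midpoint sum over the bounding box [0, 2] × [-1, 1] of R.
n = 401
h = 2.0 / n
direct = 0.0
for i in range(n):
    x = (i + 0.5) * h
    for j in range(n):
        y = -1.0 + (j + 0.5) * h
        if in_region(x, y):
            direct += (x + y) * h * h

# Midpoint sum in (u, v) with the factor |det(J_f)| = 2.
m = 200
k = 1.0 / m
substituted = 0.0
for i in range(m):
    u = (i + 0.5) * k
    for j in range(m):
        substituted += (2 * u) * 2 * k * k

print("direct over R:     ", direct)
print("after substitution:", substituted)
```

Both estimates should land close to 2; the direct sum is only approximate because the grid cuts the slanted boundary of R unevenly.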

Connections #

Next: Matrix Calculus


Quality: A (4.5/5)
