Jacobian #

Calculus · Difficulty: ★★★☆☆ · Depth: 6 · Unlocks: 14

Matrix of partial derivatives. Change of variables in integrals.

Core Concepts #

- Jacobian matrix as the array of first-order partial derivatives (entry = ∂f_i/∂x_j)
- Jacobian as the derivative/linearization: the best linear approximation (matrix) of a vector-valued map at a point
- Jacobian determinant (square case): scalar giving local oriented volume scaling (absolute value used for change of variables)

Key Symbols & Notation #

Df(x) or J_f(x) for the Jacobian matrix; det(Df(x)) or |J_f(x)| for its determinant

Essential Relationships #

- Linearization: f(x+dx) ≈ f(x) + Df(x) · dx (matrix times vector)
- Change-of-variables/volume-scaling: for y=f(x) (locally invertible), infinitesimal volumes transform by |det(Df(x))|, hence integrals change using that factor

Prerequisites (2) #

Gradients (5 atoms) · Matrix Operations (6 atoms)

Unlocks (1) #

Matrix Calculus (lvl 4)

All Concepts (9) #

- Jacobian matrix: for a function f: ℝⁿ → ℝᵐ, the m×n matrix whose (i,j) entry is ∂f_i/∂x_j
- Jacobian determinant: determinant of a square Jacobian (when m = n)
- Jacobian as linear approximation (differential): the Jacobian gives the best linear map approximating f near a point
- Local volume (area) scaling: the Jacobian determinant measures how infinitesimal volumes are scaled by the map
- Absolute Jacobian factor in integrals: using the absolute value of the Jacobian determinant when changing variables in integrals
- Orientation information: the sign of the Jacobian determinant indicates whether the mapping preserves or reverses orientation
- Local invertibility criterion (inverse function theorem) in terms of Jacobian determinant: for continuously differentiable f, a nonzero determinant at a point implies local invertibility near that point
- Rectangular (non-square) Jacobians: Jacobian matrices when m ≠ n have no determinant but still represent the linear differential
- Column/row interpretation: each column is the partial derivatives of the output(s) with respect to one input variable (or each row is the gradient of one output component)

Teaching Strategy #

Multi-session curriculum - substantial prior knowledge and complex material. Use mastery gates and deliberate practice.

You already know the gradient tells you how a scalar function changes as you nudge inputs. The Jacobian is the next step: it tells you how an entire vector of outputs changes—capturing the local linear behavior of a multivariable transformation and the volume-scaling you need for change of variables in integrals.

TL;DR:

The Jacobian J_f(x) = Df(x) is the matrix of first-order partial derivatives of a vector-valued map f. It is the best linear approximation to f near a point. When f maps ℝⁿ → ℝⁿ, det(J_f) measures local oriented volume scaling; |det(J_f)| is the factor used in change-of-variables formulas in integrals.

What Is the Jacobian? #

Why you need a new object beyond the gradient #

For a scalar function g: ℝⁿ → ℝ, the gradient ∇g(x) summarizes first-order change: it’s the vector that best predicts how g changes for a small input step h.

But many important maps are vector-valued:

•A coordinate transform f(u, v) = (x, y)
•A physics map from state to state
•A neural network layer that maps an input vector to an output vector

When the output is a vector, the “rate of change” can’t be captured by a single vector. Each output component depends on each input component. The Jacobian packages all those partial derivatives into one matrix.

Definition (matrix of partial derivatives) #

Let f: ℝⁿ → ℝᵐ be written in components:

f(x) = (f₁(x), f₂(x), …, f_m(x))

where x = (x₁, x₂, …, x_n).

The Jacobian matrix of f at x is

J_f(x) = Df(x) = [ ∂f_i / ∂x_j ]

It is an m×n matrix whose (i, j) entry is:

(J_f(x))_{ij} = ∂f_i(x) / ∂x_j.

So:

•Rows correspond to output components f_i
•Columns correspond to input variables x_j
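
The definition can be sketched numerically. The following is a minimal Python illustration, assuming central finite differences as a stand-in for the partial derivatives; the helper name `numerical_jacobian`, the step size `eps`, and the sample map f(x, y, z) = (xy, yz) are illustrative choices rather than anything prescribed here.

```python
def numerical_jacobian(f, x, eps=1e-6):
    """Approximate the m x n Jacobian of f at x with central differences.

    f takes a list of n floats and returns a list of m floats.
    Entry [i][j] approximates d f_i / d x_j (row = output, column = input).
    """
    n = len(x)
    m = len(f(x))
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[j] += eps
        x_minus[j] -= eps
        f_plus = f(x_plus)
        f_minus = f(x_minus)
        for i in range(m):
            J[i][j] = (f_plus[i] - f_minus[i]) / (2 * eps)
    return J


def sample_map(p):
    # Illustrative map R^3 -> R^2 (an assumption for this sketch).
    x, y, z = p
    return [x * y, y * z]


J = numerical_jacobian(sample_map, [1.0, 2.0, 3.0])
for row in J:
    print([round(v, 6) for v in row])
```

The result has 2 rows (outputs) and 3 columns (inputs), and its entries should be close to the analytic partials y, x, 0 and 0, z, y evaluated at (1, 2, 3).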

Relationship to the gradient (a comforting special case) #

If m = 1 (scalar output), then J_f is 1×n:

J_f(x) = [ ∂f/∂x₁ ∂f/∂x₂ … ∂f/∂x_n ]

This is the gradient written as a row vector. By the usual convention, ∇f(x) is a column vector, so the two are related by a transpose:

J_f(x) = (∇f(x))ᵀ.

So the Jacobian generalizes the gradient.

A geometric preview: local linear map #

The most important intuition is not “array of partials”, but “best linear approximation.” Near a point a, f behaves like:

f(a + h) ≈ f(a) + J_f(a) h

for small h.

That expression is the multivariable analogue of the 1D approximation:

f(a + h) ≈ f(a) + f′(a) h.

Here, f′(a) is a number; in many dimensions, the derivative becomes a matrix.

Square case and determinant #

If m = n, then J_f(x) is n×n and you can take its determinant:

det(J_f(x))

This scalar has a deep meaning:

•sign(det) tells whether the map locally preserves or flips orientation
•|det| tells how much the map locally scales n-dimensional volume

That volume-scaling is exactly what appears in change-of-variables for integrals.

Core Mechanic 1: Jacobian as the Derivative (Linearization) #

Why linearization matters #

Most nonlinear functions are hard to analyze globally. But if you zoom in enough, smooth functions look linear. This is the core strategy behind:

•Newton’s method and other root-finding algorithms
•error propagation and sensitivity analysis
•optimization and gradient-based learning (locally linear steps)

The Jacobian is the device that turns “zooming in” into a concrete computation.

The best linear approximation statement #

Let f: ℝⁿ → ℝᵐ be differentiable at a. Then there exists a linear map L(h) such that

f(a + h) = f(a) + L(h) + r(h)

where the remainder satisfies

‖r(h)‖ / ‖h‖ → 0 as ‖h‖ → 0.

That linear map L is the derivative Df(a). When we choose coordinates, L is represented by the Jacobian matrix J_f(a), and we write:

f(a + h) ≈ f(a) + J_f(a) h.
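
The remainder condition can be observed numerically. The sketch below is a minimal Python check under stated assumptions: the map f(x, y) = (x·e^y, x² + y²), the base point a = (1, 0), and the step direction (2, −1) are all hypothetical choices made for illustration. It prints ‖r(h)‖ / ‖h‖ for shrinking steps.

```python
import math


def f(x, y):
    # Illustrative smooth map R^2 -> R^2 (an assumption for this sketch).
    return (x * math.exp(y), x * x + y * y)


def jacobian(x, y):
    # Analytic Jacobian of the map above.
    return ((math.exp(y), x * math.exp(y)),
            (2 * x, 2 * y))


def remainder_ratio(a, h):
    """Return ||f(a+h) - f(a) - J_f(a) h|| / ||h|| for a 2D point a and step h."""
    fa = f(*a)
    J = jacobian(*a)
    exact = f(a[0] + h[0], a[1] + h[1])
    linear = (fa[0] + J[0][0] * h[0] + J[0][1] * h[1],
              fa[1] + J[1][0] * h[0] + J[1][1] * h[1])
    err = math.hypot(exact[0] - linear[0], exact[1] - linear[1])
    return err / math.hypot(h[0], h[1])


a = (1.0, 0.0)
direction = (2.0, -1.0)
ratios = []
for scale in (1e-1, 1e-2, 1e-3):
    h = (scale * direction[0], scale * direction[1])
    ratios.append(remainder_ratio(a, h))
    print(scale, ratios[-1])
```

For a smooth map the remainder is second order in h, so each tenfold reduction of the step should shrink the ratio by roughly a factor of ten.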

Interpreting columns and rows #

Write eⱼ for the j-th standard basis vector in ℝⁿ (a 1 in position j, else 0). Then:

J_f(a) eⱼ = column j of J_f(a).

But eⱼ corresponds to “nudge only x_j”. So:

column j ≈ how the output vector changes when you increase x_j a tiny bit.

Equivalently, each row i is:

row i = [ ∂f_i/∂x₁ … ∂f_i/∂x_n ]

which is the gradient of the i-th output component (as a row). So:

•columns = sensitivity directions in input space
•rows = gradients of each output component

Chain rule in Jacobian form (the practical payoff) #

If f: ℝⁿ → ℝᵐ and g: ℝᵐ → ℝᵏ, then the composition g ∘ f: ℝⁿ → ℝᵏ has Jacobian:

J_{g∘f}(x) = J_g(f(x)) · J_f(x).

This is the multivariable chain rule, and it looks exactly like matrix multiplication.

It’s worth pausing to connect this to “linear approximation of a composition”:

•f turns a small input step h into an approximate output step J_f h
•g then turns that step into J_g (J_f h)
•overall: (J_g J_f) h

That is why the chain rule becomes matrix multiplication.
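
A minimal Python sketch of this identity, assuming two hypothetical maps f(x, y) = (xy, x + y) and g(u, v) = (sin u, uv) chosen only for illustration. It multiplies the analytic Jacobians and compares the product with a central-difference Jacobian of the composition.

```python
import math


def f(x, y):
    return (x * y, x + y)


def g(u, v):
    return (math.sin(u), u * v)


def J_f(x, y):
    return [[y, x], [1.0, 1.0]]


def J_g(u, v):
    return [[math.cos(u), 0.0], [v, u]]


def matmul(A, B):
    """Multiply matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def composed(x, y):
    return g(*f(x, y))


def finite_difference_jacobian(func, x, y, eps=1e-6):
    """Central-difference Jacobian of a map R^2 -> R^2."""
    fx_p, fx_m = func(x + eps, y), func(x - eps, y)
    fy_p, fy_m = func(x, y + eps), func(x, y - eps)
    return [[(fx_p[i] - fx_m[i]) / (2 * eps), (fy_p[i] - fy_m[i]) / (2 * eps)]
            for i in range(2)]


x0, y0 = 0.5, 1.5
chain = matmul(J_g(*f(x0, y0)), J_f(x0, y0))  # J_g is evaluated at f(x), not at x
numeric = finite_difference_jacobian(composed, x0, y0)
max_gap = max(abs(chain[i][j] - numeric[i][j]) for i in range(2) for j in range(2))
print(chain)
print(max_gap)
```

The largest entrywise gap should be tiny, limited only by finite-difference and rounding error.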

Directional derivatives via the Jacobian #

For a direction u ∈ ℝⁿ, the first-order change in f at a in direction u is:

Df(a) u = J_f(a) u.

This is the vector-valued directional derivative.

In the scalar-output case (m = 1), this reduces to:

J_f(a) u = (∇f(a))ᵀ u = ∇f(a) · u

which is the familiar directional derivative formula.

Small-error propagation (a common use) #

Suppose your input x has a small perturbation δx (measurement noise). Then the induced output perturbation is approximately:

δy ≈ J_f(x) δx.

So the Jacobian acts like a “local gain matrix.” This is the mathematical foundation for sensitivity analysis and for linearizing nonlinear systems in control and estimation.
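
As an illustration, consider a hypothetical range-and-bearing measurement (r, θ) converted to Cartesian position. The Python sketch below uses made-up measurement values and error sizes, and compares the Jacobian prediction δy ≈ J δx with the actual change in output.

```python
import math


def to_cartesian(r, theta):
    return (r * math.cos(theta), r * math.sin(theta))


def jacobian(r, theta):
    # Jacobian of (r, theta) -> (x, y).
    return [[math.cos(theta), -r * math.sin(theta)],
            [math.sin(theta), r * math.cos(theta)]]


r, theta = 2.0, math.pi / 6
dr, dtheta = 0.01, -0.002   # hypothetical small measurement errors

J = jacobian(r, theta)
predicted = (J[0][0] * dr + J[0][1] * dtheta,
             J[1][0] * dr + J[1][1] * dtheta)

x0, y0 = to_cartesian(r, theta)
x1, y1 = to_cartesian(r + dr, theta + dtheta)
actual = (x1 - x0, y1 - y0)

print("predicted:", predicted)
print("actual:   ", actual)
```

The two should agree to first order; the leftover difference is second order in the perturbation size.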

Core Mechanic 2: Jacobian Determinant and Change of Variables #

Why determinants show up in integrals #

Integration measures “total accumulation.” In multiple dimensions, it’s accumulation over area (2D) or volume (3D and beyond). If you change coordinates, a small patch in the new coordinates may correspond to a differently sized patch in the old coordinates.

So you need a conversion factor between tiny volume elements:

(d volume in x-space) = (scale factor) · (d volume in u-space)

That scale factor is the absolute value of the Jacobian determinant.

Local volume scaling intuition #

Assume f: ℝⁿ → ℝⁿ is differentiable and J_f(a) is invertible.

Near a, f behaves like:

f(a + h) ≈ f(a) + J_f(a) h.

So near a, f is approximately the linear map h ↦ J_f(a) h. For a linear map A, the determinant det(A) gives the oriented volume scaling:

•A maps a tiny n-dimensional parallelepiped to another
•the volume scales by |det(A)|
•orientation flips if det(A) < 0

Therefore, for the nonlinear map f, the local volume scaling near a is approximately |det(J_f(a))|.
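
The linear case is easy to check directly. In the Python sketch below, the matrix A and the use of the shoelace formula for polygon area are illustrative choices: the unit square is mapped through A, and the signed area of its image is compared with det(A).

```python
def det2(A):
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]


def apply(A, p):
    return (A[0][0] * p[0] + A[0][1] * p[1],
            A[1][0] * p[0] + A[1][1] * p[1])


def signed_area(polygon):
    """Shoelace formula: positive for counterclockwise vertex order."""
    total = 0.0
    for k in range(len(polygon)):
        x1, y1 = polygon[k]
        x2, y2 = polygon[(k + 1) % len(polygon)]
        total += x1 * y2 - x2 * y1
    return total / 2.0


unit_square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]  # counterclockwise
A = [[1.0, 1.0], [1.0, -1.0]]   # example matrix (an assumption for this sketch)

image = [apply(A, p) for p in unit_square]
print(det2(A), signed_area(image))   # prints: -2.0 -2.0
```

The image has area 2, and the negative sign records that A reverses the counterclockwise ordering of the vertices, which is the orientation flip described above.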

The change-of-variables formula (multivariable substitution) #

Let f: U ⊂ ℝⁿ → V ⊂ ℝⁿ be a continuously differentiable bijection with a continuously differentiable inverse (a C¹ diffeomorphism), and let g be integrable on V. Then:

∫∫…∫_V g(x) dx = ∫∫…∫_U g(f(u)) · |det(J_f(u))| du.

Here:

•u are the new coordinates
•x = f(u) are the old coordinates
•dx and du represent n-dimensional volume elements

The key idea is:

dx = |det(J_f(u))| du.

2D special case: area scaling #

In 2D, if f(u, v) = (x(u, v), y(u, v)), then

J_f(u, v) = [ ∂x/∂u  ∂x/∂v ]
            [ ∂y/∂u  ∂y/∂v ]

and the area element transforms as:

dx dy = |det(J_f(u, v))| du dv.

Common coordinate transforms #

Polar coordinates #

x = r cos θ

y = r sin θ

Compute J_f(r, θ):

∂x/∂r = cos θ,  ∂x/∂θ = −r sin θ

∂y/∂r = sin θ,  ∂y/∂θ = r cos θ

J_f = [ cos θ  −r sin θ ]
      [ sin θ   r cos θ ]

and

det(J_f) = (cos θ)(r cos θ) − (−r sin θ)(sin θ)

= r cos²θ + r sin²θ

= r.

Thus:

dx dy = r dr dθ.

That single factor r is exactly the Jacobian determinant’s magnitude.
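
A minimal Python sketch of the factor r at work, assuming a plain midpoint rule on an (r, θ) grid; the function name `integrate_polar` and the grid sizes are illustrative choices. It estimates the area of a disk of radius 2 and the integral of x² + y² over the same disk.

```python
import math


def integrate_polar(g, radius, n_r=200, n_theta=200):
    """Midpoint-rule estimate of the integral of g(x, y) over a disk,
    computed in (r, theta) coordinates with the area factor r."""
    dr = radius / n_r
    dtheta = 2 * math.pi / n_theta
    total = 0.0
    for i in range(n_r):
        r = (i + 0.5) * dr
        for j in range(n_theta):
            theta = (j + 0.5) * dtheta
            x, y = r * math.cos(theta), r * math.sin(theta)
            total += g(x, y) * r * dr * dtheta   # r = |det J_f(r, theta)|
    return total


area = integrate_polar(lambda x, y: 1.0, radius=2.0)
second_moment = integrate_polar(lambda x, y: x * x + y * y, radius=2.0)
print(area, math.pi * 2.0 ** 2)            # both close to 4*pi
print(second_moment, math.pi * 2.0 ** 4 / 2)   # both close to 8*pi
```

Dropping the factor r from the sum would instead return the area of the rectangle [0, 2] × [0, 2π] in (r, θ)-space, which is not the area of the disk.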

General lesson #

When you see an “extra factor” like r in polar coordinates (or r² sin φ in spherical coordinates), it is not arbitrary—it is the local volume-scaling |det(J_f)|.
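
The spherical factor can be checked the same way. The Python sketch below assumes the convention x = r sin φ cos θ, y = r sin φ sin θ, z = r cos φ with input order (r, φ, θ), where φ is the angle from the z-axis; other orderings of the inputs can flip the sign of the determinant. The sample point is arbitrary.

```python
import math


def spherical_jacobian(r, phi, theta):
    """Jacobian of (r, phi, theta) -> (x, y, z), with phi measured from the z-axis."""
    sp, cp = math.sin(phi), math.cos(phi)
    st, ct = math.sin(theta), math.cos(theta)
    return [[sp * ct, r * cp * ct, -r * sp * st],
            [sp * st, r * cp * st, r * sp * ct],
            [cp, -r * sp, 0.0]]


def det3(M):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


r, phi, theta = 1.5, 0.7, 2.1   # arbitrary sample point
jac_det = det3(spherical_jacobian(r, phi, theta))
print(jac_det, r ** 2 * math.sin(phi))
```

Both printed values should agree up to rounding error.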

Orientation vs absolute value #

The determinant can be negative. Integrals measure (unsigned) volume/area, so the change-of-variables formula uses:

|det(J_f)|.

If you’re doing differential geometry or oriented integrals, the sign can matter; for standard multivariable calculus integrals over regions, absolute value is the rule.

Application/Connection: Jacobians in Optimization, ML, and Matrix Calculus #

Why Jacobians show up constantly in ML #

Many ML models are compositions of vector-valued functions:

x → f₁(x) → f₂(f₁(x)) → … → y

Training relies on derivatives of a loss with respect to parameters, and those derivatives are built from Jacobians (and their transposes) via the chain rule.

Even when you mostly hear “gradients,” under the hood:

•a gradient is the transpose of the 1×n Jacobian of a scalar-output function
•backprop is repeated application of the Jacobian chain rule

Jacobian vs gradient vs Hessian (positioning) #

You already know gradients. The Jacobian sits between gradient and Hessian in complexity:

Object	Typical function type	Shape	Captures	Notes
∇g(x)	g: ℝⁿ → ℝ	n×1	first-order change of scalar	steepest ascent direction
J_f(x)	f: ℝⁿ → ℝᵐ	m×n	first-order change of vector	linearization matrix
H_g(x)	g: ℝⁿ → ℝ	n×n	second-order change	curvature

A helpful mental model:

•gradient: “slope vector”
•Jacobian: “slope matrix”
•Hessian: “curvature matrix”

Jacobian-transpose trick (common in least squares) #

Suppose you have residuals r(x) ∈ ℝᵐ and a scalar loss:

L(x) = ½ ‖r(x)‖².

Then the gradient of L can be written using the Jacobian of r:

Let J = J_r(x) (an m×n matrix). Then:

L(x) = ½ ∑_{i=1}^m r_i(x)²

Differentiate component-wise. For j-th component:

∂L/∂x_j = ½ ∑_{i=1}^m 2 r_i(x) · ∂r_i/∂x_j

= ∑_{i=1}^m r_i(x) · J_{ij}

In matrix form:

∇L(x) = J_r(x)ᵀ r(x).

This identity appears in Gauss–Newton, Levenberg–Marquardt, and many optimization routines.
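
A minimal Python check of ∇L = Jᵀ r, assuming a hypothetical residual function with three residuals and two unknowns (chosen only for illustration). The Jacobian-based gradient is compared with a central-difference gradient of L.

```python
def residuals(x):
    # Hypothetical residuals r: R^2 -> R^3.
    a, b = x
    return [a * a - b, a + b - 3.0, a * b - 2.0]


def residual_jacobian(x):
    # Analytic 3 x 2 Jacobian of the residuals above.
    a, b = x
    return [[2 * a, -1.0], [1.0, 1.0], [b, a]]


def loss(x):
    return 0.5 * sum(ri * ri for ri in residuals(x))


def gradient_via_jacobian(x):
    """Compute J^T r: component j is sum_i r_i * J[i][j]."""
    r = residuals(x)
    J = residual_jacobian(x)
    return [sum(r[i] * J[i][j] for i in range(len(r))) for j in range(len(x))]


def gradient_via_differences(x, eps=1e-6):
    """Central-difference gradient of the scalar loss."""
    grad = []
    for j in range(len(x)):
        xp, xm = list(x), list(x)
        xp[j] += eps
        xm[j] -= eps
        grad.append((loss(xp) - loss(xm)) / (2 * eps))
    return grad


x = [0.8, 1.7]
grad_jt = gradient_via_jacobian(x)
grad_fd = gradient_via_differences(x)
print(grad_jt)
print(grad_fd)
```

Note the shapes: J is 3×2, r has 3 entries, and Jᵀ r has 2 entries, matching the number of unknowns.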

Jacobian in dynamics and stability #

For a dynamical system:

x_{t+1} = f(x_t)

the Jacobian J_f at a fixed point x⋆ characterizes local stability:

x_{t+1} − x⋆ ≈ J_f(x⋆) ( x_t − x⋆ )

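The eigenvalues of J_f(x⋆) determine whether small perturbations shrink or grow: if every eigenvalue has magnitude less than 1, perturbations decay and x⋆ is locally stable; if any eigenvalue has magnitude greater than 1, some perturbations grow.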
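
A small Python sketch of this idea, assuming a hypothetical 2D map with a fixed point at the origin. It computes the eigenvalues of the 2×2 Jacobian from its trace and determinant, then iterates the nonlinear map from a small perturbation to see whether it decays.

```python
import cmath
import math


def step(x, y):
    """Hypothetical map with a fixed point at the origin."""
    return (0.5 * x + 0.4 * y + x * x, -0.3 * x + 0.6 * y)


J = [[0.5, 0.4], [-0.3, 0.6]]   # Jacobian of step at (0, 0)

# Eigenvalues of a 2x2 matrix from its trace and determinant.
trace = J[0][0] + J[1][1]
det = J[0][0] * J[1][1] - J[0][1] * J[1][0]
root = cmath.sqrt(trace * trace - 4 * det)
eigenvalues = [(trace + root) / 2, (trace - root) / 2]
spectral_radius = max(abs(lam) for lam in eigenvalues)

# Iterate the nonlinear map from a small perturbation of the fixed point.
x, y = 0.01, -0.02
start = math.hypot(x, y)
for _ in range(30):
    x, y = step(x, y)
end = math.hypot(x, y)

print(spectral_radius, start, end)
```

Here the eigenvalues are a complex pair with magnitude below 1, so the perturbation spirals in toward the fixed point and its norm after 30 steps is far smaller than at the start.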

Bridge to Matrix Calculus #

Matrix calculus generalizes these ideas when variables and outputs are vectors/matrices and you want systematic rules.

Key bridge concepts you’ll use next:

•organizing derivatives consistently (shapes and conventions)
•Jacobians of common vector operations
•combining Jacobians with chain rule for complex compositions

This node sets the foundation: once “derivative = linear map = Jacobian matrix” feels natural, matrix calculus becomes mostly careful bookkeeping plus chain rule.

Worked Examples (3) #

Compute a Jacobian and use it to linearize a vector-valued function #

Let f: ℝ² → ℝ² be f(x, y) = (f₁(x, y), f₂(x, y)) where f₁(x, y) = x²y and f₂(x, y) = sin(x + y). Compute J_f(x, y). Then linearize at (1, 0) to approximate f(1.02, −0.01).

Step 1: Compute partial derivatives for f₁(x, y) = x²y.
∂f₁/∂x = 2xy
∂f₁/∂y = x²
Step 2: Compute partial derivatives for f₂(x, y) = sin(x + y).
∂f₂/∂x = cos(x + y)
∂f₂/∂y = cos(x + y)
Step 3: Assemble the Jacobian matrix.
J_f(x, y) = [ 2xy       x²       ]
            [ cos(x+y)  cos(x+y) ]
Step 4: Evaluate the Jacobian at (1, 0).
J_f(1, 0) = [ 2·1·0     1²       ]
            [ cos(1+0)  cos(1+0) ]
          = [ 0      1     ]
            [ cos 1  cos 1 ]
Step 5: Compute f(1, 0).
f(1, 0) = (1²·0, sin(1+0)) = (0, sin 1)
Step 6: Form the small displacement h from (1, 0) to (1.02, −0.01).
h = (Δx, Δy) = (0.02, −0.01)
Step 7: Apply the linearization f(a+h) ≈ f(a) + J_f(a) h.
J_f(1, 0) h = [ 0      1     ] [  0.02 ]
              [ cos 1  cos 1 ] [ −0.01 ]
First component: 0·0.02 + 1·(−0.01) = −0.01
Second component: cos1·0.02 + cos1·(−0.01) = cos1·(0.01) = 0.01 cos1
Step 8: Combine.
f(1.02, −0.01) ≈ (0, sin1) + (−0.01, 0.01 cos1)
= (−0.01, sin1 + 0.01 cos1)

Insight: The Jacobian turns “small input change” into “approximate output change” via matrix multiplication. Notice that, to first order, f₁ responds only to changes in y near (1, 0) (since ∂f₁/∂x = 0 there), which is immediately visible in the first row of J_f(1, 0).
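
To see how good the approximation from Step 8 is, the following short Python check compares it with the exact value of f(1.02, −0.01). Only the function from this example is used; the variable names are illustrative.

```python
import math


def f(x, y):
    return (x * x * y, math.sin(x + y))


approx = (-0.01, math.sin(1.0) + 0.01 * math.cos(1.0))   # result from Step 8
exact = f(1.02, -0.01)
errors = (abs(approx[0] - exact[0]), abs(approx[1] - exact[1]))

print("linearized:", approx)
print("exact:     ", exact)
print("errors:    ", errors)
```

Both errors are well below the size of the step h, as expected for a first-order approximation.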

Derive the polar-coordinate area factor using det(J) #

Use the transformation f(r, θ) = (x, y) = (r cos θ, r sin θ). Compute det(J_f) and show that dx dy = r dr dθ.

Step 1: Write the Jacobian matrix.
J_f(r, θ) = [ ∂x/∂r  ∂x/∂θ ]
            [ ∂y/∂r  ∂y/∂θ ]
Step 2: Compute the partial derivatives.
∂x/∂r = cos θ
∂x/∂θ = −r sin θ
∂y/∂r = sin θ
∂y/∂θ = r cos θ
Step 3: Substitute into the matrix.
J_f(r, θ) = [ cos θ  −r sin θ ]
            [ sin θ   r cos θ ]
Step 4: Compute the determinant.
det(J_f) = (cos θ)(r cos θ) − (−r sin θ)(sin θ)
= r cos²θ + r sin²θ
= r( cos²θ + sin²θ )
= r
Step 5: Convert the area element.
dx dy = |det(J_f(r, θ))| dr dθ = |r| dr dθ
In standard polar coordinates, r ≥ 0, so |r| = r.
Therefore dx dy = r dr dθ.

Insight: The mysterious “extra r” in polar integrals is exactly local area scaling. A tiny rectangle of size dr×dθ in (r,θ)-space maps to a curved wedge-like region in (x,y)-space whose area is approximately r·dr·dθ.

Use the Jacobian chain rule to differentiate a composition #

Let f: ℝ² → ℝ² be f(x, y) = (u, v) = (x + y, x − y). Let g: ℝ² → ℝ² be g(u, v) = (u², uv). Compute J_{g∘f}(x, y) using the chain rule.

Step 1: Compute J_f(x, y).
u = x + y ⇒ ∂u/∂x = 1, ∂u/∂y = 1
v = x − y ⇒ ∂v/∂x = 1, ∂v/∂y = −1
So J_f(x, y) = [ 1   1 ]
               [ 1  −1 ]
Step 2: Compute J_g(u, v).
First component: g₁(u, v) = u²
∂g₁/∂u = 2u, ∂g₁/∂v = 0
Second component: g₂(u, v) = uv
∂g₂/∂u = v, ∂g₂/∂v = u
So J_g(u, v) = [ 2u  0 ]
               [ v   u ]
Step 3: Apply the chain rule.
J_{g∘f}(x, y) = J_g(f(x, y)) · J_f(x, y)
Substitute u = x + y and v = x − y:
J_g(f(x, y)) = [ 2(x+y)  0     ]
               [ (x−y)   (x+y) ]
Step 4: Multiply the matrices.
J_{g∘f} = [ 2(x+y)  0     ] [ 1   1 ]
          [ (x−y)   (x+y) ] [ 1  −1 ]
Compute entry-by-entry:
Top row:
(1,1): 2(x+y)·1 + 0·1 = 2(x+y)
(1,2): 2(x+y)·1 + 0·(−1) = 2(x+y)
Bottom row:
(2,1): (x−y)·1 + (x+y)·1 = (x−y)+(x+y)=2x
(2,2): (x−y)·1 + (x+y)·(−1) = (x−y)−(x+y)=−2y
So J_{g∘f}(x, y) = [ 2(x+y)  2(x+y) ]
                   [ 2x      −2y    ]

Insight: The Jacobian chain rule is “just” matrix multiplication because derivatives are linear maps. Computing J_g at (u,v) and then substituting (u,v)=f(x,y) keeps the structure clean and scales to long compositions.

Key Takeaways #

✓ The Jacobian J_f(x) = Df(x) is the m×n matrix with entries (∂f_i/∂x_j), describing first-order change of f: ℝⁿ → ℝᵐ.
✓ Linearization: for small h, f(a+h) ≈ f(a) + J_f(a) h; the Jacobian is the best linear approximation near a.
✓ Columns of J_f describe how the output changes when you perturb one input coordinate; rows are gradients of each output component (as row vectors).
✓ Chain rule: J_{g∘f}(x) = J_g(f(x)) · J_f(x); composition becomes matrix multiplication.
✓ When n = m, det(J_f(x)) measures local oriented volume scaling; |det(J_f(x))| is the local (unsigned) volume scale factor.
✓ Change of variables in integrals uses dx = |det(J_f(u))| du for x = f(u).
✓ Many “extra factors” in coordinate systems (like r in polar) are exactly Jacobian determinants.

Common Mistakes #

✗ Mixing up the shape: for f: ℝⁿ → ℝᵐ, the Jacobian is m×n (outputs by inputs), not n×m.
✗ Confusing ∇f with J_f: the gradient is for scalar outputs; for vector outputs you need the full Jacobian (or one gradient per component).
✗ Forgetting the absolute value in change-of-variables: integrals over regions use |det(J)|, not det(J), which matters whenever det is negative.
✗ Evaluating J_g at the wrong point in the chain rule: J_{g∘f}(x) requires J_g at f(x), not at x.

Practice #

easy

Let f(x, y, z) = (xy, yz). Compute J_f(x, y, z). What is J_f(1, 2, 3)?

Hint: There are m=2 outputs and n=3 inputs, so J is 2×3. Differentiate each output with respect to x, y, z.

Solution:

f₁=xy ⇒ ∂f₁/∂x=y, ∂f₁/∂y=x, ∂f₁/∂z=0.

f₂=yz ⇒ ∂f₂/∂x=0, ∂f₂/∂y=z, ∂f₂/∂z=y.

So J_f(x,y,z) = [ y  x  0 ]
                [ 0  z  y ].

At (1,2,3): J_f(1,2,3) = [ 2  1  0 ]
                         [ 0  3  2 ].

medium

Let f: ℝ² → ℝ² be f(u, v) = (x, y) = (u² − v², 2uv). (This is complex squaring in real coordinates: (u + iv)² = (u² − v²) + i·2uv.) Compute det(J_f(u, v)).

Hint: Compute ∂x/∂u, ∂x/∂v, ∂y/∂u, ∂y/∂v, then take a 2×2 determinant.

Solution:

x=u²−v² ⇒ ∂x/∂u=2u, ∂x/∂v=−2v.

y=2uv ⇒ ∂y/∂u=2v, ∂y/∂v=2u.

J_f(u,v) = [ 2u  −2v ]
           [ 2v   2u ].

det(J_f) = (2u)(2u) − (−2v)(2v)

= 4u² + 4v²

= 4(u²+v²).

hard

Use a Jacobian to perform the substitution in the integral ∬_R (x + y) dx dy where R is the parallelogram defined by x = u + v, y = u − v with (u, v) ∈ [0,1]×[0,1]. Compute the value.

Hint: Compute det(J_f) for f(u,v)=(x,y). Rewrite x+y in terms of u,v. Then integrate over the unit square and multiply by |det(J_f)|.

Solution:

Define f(u,v)=(x,y) with x=u+v, y=u−v.

Jacobian:

J_f = [ ∂x/∂u  ∂x/∂v ]
      [ ∂y/∂u  ∂y/∂v ]

    = [ 1   1 ]
      [ 1  −1 ].

det(J_f) = (1)(−1) − (1)(1) = −2, so |det(J_f)|=2.

Rewrite integrand:

x+y = (u+v)+(u−v)=2u.

Change variables:

∬_R (x+y) dx dy = ∬_{[0,1]²} (2u) · 2 du dv = ∬_{[0,1]²} 4u du dv.

Compute:

∫_0^1 ∫_0^1 4u dv du = ∫_0^1 (4u·1) du = 4 ∫_0^1 u du = 4·(1/2)=2.
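
The value 2 can be cross-checked numerically. The Python sketch below is a rough check under stated assumptions (midpoint rules, with grid sizes chosen arbitrarily): it evaluates the substituted integral over the unit square, then integrates directly over R on an (x, y) grid, testing membership in R through the inverse map u = (x + y)/2, v = (x − y)/2.

```python
def integrand(x, y):
    return x + y


# Side 1: integrate over the unit square in (u, v), including |det J_f| = 2.
n = 200
h = 1.0 / n
via_substitution = 0.0
for i in range(n):
    u = (i + 0.5) * h
    for j in range(n):
        v = (j + 0.5) * h
        x, y = u + v, u - v
        via_substitution += integrand(x, y) * 2.0 * h * h

# Side 2: integrate directly over R on an (x, y) grid covering [0, 2] x [-1, 1].
# A point lies in R when its preimage (u, v) lands in the unit square.
m = 401   # grid resolution, an arbitrary choice
k = 2.0 / m
direct = 0.0
for i in range(m):
    x = (i + 0.5) * k
    for j in range(m):
        y = -1.0 + (j + 0.5) * k
        u, v = (x + y) / 2.0, (x - y) / 2.0
        if 0.0 <= u <= 1.0 and 0.0 <= v <= 1.0:
            direct += integrand(x, y) * k * k

print(via_substitution, direct)
```

Both estimates should land near 2; the direct one is coarser because the grid cells only approximate the slanted boundary of R.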

Connections #

Next: Matrix Calculus

Related reinforcement nodes you may have seen:

•Gradients
•Matrix Operations

Forward links this enables:

•Jacobian chain rule → backprop-style differentiation in vector form
•Jacobian determinant → multivariable substitution and probability density transforms
