Lagrangian Duality

←Back to Tech Tree

inventorycoverage

Lagrangian Duality #

OptimizationDifficulty: ★★★★★Depth: 10Unlocks: 0

Dual problems, weak and strong duality. Dual ascent.

Interactive Visualization #

⏮◀◀▶▶STEP0.25x1xZOOM

t=0s

Core Concepts #

Key Symbols & Notation #

L(x, lambda, nu)d(lambda, nu) = inf_x L(x, lambda, nu)

Essential Relationships #

Prerequisites (2) #

KKT Conditions6 atomsLagrange Multipliers5 atoms

Referenced by (3) #

Where this concept shows up in the operating-finance and personal-finance graphs.

From Business (3) #

[AllocationBusiness

Dual variables give shadow prices - the marginal value of each scarce resource. Duality bridges the optimization math to the economic interpretation of allocation constraints](/business/allocation/)[Shadow PriceBusiness

Shadow prices are dual variables - understanding Lagrangian duality explains why every primal constraint has a corresponding price and why strong duality makes shadow prices actionable](/business/shadow-price/)[resource allocationBusiness

Primal-dual and dual-ascent methods ARE Lagrangian duality applied - the concept description names these methods directly](/business/resource-allocation/)

Advanced Learning Details

Graph Position #

87

Depth Cost

0

Fan-Out (ROI)

0

Bottleneck Score

10

Chain Length

Cognitive Load #

6

Atomic Elements

36

Total Elements

L2

Percentile Level

L4

Atomic Level

All Concepts (13) #

Teaching Strategy #

Multi-session curriculum - substantial prior knowledge and complex material. Use mastery gates and deliberate practice.

Constrained optimization feels like juggling: you want to minimize an objective while never dropping the constraints. Lagrangian duality turns that juggling act into a different game—choose prices (multipliers) for constraint violation so that the best “priced” solution automatically respects the constraints. The surprising part: those prices also generate guaranteed lower bounds on the true optimum, and in many important cases the bound becomes exact.

TL;DR:

Given a primal minimization problem with constraints, form the Lagrangian L(x,λ,ν)=f(x)+∑iλigi(x)+∑jνjhj(x)L(x,\lambda,\nu)=f(x)+\sum_i \lambda_i g_i(x)+\sum_j \nu_j h_j(x)L(x,λ,ν)=f(x)+∑i​λi​gi​(x)+∑j​νj​hj​(x) with λ≥0\lambda\ge 0λ≥0. The dual function d(λ,ν)=inf⁡xL(x,λ,ν)d(\lambda,\nu)=\inf_x L(x,\lambda,\nu)d(λ,ν)=infx​L(x,λ,ν) is always a lower bound on the primal optimum (weak duality). The dual problem maximizes this bound over multipliers. Under conditions like Slater’s condition for convex problems, the bound is tight (strong duality). Dual ascent updates multipliers to improve the bound and can solve large problems by exploiting separability and the structure of inf⁡xL\inf_x Linfx​L.

What Is Lagrangian Duality? #

The primal problem (what we start with) #

A standard constrained minimization problem (the primal) is:

minimizexf(x)subject togi(x)≤0,  i=1,…,mhj(x)=0,  j=1,…,p\begin{aligned}
\text{minimize}_{x} \quad & f(x) \
\text{subject to} \quad & g_i(x) \le 0, ; i=1,\dots,m \
& h_j(x) = 0, ; j=1,\dots,p
\end{aligned}minimizex​subject to​f(x)gi​(x)≤0,i=1,…,mhj​(x)=0,j=1,…,p​

Here:

You already know KKT conditions and Lagrange multipliers. Duality builds on those ideas but changes the emphasis: instead of directly enforcing constraints, we price violations.

The Lagrangian (the key scalar object) #

Define the Lagrangian:

L(x,λ,ν)=f(x)+∑i=1mλigi(x)+∑j=1pνjhj(x)L(x,\lambda,\nu) = f(x) + \sum_{i=1}^m \lambda_i g_i(x) + \sum_{j=1}^p \nu_j h_j(x)L(x,λ,ν)=f(x)+i=1∑m​λi​gi​(x)+j=1∑p​νj​hj​(x)

Intuition:

This mirrors a familiar idea: “soft constraints” with penalties. But duality is more precise: the penalties are variables you optimize.

The dual function: best value of the priced problem #

Given multipliers (λ,ν)(\lambda,\nu)(λ,ν), consider minimizing the Lagrangian over xxx:

d(λ,ν)=inf⁡xL(x,λ,ν)d(\lambda,\nu) = \inf_x L(x,\lambda,\nu)d(λ,ν)=xinf​L(x,λ,ν)

This ddd is called the dual function.

Why is it an infimum (not necessarily a minimum)? Because for some multipliers the Lagrangian might not attain a minimizer, or might go to −∞-\infty−∞.

Two important facts (we will prove them carefully soon):

  1. 1)For any λ≥0\lambda \ge 0λ≥0, d(λ,ν)d(\lambda,\nu)d(λ,ν) is a lower bound on the primal optimum value p∗p^*p∗.
  2. 2)d(λ,ν)d(\lambda,\nu)d(λ,ν) is concave in (λ,ν)(\lambda,\nu)(λ,ν), even when the primal problem is not convex.

The dual problem: maximize the best lower bound #

If each (λ,ν)(\lambda,\nu)(λ,ν) gives a lower bound, we should pick the best one:

maximizeλ,νd(λ,ν)subject toλ≥0\begin{aligned}
\text{maximize}_{\lambda,\nu} \quad & d(\lambda,\nu) \
\text{subject to} \quad & \lambda \ge 0
\end{aligned}maximizeλ,ν​subject to​d(λ,ν)λ≥0​

This is the Lagrange dual problem. Its optimal value is denoted d∗d^*d∗.

Why duality is worth learning #

Duality is not just “another way to write the same thing.” It gives:

We’ll proceed slowly: first we’ll make the “lower bound” statement completely concrete; then we’ll talk about when the bound becomes tight; then we’ll turn that into an algorithm.

Core Mechanic 1 — The Dual Function as an Infimum (and Why It Lower-Bounds the Primal) #

Step 1: Fix multipliers and look at L(x,λ,ν)L(x,\lambda,\nu)L(x,λ,ν) #

For a fixed (λ,ν)(\lambda,\nu)(λ,ν) with λ≥0\lambda\ge 0λ≥0, the Lagrangian is just a scalar function of xxx:

L(x,λ,ν)=f(x)+∑iλigi(x)+∑jνjhj(x).L(x,\lambda,\nu)= f(x) + \sum_i \lambda_i g_i(x) + \sum_j \nu_j h_j(x).L(x,λ,ν)=f(x)+i∑​λi​gi​(x)+j∑​νj​hj​(x).

Now imagine two categories of xxx:

For feasible xxx, we can compare L(x,λ,ν)L(x,\lambda,\nu)L(x,λ,ν) and f(x)f(x)f(x).

Step 2: For feasible xxx, the Lagrangian never exceeds the objective #

If xxx is feasible, then:

Therefore:

L(x,λ,ν)=f(x)+∑iλigi(x)+∑jνjhj(x)≤f(x).L(x,\lambda,\nu) = f(x) + \sum_i \lambda_i g_i(x) + \sum_j \nu_j h_j(x)
\le f(x).L(x,λ,ν)=f(x)+i∑​λi​gi​(x)+j∑​νj​hj​(x)≤f(x).

This inequality is the heart of weak duality.

Step 3: Take the best (lowest) Lagrangian value over all xxx #

By definition,

d(λ,ν)=inf⁡xL(x,λ,ν).d(\lambda,\nu) = \inf_x L(x,\lambda,\nu).d(λ,ν)=xinf​L(x,λ,ν).

The infimum over all xxx is certainly no larger than the value at any particular feasible xxx:

d(λ,ν)≤L(x,λ,ν)≤f(x)for any feasible x.d(\lambda,\nu) \le L(x,\lambda,\nu) \le f(x) \quad \text{for any feasible } x.d(λ,ν)≤L(x,λ,ν)≤f(x)for any feasible x.

Now take the infimum over all feasible xxx on the right-hand side. The best feasible objective value is the primal optimum value p∗p^*p∗:

d(λ,ν)≤p∗for all λ≥0.d(\lambda,\nu) \le p^* \quad \text{for all } \lambda\ge 0.d(λ,ν)≤p∗for all λ≥0.

This is weak duality.

Weak duality: For a primal minimization problem, any dual-feasible (λ,ν)(\lambda,\nu)(λ,ν) gives a lower bound on the primal optimal value.

Pause point #1 (tiny sanity-check) #

Before moving on, check that you can literally compute a dual function once.

Consider the 1D problem:

minimizex∈Rx2subject tox≥1\begin{aligned}
\text{minimize}_{x\in\mathbb{R}} \quad & x^2 \
\text{subject to} \quad & x \ge 1
\end{aligned}minimizex∈R​subject to​x2x≥1​

Rewrite the constraint as g(x)=1−x≤0g(x)=1-x\le 0g(x)=1−x≤0.

Try computing d(λ)d(\lambda)d(λ) by minimizing over xxx (complete the square or set derivative to zero). Then compare max⁡λ≥0d(λ)\max_{\lambda\ge 0} d(\lambda)maxλ≥0​d(λ) with the primal optimum p∗=1p^*=1p∗=1.

(We’ll do a closely related worked example later, but you should attempt this now—this is the first “feel” of inf⁡x\inf_xinfx​.)

The duality gap #

Define:

Weak duality implies:

d∗≤p∗.d^* \le p^*.d∗≤p∗.

The difference p∗−d∗p^* - d^*p∗−d∗ is the duality gap.

Why the dual function is concave (important for optimization) #

This may feel counterintuitive: d(λ,ν)=inf⁡xL(x,λ,ν)d(\lambda,\nu)=\inf_x L(x,\lambda,\nu)d(λ,ν)=infx​L(x,λ,ν), an infimum of functions linear in (λ,ν)(\lambda,\nu)(λ,ν). But the infimum of affine functions is concave.

Proof sketch (worth internalizing): for each fixed xxx, define

ϕx(λ,ν)=L(x,λ,ν).\phi_x(\lambda,\nu)=L(x,\lambda,\nu).ϕx​(λ,ν)=L(x,λ,ν).

As a function of (λ,ν)(\lambda,\nu)(λ,ν), ϕx\phi_xϕx​ is affine (linear + constant) because it is f(x)+λTg(x)+νTh(x)f(x) + \lambda^T g(x) + \nu^T h(x)f(x)+λTg(x)+νTh(x).

Then

d(λ,ν)=inf⁡xϕx(λ,ν).d(\lambda,\nu) = \inf_x \phi_x(\lambda,\nu).d(λ,ν)=xinf​ϕx​(λ,ν).

The pointwise infimum of affine functions is concave. Concretely, for $0\le t\le 1$:

d(t(λ1,ν1)+(1−t)(λ2,ν2))=inf⁡xϕx(t(λ1,ν1)+(1−t)(λ2,ν2))=inf⁡x(tϕx(λ1,ν1)+(1−t)ϕx(λ2,ν2))≥tinf⁡xϕx(λ1,ν1)+(1−t)inf⁡xϕx(λ2,ν2)=td(λ1,ν1)+(1−t)d(λ2,ν2).\begin{aligned}
d(t(\lambda_1,\nu_1)+(1-t)(\lambda_2,\nu_2))
&= \inf_x \phi_x(t(\lambda_1,\nu_1)+(1-t)(\lambda_2,\nu_2)) \
&= \inf_x \left(t\phi_x(\lambda_1,\nu_1)+(1-t)\phi_x(\lambda_2,\nu_2)\right) \
&\ge t\inf_x \phi_x(\lambda_1,\nu_1) + (1-t)\inf_x \phi_x(\lambda_2,\nu_2) \
&= t d(\lambda_1,\nu_1) + (1-t) d(\lambda_2,\nu_2).
\end{aligned}d(t(λ1​,ν1​)+(1−t)(λ2​,ν2​))​=xinf​ϕx​(t(λ1​,ν1​)+(1−t)(λ2​,ν2​))=xinf​(tϕx​(λ1​,ν1​)+(1−t)ϕx​(λ2​,ν2​))≥txinf​ϕx​(λ1​,ν1​)+(1−t)xinf​ϕx​(λ2​,ν2​)=td(λ1​,ν1​)+(1−t)d(λ2​,ν2​).​

That inequality step is the key: infimum of a sum is ≥ sum of infima.

Interactive-canvas moment: visualize “lower bound curves” #

If your tech tree platform supports an interactive plot, this is the perfect place to attach it.

Suggested interaction:

What you should see:

This visualization makes the definition d(λ)=inf⁡xL(x,λ)d(\lambda)=\inf_x L(x,\lambda)d(λ)=infx​L(x,λ) feel less abstract: you are literally pushing up the best guaranteed lower bound by tuning prices.

Core Mechanic 2 — Strong Duality: When the Lower Bound Becomes Exact #

Weak duality always holds, but strong duality is where duality becomes a solving tool, not just a bounding tool.

Convexity is the enabling assumption #

Strong duality is most commonly guaranteed in convex optimization problems of the form:

minimizexf(x)subject togi(x)≤0,  i=1,…,mAx=b\begin{aligned}
\text{minimize}_x \quad & f(x) \
\text{subject to} \quad & g_i(x) \le 0, ; i=1,\dots,m \
& Ax=b
\end{aligned}minimizex​subject to​f(x)gi​(x)≤0,i=1,…,mAx=b​

where:

(Equality constraints more generally can be affine functions hj(x)=0h_j(x)=0hj​(x)=0; writing them as Ax=bAx=bAx=b is a clean special case.)

Convexity matters because it prevents “hidden” nonconvex geometry that can create a duality gap.

Slater’s condition (the standard constraint qualification) #

A widely used sufficient condition for strong duality is Slater’s condition:

For convex problems with inequality constraints gi(x)≤0g_i(x)\le 0gi​(x)≤0 and affine equalities, if there exists a point x~\tilde{x}x~ such that

then strong duality holds and KKT conditions are necessary and sufficient.

So Slater is an “interior point exists” condition.

Why do we need it? Intuitively:

What strong duality gives you (operationally) #

If strong duality holds:

p∗=d∗.p^* = d^*.p∗=d∗.

Moreover, there exist optimal multipliers (λ∗,ν∗)(\lambda^*,\nu^*)(λ∗,ν∗) such that:

In other words, duality explains why KKT works: it’s the optimality condition for a saddle point of the Lagrangian.

Saddle-point picture (a unifying mental model) #

Think of the Lagrangian as a two-player game:

The primal problem is like:

p∗=inf⁡x feasiblef(x).p^* = \inf_{x \text{ feasible}} f(x).p∗=x feasibleinf​f(x).

The dual problem is:

d∗=sup⁡λ≥0,νinf⁡xL(x,λ,ν).d^* = \sup_{\lambda\ge 0,\nu} \inf_x L(x,\lambda,\nu).d∗=λ≥0,νsup​xinf​L(x,λ,ν).

A saddle point (x∗,λ∗,ν∗)(x^*,\lambda^*,\nu^*)(x∗,λ∗,ν∗) satisfies:

L(x∗,λ,ν)≤L(x∗,λ∗,ν∗)≤L(x,λ∗,ν∗)L(x^*,\lambda,\nu) \le L(x^*,\lambda^*,\nu^*) \le L(x,\lambda^*,\nu^*)L(x∗,λ,ν)≤L(x∗,λ∗,ν∗)≤L(x,λ∗,ν∗)

for all feasible multipliers and all xxx.

Strong duality is closely related to the existence of such a saddle point.

Pause point #2 (Slater sanity-check) #

Decide whether Slater’s condition holds.

Problem A:

minimizex∈Rxsubject tox2≤1\begin{aligned}
\text{minimize}_{x\in\mathbb{R}} \quad & x \
\text{subject to} \quad & x^2 \le 1
\end{aligned}minimizex∈R​subject to​xx2≤1​

Is there a strictly feasible point? Yes: x=0x=0x=0 gives $0<1$. So Slater holds.

Problem B:

minimizex∈Rxsubject tox2≤0\begin{aligned}
\text{minimize}_{x\in\mathbb{R}} \quad & x \
\text{subject to} \quad & x^2 \le 0
\end{aligned}minimizex∈R​subject to​xx2≤0​

Feasible set is only {0}{0}{0}. There is no xxx with x2<0x^2<0x2<0, so Slater fails.

Take 30 seconds to reflect: both problems are convex, but B has a “boundary-only” feasible set.

Slater failing does not automatically mean strong duality fails, but it removes a key guarantee.

Strong duality is not universal #

For nonconvex problems, duality can be loose:

This is why in combinatorial optimization, one often uses duality as a relaxation: solve the dual (or a related convex relaxation) to get bounds even if you can’t solve the primal exactly.

A practical consequence: dual optimality = optimal certificate #

When strong duality holds, a dual optimal solution (λ∗,ν∗)(\lambda^*,\nu^*)(λ∗,ν∗) is a certificate of optimality:

This “sandwich” proof is one of the most useful ways to trust an optimizer.

Application/Connection — Dual Ascent and How Duality Becomes an Algorithm #

Duality is not only theory; it directly yields optimization methods.

The dual problem as concave maximization #

The dual problem is:

max⁡λ≥0,ν  d(λ,ν)\max_{\lambda\ge 0,\nu} ; d(\lambda,\nu)λ≥0,νmax​d(λ,ν)

with ddd concave. So we are maximizing a concave function, which is the “nice” direction (equivalently, minimizing a convex function −d-d−d).

But there’s a catch: d(λ,ν)d(\lambda,\nu)d(λ,ν) is defined via an infimum over xxx. It may be nonsmooth.

Computing a subgradient of the dual function #

Suppose for given (λ,ν)(\lambda,\nu)(λ,ν) we can find

x(λ,ν)∈arg⁡min⁡xL(x,λ,ν).x(\lambda,\nu) \in \arg\min_x L(x,\lambda,\nu).x(λ,ν)∈argxmin​L(x,λ,ν).

Then a subgradient of ddd is given by constraint residuals at that minimizing xxx:

Intuition:

Dual ascent updates #

A basic dual ascent / subgradient method:

  1. 1)Given (λk,νk)(\lambda^k,\nu^k)(λk,νk), compute

xk∈arg⁡min⁡xL(x,λk,νk).x^k \in \arg\min_x L(x,\lambda^k,\nu^k).xk∈argxmin​L(x,λk,νk).

  1. 2)Update multipliers using residuals:

λk+1=Πλ≥0(λk+αk g(xk))\lambda^{k+1} = \Pi_{\lambda\ge 0}\big(\lambda^k + \alpha_k, g(x^k)\big)λk+1=Πλ≥0​(λk+αk​g(xk))

νk+1=νk+αk h(xk)\nu^{k+1} = \nu^k + \alpha_k, h(x^k)νk+1=νk+αk​h(xk)

Where:

Why projection only for λ\lambdaλ? Because dual feasibility requires λ≥0\lambda\ge 0λ≥0 for inequalities, but ν\nuν is free.

How to choose step sizes (briefly) #

Subgradient methods are sensitive to step sizes.

Common choices:

In practice, many systems use more advanced variants (accelerations, adaptive rules) or different splitting methods (e.g., ADMM) that are also rooted in the Lagrangian.

Why dual ascent can be efficient: separability and decomposition #

A major payoff occurs when the Lagrangian separates over components of xxx.

Suppose x=(x1,…,xN)x=(x_1,\dots,x_N)x=(x1​,…,xN​) and

f(x)=∑n=1Nfn(xn)f(x)=\sum_{n=1}^N f_n(x_n)f(x)=n=1∑N​fn​(xn​)

and constraints couple them only through a simple sum, e.g.

∑n=1NAnxn=b.\sum_{n=1}^N A_n x_n = b.n=1∑N​An​xn​=b.

Then

L(x,ν)=∑n=1Nfn(xn)+νT(∑n=1NAnxn−b)=∑n=1N(fn(xn)+νTAnxn)−νTb.L(x,\nu)=\sum_{n=1}^N f_n(x_n) + \nu^T\left(\sum_{n=1}^N A_n x_n - b\right)
= \sum_{n=1}^N \left(f_n(x_n)+\nu^T A_n x_n\right) - \nu^T b.L(x,ν)=n=1∑N​fn​(xn​)+νT(n=1∑N​An​xn​−b)=n=1∑N​(fn​(xn​)+νTAn​xn​)−νTb.

Now the minimization over xxx splits:

inf⁡xL(x,ν)=∑n=1Ninf⁡xn(fn(xn)+νTAnxn)−νTb.\inf_x L(x,\nu) = \sum_{n=1}^N \inf_{x_n}\left(f_n(x_n)+\nu^T A_n x_n\right) - \nu^T b.xinf​L(x,ν)=n=1∑N​xn​inf​(fn​(xn​)+νTAn​xn​)−νTb.

So computing d(ν)d(\nu)d(ν) is NNN smaller problems, often parallelizable.

This is the conceptual foundation of distributed optimization: multipliers coordinate many local optimizations.

Connecting back to KKT #

If strong duality holds and the method converges to (λ∗,ν∗)(\lambda^*,\nu^*)(λ∗,ν∗) with corresponding x∗x^*x∗ minimizing L(⋅,λ∗,ν∗)L(\cdot,\lambda^*,\nu^*)L(⋅,λ∗,ν∗), then KKT conditions emerge:

This is a clean story:

A second interactive-canvas moment (algorithm intuition) #

If you can animate iterations:

What learners should observe:

This makes dual ascent feel like an automatic constraint-enforcement mechanism driven by prices.

Worked Examples (3) #

Dual function and dual optimum for a 1D quadratic with an inequality #

Primal:

minimizex∈Rx2subject tox≥1\begin{aligned}
\text{minimize}_{x\in\mathbb{R}} \quad & x^2 \
\text{subject to} \quad & x \ge 1
\end{aligned}minimizex∈R​subject to​x2x≥1​

Rewrite x≥1x\ge 1x≥1 as g(x)=1−x≤0g(x)=1-x\le 0g(x)=1−x≤0.

  1. Form the Lagrangian (single multiplier):

    \n$L(x,λ)=x2+λ(1−x),λ≥0.L(x,\lambda)=x^2+\lambda(1-x),\quad \lambda\ge 0.L(x,λ)=x2+λ(1−x),λ≥0.$

  2. Compute the dual function:

    \n$d(λ)=inf⁡x(x2−λx+λ).d(\lambda)=\inf_x \left(x^2-\lambda x + \lambda\right).d(λ)=infx​(x2−λx+λ).$

  3. Minimize the quadratic over x by setting derivative to zero:

    \n$ddx(x2−λx+λ)=2x−λ=0⇒x∗(λ)=λ2.\frac{d}{dx}\left(x^2-\lambda x + \lambda\right)=2x-\lambda=0 \Rightarrow x^*(\lambda)=\frac{\lambda}{2}.dxd​(x2−λx+λ)=2x−λ=0⇒x∗(λ)=2λ​.$

  4. Plug back in to get the infimum value:

    \n$$

    \begin{aligned}

    d(\lambda)

    &= \left(\frac{\lambda}{2}\right)^2 - \lambda\left(\frac{\lambda}{2}\right) + \lambda \

    &= \frac{\lambda^2}{4} - \frac{\lambda^2}{2} + \lambda \

    &= -\frac{\lambda^2}{4}+\lambda.

    \end{aligned}

  5. Solve the dual problem:

    \n$max⁡λ≥0  −λ24+λ.\max_{\lambda\ge 0}; -\frac{\lambda^2}{4}+\lambda.maxλ≥0​−4λ2​+λ.$

  6. This is a concave parabola. Differentiate:

    \n$ddλ(−λ24+λ)=−λ2+1.\frac{d}{d\lambda}\left(-\frac{\lambda^2}{4}+\lambda\right)=-\frac{\lambda}{2}+1.dλd​(−4λ2​+λ)=−2λ​+1.$

  7. Set to zero: −λ/2+1=0⇒λ∗=2-\lambda/2+1=0 \Rightarrow \lambda^*=2−λ/2+1=0⇒λ∗=2 (and it satisfies λ≥0\lambda\ge 0λ≥0).

  8. Compute the dual optimum value:

    \n$d∗=d(2)=−44+2=1.d^*=d(2)=-\frac{4}{4}+2=1.d∗=d(2)=−44​+2=1.$

  9. Compute primal optimum for comparison: constraint forces x≥1x\ge 1x≥1, and x2x^2x2 is minimized at x=1x=1x=1, so p∗=1p^*=1p∗=1.

Insight: Here d∗=p∗d^*=p^*d∗=p∗, so strong duality holds (the problem is convex and Slater holds since e.g. x=2x=2x=2 gives strict feasibility $1-2<0).Alsonoticethemeaningof). Also notice the meaning of ).Alsonoticethemeaningof\lambda^=2$: it is the price that makes the unconstrained minimizer of $L$ land exactly at the boundary $x=1$ (since $x^(\lambda)=\lambda/2$).

Deriving the dual of a quadratic program with an equality constraint #

Primal (a simple equality-constrained QP):

minimizex∈Rn12xTQx+cTxsubject toAx=b\begin{aligned}
\text{minimize}_{x\in\mathbb{R}^n} \quad & \frac{1}{2}x^T Q x + c^T x \
\text{subject to} \quad & Ax=b
\end{aligned}minimizex∈Rn​subject to​21​xTQx+cTxAx=b​

Assume Q≻0Q\succ 0Q≻0 (positive definite), so the objective is strictly convex and the minimizer is unique for each (\nu).

  1. Form the Lagrangian (only equality multipliers):

    \n$L(x,ν)=12xTQx+cTx+νT(Ax−b).L(x,\nu)=\frac{1}{2}x^TQx + c^T x + \nu^T(Ax-b).L(x,ν)=21​xTQx+cTx+νT(Ax−b).$

  2. Compute the dual function:

    \n$d(ν)=inf⁡xL(x,ν).d(\nu)=\inf_x L(x,\nu).d(ν)=infx​L(x,ν).$

  3. Minimize over x by stationarity (differentiate w.r.t. x and set to 0):

    \n$∇xL(x,ν)=Qx+c+ATν=0.\nabla_x L(x,\nu)=Qx+c+A^T\nu=0.∇x​L(x,ν)=Qx+c+ATν=0.$

  4. Solve for the minimizing x as a function of (\nu):

    \n$x∗(ν)=−Q−1(c+ATν).x^*(\nu)=-Q^{-1}(c+A^T\nu).x∗(ν)=−Q−1(c+ATν).$

  5. Plug back into the Lagrangian. First expand:

    \n$$

    \begin{aligned}

    L(x,\nu)

    &=\frac{1}{2}x^TQx + (c+A^T\nu)^T x - \nu^T b.

    \end{aligned}

  6. Use the standard quadratic minimization identity: for Q≻0Q\succ 0Q≻0,

    \n$inf⁡x(12xTQx+qTx)=−12qTQ−1q,\inf_x \left(\frac{1}{2}x^TQx + q^T x\right) = -\frac{1}{2} q^T Q^{-1} q,infx​(21​xTQx+qTx)=−21​qTQ−1q,$

    \nattained at x=−Q−1qx=-Q^{-1}qx=−Q−1q.

    Here q=(c+ATν)q=(c+A^T\nu)q=(c+ATν).

  7. Therefore:

    \n$$

    \begin{aligned}

    d(\nu)

    &= -\frac{1}{2}(c+A^T\nu)^T Q^{-1}(c+A^T\nu) - \nu^T b.

    \end{aligned}

  8. Write the dual problem:

    \n$max⁡ν∈Rp  −12(c+ATν)TQ−1(c+ATν)−νTb.\max_{\nu\in\mathbb{R}^p}; -\frac{1}{2}(c+A^T\nu)^T Q^{-1}(c+A^T\nu) - \nu^T b.maxν∈Rp​−21​(c+ATν)TQ−1(c+ATν)−νTb.$

  9. Optionally expand into a standard concave quadratic form in (\nu):

    \n$$

    \begin{aligned}

    (c+A^T\nu)^T Q^{-1}(c+A^T\nu)

    = c^TQ^{-1}c + 2\nu^T A Q^{-1}c + \nu^T A Q^{-1} A^T \nu.

    \end{aligned}

    So
    \n$$
    \begin{aligned}
    d(\nu)= -\frac{1}{2}\nu^T(AQ^{-1}A^T)\nu - \nu^T(AQ^{-1}c + b) - \frac{1}{2}c^TQ^{-1}c.
    \end{aligned}

Insight: This example shows the typical dual pattern: (1) build LLL, (2) compute ddd by eliminating xxx via an infimum, and (3) get a concave maximization over multipliers. The matrix AQ−1ATAQ^{-1}A^TAQ−1AT governs the dual curvature. When the primal is large but ppp (number of equalities) is small, the dual can be much cheaper.

One iteration of dual ascent for an inequality-constrained problem #

Primal:

minimizex∈R(x−3)2subject tox≤1\begin{aligned}
\text{minimize}_{x\in\mathbb{R}} \quad & (x-3)^2 \
\text{subject to} \quad & x \le 1
\end{aligned}minimizex∈R​subject to​(x−3)2x≤1​

Rewrite as g(x)=x−1≤0g(x)=x-1\le 0g(x)=x−1≤0. Lagrangian: L(x,λ)=(x−3)2+λ(x−1)L(x,\lambda)=(x-3)^2+\lambda(x-1)L(x,λ)=(x−3)2+λ(x−1), λ≥0\lambda\ge 0λ≥0.

We’ll run one dual-ascent step with step size α=0.5\alpha=0.5α=0.5, starting from λ0=0\lambda^0=0λ0=0.

  1. Given λ0=0\lambda^0=0λ0=0, minimize L(x,λ0)L(x,\lambda^0)L(x,λ0) over x:

    \n$L(x,0)=(x−3)2.L(x,0)=(x-3)^2.L(x,0)=(x−3)2.$

    Minimizer is x0=3x^0=3x0=3.

  2. Compute constraint residual at x0x^0x0:

    \n$g(x0)=x0−1=2>0,g(x^0)=x^0-1=2>0,g(x0)=x0−1=2>0,$

    so the constraint is violated.

  3. Update the multiplier using projected ascent:

    \n$λ1=max⁡{0,λ0+αg(x0)}=max⁡{0,0+0.5⋅2}=1.\lambda^{1}=\max{0,\lambda^0+\alpha g(x^0)}=\max{0,0+0.5\cdot 2}=1.λ1=max{0,λ0+αg(x0)}=max{0,0+0.5⋅2}=1.$

  4. Now (to see the effect), compute the next primal minimizer for λ1=1\lambda^1=1λ1=1:

    \nMinimize L(x,1)=(x−3)2+(x−1)L(x,1)=(x-3)^2 + (x-1)L(x,1)=(x−3)2+(x−1).

    Differentiate:

    \n$ddx((x−3)2+x−1)=2(x−3)+1=2x−5.\frac{d}{dx}\left((x-3)^2+x-1\right)=2(x-3)+1=2x-5.dxd​((x−3)2+x−1)=2(x−3)+1=2x−5.$

    Set to zero: $2x-5=0 \Rightarrow x^1=2.5$.

    Still infeasible, but closer to the constraint boundary x≤1x\le 1x≤1 than 3.

Insight: Dual ascent increases λ\lambdaλ when constraints are violated. That increase changes the unconstrained minimizer of LLL, pushing the next xxx toward feasibility. With proper stepsizes (and under convexity/regularity), iterating this can drive residuals down and converge to the constrained optimum.

Key Takeaways #

Common Mistakes #

Practice #

easy

Compute the dual function for the problem: minimize x2x^2x2 subject to x≤2x\le 2x≤2. Then find the dual optimum value and compare to the primal optimum.

Hint: Write the constraint as g(x)=x−2≤0g(x)=x-2\le 0g(x)=x−2≤0. Form L(x,λ)=x2+λ(x−2)L(x,\lambda)=x^2+\lambda(x-2)L(x,λ)=x2+λ(x−2) with λ≥0\lambda\ge 0λ≥0. Minimize the quadratic in xxx to get d(λ)d(\lambda)d(λ), then maximize over λ≥0\lambda\ge 0λ≥0.

Show solution

Constraint x≤2x\le 2x≤2 means g(x)=x−2≤0g(x)=x-2\le 0g(x)=x−2≤0.

Lagrangian: L(x,λ)=x2+λ(x−2)L(x,\lambda)=x^2+\lambda(x-2)L(x,λ)=x2+λ(x−2).

Dual function:

d(λ)=inf⁡x(x2+λx−2λ).d(\lambda)=\inf_x (x^2+\lambda x-2\lambda).d(λ)=xinf​(x2+λx−2λ).

Stationarity: $2x+\lambda=0 \Rightarrow x^*(\lambda)=-\lambda/2$.

Plug in:

d(λ)=(−λ2)2+λ(−λ2)−2λ=λ24−λ22−2λ=−λ24−2λ.\begin{aligned}
d(\lambda)
&= \left(-\frac{\lambda}{2}\right)^2 + \lambda\left(-\frac{\lambda}{2}\right) - 2\lambda \
&= \frac{\lambda^2}{4}-\frac{\lambda^2}{2}-2\lambda \
&= -\frac{\lambda^2}{4}-2\lambda.
\end{aligned}d(λ)​=(−2λ​)2+λ(−2λ​)−2λ=4λ2​−2λ2​−2λ=−4λ2​−2λ.​

Maximize over λ≥0\lambda\ge 0λ≥0. This concave parabola is decreasing for λ≥0\lambda\ge 0λ≥0 (derivative −λ/2−2<0-\lambda/2-2<0−λ/2−2<0), so maximum occurs at λ∗=0\lambda^*=0λ∗=0.

Thus d∗=d(0)=0d^*=d(0)=0d∗=d(0)=0.

Primal optimum: unconstrained minimizer x=0x=0x=0 is feasible (since $0\le 2),so), so ),sop^*=0$.

Hence $d^=p^=0 (\text{strong duality holds}).

medium

Slater check: Consider minimizing a convex function f(x)f(x)f(x) subject to the single constraint ∥x∥2≤1|x|_2 \le 1∥x∥2​≤1 (in Rn\mathbb{R}^nRn). Does Slater’s condition hold? What if the constraint is ∥x∥2≤0|x|_2 \le 0∥x∥2​≤0?

Hint: Slater requires a strictly feasible point for inequalities: find an xxx such that ∥x∥2<1|x|_2 < 1∥x∥2​<1 (or < 0).

Show solution

For ∥x∥2≤1|x|_2 \le 1∥x∥2​≤1, Slater holds because x=0x=0x=0 satisfies ∥0∥2=0<1|0|_2=0<1∥0∥2​=0<1.

For ∥x∥2≤0|x|_2 \le 0∥x∥2​≤0, the feasible set is only {0}{0}{0} and there is no xxx with ∥x∥2<0|x|_2<0∥x∥2​<0. So Slater fails.

hard

Dual ascent reasoning: For a problem with one inequality constraint g(x)≤0g(x)\le 0g(x)≤0, suppose at iteration k you find xk∈arg⁡min⁡xL(x,λk)x^k \in \arg\min_x L(x,\lambda^k)xk∈argminx​L(x,λk) and you observe g(xk)>0g(x^k)>0g(xk)>0. What does the update λk+1=max⁡{0,λk+αg(xk)}\lambda^{k+1}=\max{0,\lambda^k+\alpha g(x^k)}λk+1=max{0,λk+αg(xk)} do, and why is that direction sensible for maximizing d(λ)d(\lambda)d(λ)?

Hint: Interpret g(xk)g(x^k)g(xk) as a (sub)gradient of ddd at λk\lambda^kλk. Also interpret λ\lambdaλ as a penalty weight on violations.

Show solution

If g(xk)>0g(x^k)>0g(xk)>0, the constraint is violated at the current Lagrangian minimizer. The update increases λ\lambdaλ (since λk+αg(xk)>λk\lambda^k+\alpha g(x^k)>\lambda^kλk+αg(xk)>λk), making future minimizations of L(x,λ)L(x,\lambda)L(x,λ) penalize positive g(x)g(x)g(x) more heavily.

This is sensible because (under mild conditions) g(xk)g(x^k)g(xk) is a subgradient of the concave dual function d(λ)=inf⁡x(f(x)+λg(x))d(\lambda)=\inf_x \big(f(x)+\lambda g(x)\big)d(λ)=infx​(f(x)+λg(x)). For concave maximization, ascending along a subgradient increases ddd locally. The projection max⁡{0,⋅}\max{0,\cdot}max{0,⋅} preserves dual feasibility λ≥0\lambda\ge 0λ≥0, which is required for weak duality and the correctness of the dual problem.

Connections #

Prerequisites you already have: KKT Conditions, Lagrange Multipliers

Next nodes this supports (typical unlocks):

Quality: B (4.0/5)

← back to treebrowse all →