1: Introduction to Lagrange Multipliers
\( \newcommand{\vecs}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } \)
\( \newcommand{\vecd}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash {#1}}} \)
\( \newcommand{\id}{\mathrm{id}}\) \( \newcommand{\Span}{\mathrm{span}}\)
\( \newcommand{\kernel}{\mathrm{null}\,}\) \( \newcommand{\range}{\mathrm{range}\,}\)
\( \newcommand{\RealPart}{\mathrm{Re}}\) \( \newcommand{\ImaginaryPart}{\mathrm{Im}}\)
\( \newcommand{\Argument}{\mathrm{Arg}}\) \( \newcommand{\norm}[1]{\| #1 \|}\)
\( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\)
\( \newcommand{\kernel}{\mathrm{null}\,}\)
\( \newcommand{\range}{\mathrm{range}\,}\)
\( \newcommand{\Span}{\mathrm{span}}\) \( \newcommand{\AA}{\unicode[.8,0]{x212B}}\)
\( \newcommand{\vectorA}[1]{\vec{#1}} % arrow\)
\( \newcommand{\vectorAt}[1]{\vec{\text{#1}}} % arrow\)
\( \newcommand{\vectorB}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } \)
\( \newcommand{\vectorC}[1]{\textbf{#1}} \)
\( \newcommand{\vectorD}[1]{\overrightarrow{#1}} \)
\( \newcommand{\vectorDt}[1]{\overrightarrow{\text{#1}}} \)
\( \newcommand{\vectE}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash{\mathbf {#1}}}} \)
\(\newcommand{\avec}{\mathbf a}\) \(\newcommand{\bvec}{\mathbf b}\) \(\newcommand{\cvec}{\mathbf c}\) \(\newcommand{\dvec}{\mathbf d}\) \(\newcommand{\dtil}{\widetilde{\mathbf d}}\) \(\newcommand{\evec}{\mathbf e}\) \(\newcommand{\fvec}{\mathbf f}\) \(\newcommand{\nvec}{\mathbf n}\) \(\newcommand{\pvec}{\mathbf p}\) \(\newcommand{\qvec}{\mathbf q}\) \(\newcommand{\svec}{\mathbf s}\) \(\newcommand{\tvec}{\mathbf t}\) \(\newcommand{\uvec}{\mathbf u}\) \(\newcommand{\vvec}{\mathbf v}\) \(\newcommand{\wvec}{\mathbf w}\) \(\newcommand{\xvec}{\mathbf x}\) \(\newcommand{\yvec}{\mathbf y}\) \(\newcommand{\zvec}{\mathbf z}\) \(\newcommand{\rvec}{\mathbf r}\) \(\newcommand{\mvec}{\mathbf m}\) \(\newcommand{\zerovec}{\mathbf 0}\) \(\newcommand{\onevec}{\mathbf 1}\) \(\newcommand{\real}{\mathbb R}\) \(\newcommand{\twovec}[2]{\left[\begin{array}{r}#1 \\ #2 \end{array}\right]}\) \(\newcommand{\ctwovec}[2]{\left[\begin{array}{c}#1 \\ #2 \end{array}\right]}\) \(\newcommand{\threevec}[3]{\left[\begin{array}{r}#1 \\ #2 \\ #3 \end{array}\right]}\) \(\newcommand{\cthreevec}[3]{\left[\begin{array}{c}#1 \\ #2 \\ #3 \end{array}\right]}\) \(\newcommand{\fourvec}[4]{\left[\begin{array}{r}#1 \\ #2 \\ #3 \\ #4 \end{array}\right]}\) \(\newcommand{\cfourvec}[4]{\left[\begin{array}{c}#1 \\ #2 \\ #3 \\ #4 \end{array}\right]}\) \(\newcommand{\fivevec}[5]{\left[\begin{array}{r}#1 \\ #2 \\ #3 \\ #4 \\ #5 \\ \end{array}\right]}\) \(\newcommand{\cfivevec}[5]{\left[\begin{array}{c}#1 \\ #2 \\ #3 \\ #4 \\ #5 \\ \end{array}\right]}\) \(\newcommand{\mattwo}[4]{\left[\begin{array}{rr}#1 \amp #2 \\ #3 \amp #4 \\ \end{array}\right]}\) \(\newcommand{\laspan}[1]{\text{Span}\{#1\}}\) \(\newcommand{\bcal}{\cal B}\) \(\newcommand{\ccal}{\cal C}\) \(\newcommand{\scal}{\cal S}\) \(\newcommand{\wcal}{\cal W}\) \(\newcommand{\ecal}{\cal E}\) \(\newcommand{\coords}[2]{\left\{#1\right\}_{#2}}\) \(\newcommand{\gray}[1]{\color{gray}{#1}}\) \(\newcommand{\lgray}[1]{\color{lightgray}{#1}}\) 
\(\newcommand{\rank}{\operatorname{rank}}\) \(\newcommand{\row}{\text{Row}}\) \(\newcommand{\col}{\text{Col}}\) \(\renewcommand{\row}{\text{Row}}\) \(\newcommand{\nul}{\text{Nul}}\) \(\newcommand{\var}{\text{Var}}\) \(\newcommand{\corr}{\text{corr}}\) \(\newcommand{\len}[1]{\left|#1\right|}\) \(\newcommand{\bbar}{\overline{\bvec}}\) \(\newcommand{\bhat}{\widehat{\bvec}}\) \(\newcommand{\bperp}{\bvec^\perp}\) \(\newcommand{\xhat}{\widehat{\xvec}}\) \(\newcommand{\vhat}{\widehat{\vvec}}\) \(\newcommand{\uhat}{\widehat{\uvec}}\) \(\newcommand{\what}{\widehat{\wvec}}\) \(\newcommand{\Sighat}{\widehat{\Sigma}}\) \(\newcommand{\lt}{<}\) \(\newcommand{\gt}{>}\) \(\newcommand{\amp}{&}\) \(\definecolor{fillinmathshade}{gray}{0.9}\)To avoid repetition, it is to be understood throughout that \(f\) and \(g_{1}\), \(g_{2}\),…, \(g_{m}\) are continuously differentiable on an open set \(D\) in \(\mathbb{R}^{n}\).
Suppose that \(m<n\) and
\[\label{eq:1} g_{1}(\mathbf{X}) = g_2(\mathbf{X}) = \cdots = g_{m}(\mathbf{X})=0 \]
on a nonempty subset \(D_{1}\) of \(D\). If \(\mathbf{X}_{0} \in D_{1}\) and there is a neighborhood \(N\) of \(\mathbf{X}_{0}\) such that
\[\label{eq:2} f(\mathbf{X}) \le f(\mathbf{X}_{0}) \]
for every \(\mathbf{X}\) in \(N \cap D_{1}\), then \(\mathbf{X}_{0}\) is a local maximum point of \(f\) subject to the constraints Equation \ref{eq:1}. However, we will usually say “subject to” rather than “subject to the constraint(s).”
If Equation \ref{eq:2} is replaced by
\[\label{eq:3} f(\mathbf{X}) \ge f(\mathbf{X}_{0}), \]
then “maximum” is replaced by “minimum.” A local maximum or minimum of \(f\) subject to Equation \ref{eq:1} is also called a local extreme point of \(f\) subject to Equation \ref{eq:1}. More briefly, we also speak of constrained local maximum, minimum, or extreme points. If Equation \ref{eq:2} or Equation \ref{eq:3} holds for all \(\mathbf{X}\) in \(D_{1}\), we omit “local.”
Recall that \({\bf X}_{0}=(x_{10}, x_{20},\dots,x_{n0})\) is a critical point of a differentiable function \(L=L(x_{1},x_{2},\dots,x_{n})\) if
\[L_{x_{i}}(x_{10},x_{20},\dots,x_{n0})=0,\quad 1\le i\le n. \nonumber \]
Therefore, every local extreme point of \(L\) is a critical point of \(L\); however, a critical point of \(L\) is not necessarily a local extreme point of \(L\).
Suppose that the system Equation \ref{eq:1} of simultaneous equations can be solved for \(x_{1}\), …, \(x_{m}\) in terms of \(x_{m+1}\), …, \(x_{n}\); thus,
\[\label{eq:4} x_{j}=h_{j}(x_{m+1},\dots,x_{n}),\quad 1\le j\le m. \]
Then a constrained extreme value of \(f\) is an unconstrained extreme value of
\[\label{eq:5} f(h_{1}(x_{m+1},\dots,x_{n}),\dots,h_{m}(x_{m+1},\dots,x_{n}),x_{m+1},\dots,x_{n}). \]
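As an illustrative sketch (the functions here are chosen only for concreteness and are an assumption of this example), take \(n=2\), \(m=1\), \(f(x_{1},x_{2})=x_{1}^{2}+x_{2}^{2}\), and \(g_{1}(x_{1},x_{2})=x_{1}+x_{2}-1\). The constraint \(g_{1}=0\) can be solved for \(x_{1}\) as
\[x_{1}=h_{1}(x_{2})=1-x_{2}, \nonumber \]
so the composite function is
\[f(h_{1}(x_{2}),x_{2})=(1-x_{2})^{2}+x_{2}^{2}=2x_{2}^{2}-2x_{2}+1. \nonumber \]
Its derivative \(4x_{2}-2\) vanishes only at \(x_{2}=1/2\), and its second derivative is \(4>0\), so the constrained minimum value of \(f\) is \(1/2\), attained at \((x_{1},x_{2})=(1/2,1/2)\).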
However, it may be difficult or impossible to find explicit formulas for \(h_{1}\), \(h_{2}\), …, \(h_{m}\), and, even if it is possible, the composite function Equation \ref{eq:5} is almost always complicated. Fortunately, there is a better way to find constrained extrema, which also requires the solvability assumption, but does not require an explicit formula as indicated in Equation \ref{eq:4}. It is based on the following theorem. Since the proof is complicated, we consider two special cases first.
Suppose that \(n>m\). If \({\bf X}_{0}\) is a local extreme point of \(f\) subject to
\[g_{1}({\bf X})=g_{2}({\bf X})=\cdots =g_{m}({\bf X})=0 \nonumber \]
and
\[\label{eq:6} \left|\begin{array}{cccc} \displaystyle{\frac{\partial{g_{1}(\mathbf{X}_{0})}}{\partial{x_{r_{1}}}}} & \displaystyle{\frac{\partial{g_{1}(\mathbf{X}_{0})}}{\partial{x_{r_{2}}}}} & \cdots & \displaystyle{\frac{\partial{g_{1}(\mathbf{X}_{0})}}{\partial{x_{r_{m}}}}} \\ \displaystyle{\frac{\partial{g_{2}(\mathbf{X}_{0})}}{\partial{x_{r_{1}}}}} & \displaystyle{\frac{\partial{g_{2}(\mathbf{X}_{0})}}{\partial{x_{r_{2}}}}} & \cdots & \displaystyle{\frac{\partial{g_{2}(\mathbf{X}_{0})}}{\partial{x_{r_{m}}}}} \\ \vdots & \vdots & \ddots & \vdots \\ \displaystyle{\frac{\partial{g_{m}(\mathbf{X}_{0})}}{\partial{x_{r_{1}}}}} & \displaystyle{\frac{\partial{g_{m}(\mathbf{X}_{0})}}{\partial{x_{r_{2}}}}} & \cdots & \displaystyle{\frac{\partial{g_{m}(\mathbf{X}_{0})}}{\partial{x_{r_{m}}}}} \end{array}\right|\ne0 \]
for at least one choice of \(r_{1}<r_{2}<\dots <r_{m}\) in \(\{1,2,\dots,n\}\), then there are constants \(\lambda_{1}\), \(\lambda_{2}\), …, \(\lambda_{m}\) such that \({\bf X}_{0}\) is a critical point of
\[f-\lambda_{1}g_{1}-\lambda_{2}g_{2}-\cdots-\lambda_{m} g_{m}; \nonumber \]
that is,
\[\frac{\partial{f({\bf X}_{0})}}{\partial x_{i}} -\lambda_{1}\frac{\partial{g_{1}({\bf X}_{0})}}{\partial x_{i}} -\lambda_{2}\frac{\partial{g_{2}({\bf X}_{0})}}{\partial x_{i}}-\cdots -\lambda_{m}\frac{\partial{g_{m}({\bf X}_{0})}}{\partial x_{i}}=0, \nonumber \]
\(1\le i\le n\).
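To see informally why the determinant condition Equation \ref{eq:6} cannot simply be dropped, consider the following example, in which \(f\) and \(g_{1}\) are chosen here only for illustration. Let \(n=2\), \(m=1\), \(f(x,y)=x\), and \(g_{1}(x,y)=x^{3}-y^{2}\). When \(m=1\), Equation \ref{eq:6} says that at least one first partial derivative of \(g_{1}\) is nonzero at \(\mathbf{X}_{0}\). On the constraint set, \(x^{3}=y^{2}\ge0\), so \(x\ge0\) and \(\mathbf{X}_{0}=(0,0)\) is a constrained minimum point of \(f\). However,
\[\frac{\partial g_{1}(0,0)}{\partial x}=3x^{2}\Big|_{(0,0)}=0 \quad\text{and}\quad \frac{\partial g_{1}(0,0)}{\partial y}=-2y\Big|_{(0,0)}=0, \nonumber \]
so Equation \ref{eq:6} fails at \((0,0)\), and
\[\frac{\partial f(0,0)}{\partial x}-\lambda\frac{\partial g_{1}(0,0)}{\partial x}=1-\lambda\cdot0=1\ne0 \nonumber \]
for every constant \(\lambda\). Thus \((0,0)\) is a constrained minimum point of \(f\), but it is not a critical point of \(f-\lambda g_{1}\) for any \(\lambda\).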
The following implementation of this theorem is the method of Lagrange multipliers.
(a) Find the critical points of \[f-\lambda_{1}g_{1}-\lambda_{2}g_{2}-\cdots-\lambda_{m} g_{m}, \nonumber \] treating \(\lambda_{1}\), \(\lambda_{2}\), …, \(\lambda_{m}\) as unspecified constants.
(b) Find \(\lambda_{1}\), \(\lambda_{2}\), …, \(\lambda_{m}\) so that the critical points obtained in (a) satisfy the constraints.
(c) Determine which of the critical points are constrained extreme points of \(f\). This can usually be done by physical or intuitive arguments.
If \(a\) and \(b_{1}\), \(b_{2}\), …, \(b_{m}\) are nonzero constants and \(c\) is an arbitrary constant, then the local extreme points of \(f\) subject to \(g_{1}=g_{2}= \cdots =g_{m}=0\) are the same as the local extreme points of \(af-c\) subject to \(b_{1}g_{1}=b_{2}g_{2}=\cdots=b_{m}g_{m}=0\). Therefore, we can replace \(f-\lambda_{1} g_{1}-\lambda_{2}g_{2}- \cdots-\lambda_{m} g_{m}\) by \(af-\lambda_{1}b_{1}g_{1}-\lambda_{2}b_{2}g_{2}- \cdots- \lambda_{m}b_{m}g_{m}-c\) to simplify computations. (Usually, the “\(-c\)” indicates dropping additive constants.) We will denote the final form by \(L\) (for Lagrangian).
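The method can be sketched on a small example, with \(f\) and \(g_{1}\) chosen here only for illustration. Suppose we want the extreme values of \(f(x,y,z)=x+2y+2z\) subject to \(g_{1}(x,y,z)=x^{2}+y^{2}+z^{2}-9=0\), so that \(n=3\) and \(m=1\). Taking \(a=1\) and \(b_{1}=1/2\), and dropping the additive constant \(9\lambda/2\), we may work with
\[L=x+2y+2z-\frac{\lambda}{2}(x^{2}+y^{2}+z^{2}). \nonumber \]
For the first step, the critical points of \(L\) satisfy
\[L_{x}=1-\lambda x=0,\quad L_{y}=2-\lambda y=0,\quad L_{z}=2-\lambda z=0, \nonumber \]
which forces \(\lambda\ne0\) and \((x,y,z)=(1/\lambda,2/\lambda,2/\lambda)\). For the second step, substituting into the constraint gives
\[\frac{1+4+4}{\lambda^{2}}=9, \nonumber \]
so \(\lambda=\pm1\), and the candidates are \((1,2,2)\) and \((-1,-2,-2)\). For the third step, note that the constraint set is a closed and bounded sphere on which the continuous function \(f\) attains a maximum and a minimum, and that the partial derivatives \(2x\), \(2y\), \(2z\) of \(g_{1}\) do not all vanish at any point of the sphere. Hence the constrained extreme points must be among the two candidates. Since \(f(1,2,2)=9\) and \(f(-1,-2,-2)=-9\), the constrained maximum value is \(9\) and the constrained minimum value is \(-9\).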