6.1: Directional and Partial Derivatives
In Chapter 5 we considered functions \(f : E^{1} \rightarrow E\) of one real variable.
Now we take up functions \(f : E^{\prime} \rightarrow E\) where both \(E^{\prime}\) and \(E\) are normed spaces.
The scalar field of both is always assumed the same: \(E^{1}\) or \(C\) (the complex field). The case \(E=E^{*}\) is excluded here; thus all is assumed finite.
We mostly use arrowed letters \(\vec{p}, \vec{q}, \ldots, \vec{x}, \vec{y}, \vec{z}\) for vectors in the domain space \(E^{\prime},\) and nonarrowed letters for those in \(E\) and for scalars.
As before, we adopt the convention that \(f\) is defined on all of \(E^{\prime},\) with \(f(\vec{x})=0\) if not defined otherwise.
Note that, if \(\vec{p} \in E^{\prime},\) one can express any point \(\vec{x} \in E^{\prime}\) as
\[\vec{x}=\vec{p}+t \vec{u},\]
with \(t \in E^{1}\) and \(\vec{u}\) a unit vector. For if \(\vec{x} \neq \vec{p},\) set
\[t=|\vec{x}-\vec{p}| \text { and } \vec{u}=\frac{1}{t}(\vec{x}-\vec{p});\]
and if \(\vec{x}=\vec{p},\) set \(t=0,\) and any unit vector \(\vec{u}\) will do. We often use the notation
\[\vec{t}=\Delta \vec{x}=\vec{x}-\vec{p}=t \vec{u} \quad\left(t \in E^{1}, \vec{t}, \vec{u} \in E^{\prime}\right).\]
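As a small illustration (assuming, for this sketch only, that \(E^{\prime}=E^{2}\) with the Euclidean norm), take \(\vec{p}=(1,2)\) and \(\vec{x}=(4,6).\) Then \(\vec{x}-\vec{p}=(3,4),\) so
\[t=|\vec{x}-\vec{p}|=\sqrt{3^{2}+4^{2}}=5 \text { and } \vec{u}=\frac{1}{5}(3,4)=\left(\frac{3}{5}, \frac{4}{5}\right),\]
and indeed \(\vec{p}+t \vec{u}=(1,2)+(3,4)=(4,6)=\vec{x}.\)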
First of all, we generalize Definition 1 in Chapter 5, §1.
Given \(f : E^{\prime} \rightarrow E\) and \(\vec{p}, \vec{u} \in E^{\prime}\) \((\vec{u} \neq \overrightarrow{0}),\) we define the directional derivative of \(f\) along \(\vec{u}\) (or \(\vec{u}\)-directed derivative of \(f\)) at \(\vec{p}\) by
\[D_{\vec{u}} f(\vec{p})=\lim _{t \rightarrow 0} \frac{1}{t}[f(\vec{p}+t \vec{u})-f(\vec{p})], \tag{1}\]
if this limit exists in \(E\) (finite).
We also define the \(\vec{u}\)-directed derived function,
\[D_{\vec{u}} f : E^{\prime} \rightarrow E,\]
as follows. For any \(\vec{p} \in E^{\prime}\),
\[D_{\vec{u}} f(\vec{p})=\left\{\begin{array}{ll}{\lim _{t \rightarrow 0} \frac{1}{t}[f(\vec{p}+t \vec{u})-f(\vec{p})]} & {\text { if this limit exists, }} \\ {0} & {\text { otherwise. }}\end{array}\right.\]
Thus \(D_{\vec{u}} f\) is always defined, but the name derivative is used only if the limit (1) exists (finite). If it exists for each \(\vec{p}\) in a set \(B \subseteq E^{\prime},\) we call \(D_{\vec{u}} f\) (in classical notation, \(\partial f / \partial \vec{u}\)) the \(\vec{u}\)-directed derivative of \(f\) on \(B\).
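To see the definition at work, here is a sketch with illustrative data chosen for this purpose: \(f : E^{2} \rightarrow E^{1},\) \(f(x, y)=x^{2} y,\) \(\vec{p}=(1,2),\) and \(\vec{u}=(3,4)\) (not a unit vector, which the definition allows). Then \(f(\vec{p})=2\) and
\[f(\vec{p}+t \vec{u})=(1+3 t)^{2}(2+4 t)=2+16 t+42 t^{2}+36 t^{3},\]
so
\[D_{\vec{u}} f(\vec{p})=\lim _{t \rightarrow 0} \frac{1}{t}\left[16 t+42 t^{2}+36 t^{3}\right]=\lim _{t \rightarrow 0}\left(16+42 t+36 t^{2}\right)=16.\]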
Note that, as \(t \rightarrow 0, \vec{x}\) tends to \(\vec{p}\) over the line \(\vec{x}=\vec{p}+t \vec{u}.\) Thus \(D_{\vec{u}} f(\vec{p})\) can be treated as a relative limit over that line. Observe that it depends on both the direction and the length of \(\vec{u}.\) Indeed, we have the following result.
Corollary 1. Given \(f : E^{\prime} \rightarrow E, \vec{u} \neq \overrightarrow{0},\) and a scalar \(s \neq 0,\) we have
\[D_{s \vec{u}} f=s D_{\vec{u}} f.\]
Moreover, \(D_{s \vec{u}} f(\vec{p})\) is a genuine derivative iff \(D_{\vec{u}} f(\vec{p})\) is.
Proof.
Set \(t=s \theta\) in (1) to get
\[s D_{\vec{u}} f(\vec{p})=\lim _{\theta \rightarrow 0} \frac{1}{\theta}[f(\vec{p}+\theta s \vec{u})-f(\vec{p})]=D_{s \vec{u}} f(\vec{p}). \quad \square\]
In particular, taking \(s=1 /|\vec{u}|,\) we have
\[|s \vec{u}|=\frac{|\vec{u}|}{|\vec{u}|}=1 \text { and } D_{\vec{u}} f=\frac{1}{s} D_{s \vec{u}} f.\]
Thus all reduces to the case \(D_{\vec{v}} f,\) where \(\vec{v}=s \vec{u}\) is a unit vector. This device, called normalization, is often used, but actually it does not simplify matters.
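As a quick check of the scaling rule \(D_{s \vec{u}} f=s D_{\vec{u}} f\) (with illustrative data assumed here: \(f(x, y)=x y\) on \(E^{2}\) with the Euclidean norm, \(\vec{p}=(1,1),\) \(\vec{u}=(3,4)\)), we have \(|\vec{u}|=5,\) \(s=\frac{1}{5},\) and \(\vec{v}=s \vec{u}=\left(\frac{3}{5}, \frac{4}{5}\right).\) Computing both derivatives from the definition,
\[f(\vec{p}+t \vec{u})=(1+3 t)(1+4 t)=1+7 t+12 t^{2}, \quad \text { so } D_{\vec{u}} f(\vec{p})=7,\]
\[f(\vec{p}+t \vec{v})=\left(1+\frac{3 t}{5}\right)\left(1+\frac{4 t}{5}\right)=1+\frac{7 t}{5}+\frac{12 t^{2}}{25}, \quad \text { so } D_{\vec{v}} f(\vec{p})=\frac{7}{5}=s D_{\vec{u}} f(\vec{p}).\]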
If \(E^{\prime}=E^{n}\left(C^{n}\right),\) then \(f\) is a function of \(n\) scalar variables \(x_{k}(k=1, \ldots, n)\) and \(E^{\prime}\) has the \(n\) basic unit vectors \(\vec{e}_{k}.\) This example leads us to the following definition.
If in formula (1), \(E^{\prime}=E^{n}\left(C^{n}\right)\) and \(\vec{u}=\vec{e}_{k}\) for a fixed \(k \leq n,\) we call \(D_{\vec{u}} f\) the partially derived function for \(f,\) with respect to \(x_{k},\) denoted
\[D_{k} f \text { or } \frac{\partial f}{\partial x_{k}},\]
and the limit (1) is called the partial derivative of \(f\) at \(\vec{p},\) with respect to \(x_{k},\) denoted
\[D_{k} f(\vec{p}), \text { or } \frac{\partial}{\partial x_{k}} f(\vec{p}), \text { or }\left.\frac{\partial f}{\partial x_{k}}\right|_{\vec{x}=\vec{p}}.\]
If it exists for all \(\vec{p} \in B,\) we call \(D_{k} f\) the partial derivative (briefly, partial) of \(f\) on \(B,\) with respect to \(x_{k}\).
In any case, the derived functions \(D_{k} f(k=1, \ldots, n)\) are always defined on all of \(E^{n}\left(C^{n}\right).\)
If \(E^{\prime}=E^{3}\left(C^{3}\right),\) we often write \(x, y, z\) for \(x_{1}, x_{2}, x_{3},\) and
\[\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}, \frac{\partial f}{\partial z} \text { for } D_{k} f \quad(k=1,2,3).\]
Note 1. If \(E^{\prime}=E^{1},\) scalars are also "vectors," and \(D_{1} f\) coincides with \(f^{\prime}\) as defined in Chapter 5, §1 (except where \(f^{\prime}=\pm \infty\)). Explain!
Note 2. As we have observed, the \(\vec{u}\)-directed derivative (1) is obtained by keeping \(\vec{x}\) on the line \(\vec{x}=\vec{p}+t \vec{u}.\)
If \(\vec{u}=\vec{e}_{k},\) the line is parallel to the \(k\)th axis; so all coordinates of \(\vec{x},\) except \(x_{k},\) remain fixed \(\left(x_{i}=p_{i}, i \neq k\right),\) and \(f\) behaves like a function of one variable, \(x_{k}.\) Thus we can compute \(D_{k} f\) by the usual rules of differentiation, treating all \(x_{i}(i \neq k)\) as constants and \(x_{k}\) as the only variable.
For example, let \(f(x, y)=x^{2} y.\) Then
\[\frac{\partial f}{\partial x}=D_{1} f(x, y)=2 x y \text { and } \frac{\partial f}{\partial y}=D_{2} f(x, y)=x^{2}.\]
Note 3. More generally, given \(\vec{p}\) and \(\vec{u} \neq \overrightarrow{0},\) set
\[h(t)=f(\vec{p}+t \vec{u}), \quad t \in E^{1}.\]
Then \(h(0)=f(\vec{p});\) so
\[\begin{aligned} D_{\vec{u}} f(\vec{p}) &=\lim _{t \rightarrow 0} \frac{1}{t}[f(\vec{p}+t \vec{u})-f(\vec{p})] \\ &=\lim _{t \rightarrow 0} \frac{h(t)-h(0)}{t-0} \\ &=h^{\prime}(0) \end{aligned}\]
if the limit exists. Thus all reduces to a function \(h\) of one real variable.
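For instance, applying this reduction to \(f(x, y)=x^{2} y\) (the function of Note 2) with arbitrary \(\vec{p}=\left(p_{1}, p_{2}\right)\) and \(\vec{u}=\left(u_{1}, u_{2}\right)\) gives
\[h(t)=\left(p_{1}+t u_{1}\right)^{2}\left(p_{2}+t u_{2}\right), \quad h^{\prime}(t)=2 u_{1}\left(p_{1}+t u_{1}\right)\left(p_{2}+t u_{2}\right)+u_{2}\left(p_{1}+t u_{1}\right)^{2},\]
so
\[D_{\vec{u}} f(\vec{p})=h^{\prime}(0)=2 p_{1} p_{2} u_{1}+p_{1}^{2} u_{2}.\]
Taking \(\vec{u}=\vec{e}_{1}=(1,0)\) or \(\vec{u}=\vec{e}_{2}=(0,1)\) recovers the partials \(2 p_{1} p_{2}\) and \(p_{1}^{2}\) found in Note 2.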
For functions \(f : E^{1} \rightarrow E,\) the existence of a finite derivative ("differentiability") at \(p\) implies continuity at \(p\) (Theorem 1 of Chapter 5, §1). But in the general case, \(f : E^{\prime} \rightarrow E,\) this may fail even if \(D_{\vec{u}} f(\vec{p})\) exists for all \(\vec{u} \neq \overrightarrow{0}\).
(a) Define \(f : E^{2} \rightarrow E^{1}\) by
\[f(x, y)=\frac{x^{2} y}{x^{4}+y^{2}}, \quad f(0,0)=0.\]
Fix a unit vector \(\vec{u}=\left(u_{1}, u_{2}\right)\) in \(E^{2}.\) Let \(\vec{p}=(0,0).\) To find \(D_{\vec{u}} f(\vec{p}),\) use the \(h\) of Note 3:
\[h(t)=f(\vec{p}+t \vec{u})=f(t \vec{u})=f\left(t u_{1}, t u_{2}\right)=\frac{t u_{1}^{2} u_{2}}{t^{2} u_{1}^{4}+u_{2}^{2}} \text { if } u_{2} \neq 0,\]
and \(h=0\) if \(u_{2}=0.\) Hence
\[D_{\vec{u}} f(\vec{p})=h^{\prime}(0)=\frac{u_{1}^{2}}{u_{2}} \text { if } u_{2} \neq 0,\]
and \(h^{\prime}(0)=0\) if \(u_{2}=0.\) Thus \(D_{\vec{u}} f(\overrightarrow{0})\) exists for all \(\vec{u}.\) Yet \(f\) is discontinuous at \(\overrightarrow{0}\) (see Problem 9 in Chapter 4, §3).
(b) Let
\[f(x, y)=\left\{\begin{array}{ll}{x+y} & {\text { if } x y=0,} \\ {1} & {\text { otherwise.}}\end{array}\right.\]
Then \(f(x, y)=x\) on the \(x\)-axis; so \(D_{1} f(0,0)=1\).
Yet \(f\) is discontinuous at \(\overrightarrow{0}\) (even relatively so) over any line \(y=a x\) \((a \neq 0).\) For on that line, \(f(x, y)=1\) if \((x, y) \neq(0,0);\) so \(f(x, y) \rightarrow 1\) but \(f(0,0)=0+0=0\).
Thus continuity at \(\overrightarrow{0}\) fails. (But see Theorem 1 below!)
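One way to verify the discontinuity claimed in Example (a) directly (a sketch; the cited problem may argue differently) is to approach \(\overrightarrow{0}\) along the parabola \(y=x^{2}\) instead of along a line. For \(x \neq 0,\)
\[f\left(x, x^{2}\right)=\frac{x^{2} \cdot x^{2}}{x^{4}+\left(x^{2}\right)^{2}}=\frac{x^{4}}{2 x^{4}}=\frac{1}{2},\]
so \(f(x, y) \rightarrow \frac{1}{2}\) along that parabola, while \(f(0,0)=0.\) Every line through \(\overrightarrow{0}\) gives the limit \(0,\) yet the limit along the parabola differs; hence \(f\) has no limit at \(\overrightarrow{0}.\)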
Hence, if differentiability is to imply continuity, it must be defined in a stronger manner. We do it in §3. For now, we prove only some theorems on partial and directional derivatives, based on those of Chapter 5.
Theorem 1. If \(f : E^{\prime} \rightarrow E\) has a \(\vec{u}\)-directed derivative at \(\vec{p} \in E^{\prime},\) then \(f\) is relatively continuous at \(\vec{p}\) over the line
\[\vec{x}=\vec{p}+t \vec{u} \quad\left(\overrightarrow{0} \neq \vec{u} \in E^{\prime}\right).\]
Proof.
Set \(h(t)=f(\vec{p}+t \vec{u}), t \in E^{1}\).
By Note 3, our assumption implies that \(h\) (a function on \(E^{1}\)) is differentiable at \(0.\)
By Theorem 1 in Chapter 5, §1, then, \(h\) is continuous at \(0;\) so
\[\lim _{t \rightarrow 0} h(t)=h(0)=f(\vec{p}),\]
i.e.,
\[\lim _{t \rightarrow 0} f(\vec{p}+t \vec{u})=f(\vec{p}).\]
But this means that \(f(\vec{x}) \rightarrow f(\vec{p})\) as \(\vec{x} \rightarrow \vec{p}\) over the line \(\vec{x}=\vec{p}+t \vec{u},\) for, on that line, \(\vec{x}=\vec{p}+t \vec{u}.\)
Thus, indeed, \(f\) is relatively continuous at \(\vec{p},\) as stated. \(\quad \square\)
Note that we actually used the substitution \(\vec{x}=\vec{p}+t \vec{u}.\) This is admissible since the dependence between \(\vec{x}\) and \(t\) is one-to-one (Corollary 2(iii) of Chapter 4, §2). Why?
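Read contrapositively, the theorem says that no \(\vec{u}\)-directed derivative can exist at \(\vec{p}\) if \(f\) is not relatively continuous there over the line \(\vec{x}=\vec{p}+t \vec{u}.\) Example (b) illustrates this: with \(\vec{p}=\overrightarrow{0}\) and \(\vec{u}=(1, a),\) \(a \neq 0,\) we have \(f(t \vec{u})=f(t, a t)=1\) for \(t \neq 0,\) so
\[\frac{1}{t}[f(t \vec{u})-f(\overrightarrow{0})]=\frac{1}{t}[1-0]=\frac{1}{t},\]
which has no finite limit as \(t \rightarrow 0.\) Thus \(D_{\vec{u}} f(\overrightarrow{0})\) fails to exist along every such line, even though \(D_{1} f(0,0)=1\) exists.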
Theorem 2. Let \(E^{\prime} \ni \vec{u}=\vec{q}-\vec{p}, \vec{u} \neq \overrightarrow{0}\).
If \(f : E^{\prime} \rightarrow E\) is relatively continuous on the segment \(I=L[\vec{p}, \vec{q}]\) and has a \(\vec{u}\)-directed derivative on \(I-Q\) (\(Q\) countable), then
\[|f(\vec{q})-f(\vec{p})| \leq \sup \left|D_{\vec{u}} f(\vec{x})\right|, \quad \vec{x} \in I-Q. \tag{2}\]
Proof.
Set again \(h(t)=f(\vec{p}+t \vec{u})\) and \(g(t)=\vec{p}+t \vec{u}\).
Then \(h=f \circ g,\) and \(g\) is continuous on \(E^{1}.\) (Why?)
As \(f\) is relatively continuous on \(I=L[\vec{p}, \vec{q}],\) so is \(h=f \circ g\) on the interval \(J=[0,1] \subset E^{1}\) (cf. Chapter 4, §8, Example (1)).
Now fix \(t_{0} \in J.\) If \(\vec{x}_{0}=\vec{p}+t_{0} \vec{u} \in I-Q,\) our assumptions imply the existence of
\[\begin{aligned} D_{\vec{u}} f\left(\vec{x}_{0}\right) &=\lim _{t \rightarrow 0} \frac{1}{t}\left[f\left(\vec{x}_{0}+t \vec{u}\right)-f\left(\vec{x}_{0}\right)\right] \\ &=\lim _{t \rightarrow 0} \frac{1}{t}\left[f\left(\vec{p}+t_{0} \vec{u}+t \vec{u}\right)-f\left(\vec{p}+t_{0} \vec{u}\right)\right] \\ &=\lim _{t \rightarrow 0} \frac{1}{t}\left[h\left(t_{0}+t\right)-h\left(t_{0}\right)\right] \\ &=h^{\prime}\left(t_{0}\right) . \quad \text {(Explain!)} \end{aligned}\]
This can fail for at most a countable set \(Q^{\prime}\) of points \(t_{0} \in J\) (those for which \(\vec{x}_{0} \in Q).\)
Thus \(h\) is differentiable on \(J-Q^{\prime};\) and so, by Corollary 1 in Chapter 5, §4,
\[|h(1)-h(0)| \leq \sup _{t \in J-Q^{\prime}}\left|h^{\prime}(t)\right|=\sup _{\vec{x} \in I-Q}\left|D_{\vec{u}} f(\vec{x})\right|.\]
Now as \(h(1)=f(\vec{p}+\vec{u})=f(\vec{q})\) and \(h(0)=f(\vec{p}),\) formula (2) follows. \(\quad \square\)
Theorem 3. If in Theorem 2, \(E=E^{1}\) and if \(f\) has a \(\vec{u}\)-directed derivative at least on the open line segment \(L(\vec{p}, \vec{q}),\) then
\[f(\vec{q})-f(\vec{p})=D_{\vec{u}} f\left(\vec{x}_{0}\right) \tag{3}\]
for some \(\vec{x}_{0} \in L(\vec{p}, \vec{q})\).
Proof.
The proof is as in Theorem 2, based on Corollary 3 in Chapter 5, §2 (instead of Corollary 1 in Chapter 5, §4).
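Both theorems can be checked on a simple case (illustrative data assumed here: \(f(x, y)=x^{2} y\) on \(E^{2},\) \(\vec{p}=(0,0),\) \(\vec{q}=(1,1),\) so \(\vec{u}=(1,1)\) and \(Q=\emptyset\)). On the segment, \(h(t)=f(t, t)=t^{3},\) and, as in the proof of Theorem 2,
\[D_{\vec{u}} f(\vec{p}+t \vec{u})=h^{\prime}(t)=3 t^{2}, \quad 0 \leq t \leq 1.\]
Hence
\[|f(\vec{q})-f(\vec{p})|=1 \leq 3=\sup _{0 \leq t \leq 1} 3 t^{2},\]
in agreement with Theorem 2. For Theorem 3, the equation \(3 t_{0}^{2}=1\) yields \(t_{0}=1 / \sqrt{3} \in(0,1),\) so \(f(\vec{q})-f(\vec{p})=D_{\vec{u}} f\left(\vec{x}_{0}\right)\) with \(\vec{x}_{0}=\left(\frac{1}{\sqrt{3}}, \frac{1}{\sqrt{3}}\right) \in L(\vec{p}, \vec{q}).\)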
Theorems 2 and 3 are often used in "normalized" form, as follows.
If in Theorems 2 and 3, we set
\[r=|\vec{u}|=|\vec{q}-\vec{p}| \neq 0 \text { and } \vec{v}=\frac{1}{r} \vec{u},\]
then formulas (2) and (3) can be written as
\[|f(\vec{q})-f(\vec{p})| \leq|\vec{q}-\vec{p}| \sup \left|D_{\vec{v}} f(\vec{x})\right|, \quad \vec{x} \in I-Q, \tag{2'}\]
and
\[f(\vec{q})-f(\vec{p})=|\vec{q}-\vec{p}| D_{\vec{v}} f\left(\vec{x}_{0}\right) \tag{3'}\]
for some \(\vec{x}_{0} \in L(\vec{p}, \vec{q})\).
For by Corollary 1,
\[D_{\vec{u}} f=r D_{\vec{v}} f=|\vec{q}-\vec{p}| D_{\vec{v}} f;\]
so (2') and (3') follow.
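A brief check of the normalized formulas (a sketch with data assumed for illustration: \(f(x, y)=x^{2} y\) on \(E^{2}\) with the Euclidean norm, \(\vec{p}=(0,0),\) \(\vec{q}=(1,1)\)): here \(r=|\vec{q}-\vec{p}|=\sqrt{2}\) and \(\vec{v}=\left(\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}\right).\) At a point \(\vec{x}=(a, a)\) of the segment, \(f(\vec{x}+t \vec{v})=\left(a+\frac{t}{\sqrt{2}}\right)^{3},\) so
\[D_{\vec{v}} f(a, a)=\frac{3 a^{2}}{\sqrt{2}}, \quad 0 \leq a \leq 1.\]
Thus
\[|\vec{q}-\vec{p}| \sup \left|D_{\vec{v}} f(\vec{x})\right|=\sqrt{2} \cdot \frac{3}{\sqrt{2}}=3 \geq 1=|f(\vec{q})-f(\vec{p})|,\]
and \(\sqrt{2} \cdot \frac{3 a^{2}}{\sqrt{2}}=1\) holds for \(a=1 / \sqrt{3},\) i.e., at \(\vec{x}_{0}=\left(\frac{1}{\sqrt{3}}, \frac{1}{\sqrt{3}}\right) \in L(\vec{p}, \vec{q}).\)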