6.3: Differentiable Functions
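As we know, a function \(f : E^{1} \rightarrow E\) (on \(E^{1}\)) is differentiable at \(p \in E^{1}\) iff, with \(\Delta f=f(x)-f(p)\) and \(\Delta x=x-p\),
\[f^{\prime}(p)=\lim _{x \rightarrow p} \frac{\Delta f}{\Delta x} \quad \text{exists (finite).}\]
Setting \(\Delta x=x-p=t\), \(\Delta f=f(p+t)-f(p)\), and \(f^{\prime}(p)=v\), we may write this equation as
\[\lim _{t \rightarrow 0}\left|\frac{\Delta f}{t}-v\right|=0,\]
or
\[\lim _{t \rightarrow 0} \frac{1}{|t|}|f(p+t)-f(p)-v t|=0. \tag{1}\]
Now define a map \(\phi : E^{1} \rightarrow E\) by \(\phi(t)=t v\), \(v=f^{\prime}(p) \in E\).
Then \(\phi\) is linear and continuous, i.e., \(\phi \in L\left(E^{1}, E\right)\); so by Corollary 2 in §2, we may express (1) as follows: there is a map \(\phi \in L\left(E^{1}, E\right)\) such that
\[\lim _{t \rightarrow 0} \frac{1}{|t|}|\Delta f-\phi(t)|=0.\]
We adopt this as a definition in the general case, \(f : E^{\prime} \rightarrow E\), as well.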
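Definition 1. A function \(f : E^{\prime} \rightarrow E\) (where \(E^{\prime}\) and \(E\) are normed spaces over the same scalar field) is said to be differentiable at a point \(\vec{p} \in E^{\prime}\) iff there is a map
\[\phi \in L\left(E^{\prime}, E\right)\]
such that
\[\lim _{\vec{t} \rightarrow \overrightarrow{0}} \frac{1}{|\vec{t}|}|\Delta f-\phi(\vec{t})|=0;\]
that is,
\[\lim _{\vec{t} \rightarrow \overrightarrow{0}} \frac{1}{|\vec{t}|}[f(\vec{p}+\vec{t})-f(\vec{p})-\phi(\vec{t})]=0. \tag{2}\]
As we show below, \(\phi\) is unique (for a fixed \(\vec{p}\)), if it exists.
We call \(\phi\) the differential of \(f\) at \(\vec{p}\), briefly denoted \(d f\). As it depends on \(\vec{p}\), we also write \(d f(\vec{p} ; \vec{t})\) for \(d f(\vec{t})\) and \(d f(\vec{p} ; \cdot)\) for \(d f\).
Some authors write \(f^{\prime}(\vec{p})\) for \(d f(\vec{p} ; \cdot)\) and call it the derivative at \(\vec{p}\), but we shall not do this (see Preface). Following M. Spivak, however, we shall use "\(\left[f^{\prime}(\vec{p})\right]\)" for its matrix, as follows.
Definition 2. If \(E^{\prime}=E^{n}\left(C^{n}\right)\) and \(E=E^{m}\left(C^{m}\right)\), and \(f : E^{\prime} \rightarrow E\) is differentiable at \(\vec{p}\), we set
\[\left[f^{\prime}(\vec{p})\right]=[d f(\vec{p} ; \cdot)]\]
and call it the Jacobian matrix of \(f\) at \(\vec{p}\).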
Note 1. In Chapter 5, §6, we did not define \(d f\) as a mapping. However, if \(E^{\prime}=E^{1}\), the function value
\[d f(p ; t)=v t=f^{\prime}(p) \Delta x\]
is as in Chapter 5, §6.
Also, \(\left[f^{\prime}(p)\right]\) is a \(1 \times 1\) matrix with single term \(f^{\prime}(p)\). (Why?) This motivated Definition 2.
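Theorem 1 (uniqueness of \(d f\)). If \(f : E^{\prime} \rightarrow E\) is differentiable at \(\vec{p}\), then the map \(\phi\) described in Definition 1 is unique (dependent on \(f\) and \(\vec{p}\) only).
Proof.
Suppose there is another linear map \(g : E^{\prime} \rightarrow E\) such that
\[\lim _{\vec{t} \rightarrow \overrightarrow{0}} \frac{1}{|\vec{t}|}[f(\vec{p}+\vec{t})-f(\vec{p})-g(\vec{t})]=\lim _{\vec{t} \rightarrow \overrightarrow{0}} \frac{1}{|\vec{t}|}[\Delta f-g(\vec{t})]=0. \tag{3}\]
Let \(h=\phi-g\). By Corollary 1 in §2, \(h\) is linear.
Also, by the triangle law,
\[|h(\vec{t})|=|\phi(\vec{t})-g(\vec{t})| \leq|\Delta f-\phi(\vec{t})|+|\Delta f-g(\vec{t})|.\]
Hence, dividing by \(|\vec{t}|\),
\[\left|h\left(\frac{\vec{t}}{|\vec{t}|}\right)\right|=\frac{1}{|\vec{t}|}|h(\vec{t})| \leq \frac{1}{|\vec{t}|}|\Delta f-\phi(\vec{t})|+\frac{1}{|\vec{t}|}|\Delta f-g(\vec{t})|.\]
By (3) and (2), the right side expressions tend to 0 as \(\vec{t} \rightarrow \overrightarrow{0}\). Thus
\[\lim _{\vec{t} \rightarrow \overrightarrow{0}} h\left(\frac{\vec{t}}{|\vec{t}|}\right)=0.\]
This remains valid also if \(\vec{t} \rightarrow \overrightarrow{0}\) over any line through \(\overrightarrow{0}\), so that \(\vec{t} /|\vec{t}|\) remains constant, say \(\vec{t} /|\vec{t}|=\vec{u}\), where \(\vec{u}\) is an arbitrary (but fixed) unit vector.
Then
\[h\left(\frac{\vec{t}}{|\vec{t}|}\right)=h(\vec{u})\]
is constant; so it can tend to 0 only if it equals 0, so \(h(\vec{u})=0\) for any unit vector \(\vec{u}\).
Since any \(\vec{x} \in E^{\prime}\) can be written as \(\vec{x}=|\vec{x}| \vec{u}\), linearity yields
\[h(\vec{x})=|\vec{x}| h(\vec{u})=0.\]
Thus \(h=\phi-g=0\) on \(E^{\prime}\), and so \(\phi=g\) after all, proving the uniqueness of \(\phi\). \(\quad \square\)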
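Theorem 2. If \(f\) is differentiable at \(\vec{p}\), then
(i) \(f\) is continuous at \(\vec{p}\);
(ii) for any \(\vec{u} \neq \overrightarrow{0}\), \(f\) has the \(\vec{u}\)-directed derivative
\[D_{\vec{u}} f(\vec{p})=d f(\vec{p} ; \vec{u}).\]
Proof.
By assumption, formula (2) holds for \(\phi=d f(\vec{p} ; \cdot)\).
Thus, given \(\varepsilon>0\), there is \(\delta>0\) such that, setting \(\Delta f=f(\vec{p}+\vec{t})-f(\vec{p})\), we have
\[\frac{1}{|\vec{t}|}|\Delta f-\phi(\vec{t})|<\varepsilon \quad \text{whenever } 0<|\vec{t}|<\delta; \tag{4}\]
or, by the triangle law,
\[|\Delta f| \leq|\Delta f-\phi(\vec{t})|+|\phi(\vec{t})| \leq \varepsilon|\vec{t}|+|\phi(\vec{t})|, \quad 0<|\vec{t}|<\delta. \tag{5}\]
Now, by Definition 1, \(\phi\) is linear and continuous; so
\[\lim _{\vec{t} \rightarrow \overrightarrow{0}}|\phi(\vec{t})|=|\phi(\overrightarrow{0})|=0.\]
Thus, making \(\vec{t} \rightarrow \overrightarrow{0}\) in (5), with \(\varepsilon\) fixed, we get
\[\lim _{\vec{t} \rightarrow \overrightarrow{0}}|\Delta f|=0.\]
As \(\vec{t}\) is just another notation for \(\Delta \vec{x}=\vec{x}-\vec{p}\), this proves assertion (i).
Next, fix any \(\vec{u} \neq \overrightarrow{0}\) in \(E^{\prime}\), and substitute \(t \vec{u}\) for \(\vec{t}\) in (4).
In other words, \(t\) is a real variable, \(0<|t|<\delta /|\vec{u}|\), so that \(\vec{t}=t \vec{u}\) satisfies \(0<|\vec{t}|<\delta\).
Multiplying by \(|\vec{u}|\), we use the linearity of \(\phi\) to get
\[\varepsilon|\vec{u}|>\left|\frac{\Delta f}{t}-\frac{\phi(t \vec{u})}{t}\right|=\left|\frac{\Delta f}{t}-\phi(\vec{u})\right|=\left|\frac{f(\vec{p}+t \vec{u})-f(\vec{p})}{t}-\phi(\vec{u})\right|.\]
As \(\varepsilon\) is arbitrary, we have
\[\phi(\vec{u})=\lim _{t \rightarrow 0} \frac{1}{t}[f(\vec{p}+t \vec{u})-f(\vec{p})].\]
But this is simply \(D_{\vec{u}} f(\vec{p})\), by Definition 1 in §1.
Thus \(D_{\vec{u}} f(\vec{p})=\phi(\vec{u})=d f(\vec{p} ; \vec{u})\), proving (ii). \(\quad \square\)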
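As an informal numerical check of assertion (ii), the following sketch compares the difference quotient that defines the directed derivative with the value of the differential. The function `f`, the point `p`, and the vector `u` are hypothetical choices made only for illustration, and the partials inside `differential` were computed by hand for this particular `f`.

```python
import math

def f(x, y):
    # Hypothetical sample function, chosen only for illustration.
    return x * x * y + math.sin(y)

def differential(p, t):
    # df(p; t) = t1 * D1 f(p) + t2 * D2 f(p), with hand-computed partials of f.
    x, y = p
    d1 = 2 * x * y
    d2 = x * x + math.cos(y)
    return t[0] * d1 + t[1] * d2

def directed_quotient(p, u, t):
    # Difference quotient (f(p + t u) - f(p)) / t for a real t != 0.
    return (f(p[0] + t * u[0], p[1] + t * u[1]) - f(*p)) / t

p = (1.0, 2.0)
u = (3.0, -1.0)  # any nonzero vector; it need not be a unit vector
exact = differential(p, u)
for t in (1e-2, 1e-4, 1e-6, -1e-6):
    # The quotient should approach df(p; u) as t -> 0 from either side.
    print(t, directed_quotient(p, u, t), exact)
```

The quotients only approximate the limit; how fast they approach it depends on the chosen function and on floating-point rounding.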
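Note 2. If \(E^{\prime}=E^{n}\left(C^{n}\right)\), Theorem 2(ii) shows that if \(f\) is differentiable at \(\vec{p}\), it has the \(n\) partials
\[D_{k} f(\vec{p})=d f\left(\vec{p} ; \vec{e}_{k}\right), \quad k=1, \ldots, n.\]
But the converse fails: the existence of the \(D_{k} f(\vec{p})\) does not even imply continuity, let alone differentiability (see §1). Moreover, we have the following result.
Corollary 1. If \(E^{\prime}=E^{n}\left(C^{n}\right)\) and if \(f : E^{\prime} \rightarrow E\) is differentiable at \(\vec{p}\), then
\[d f(\vec{p} ; \vec{t})=\sum_{k=1}^{n} t_{k} D_{k} f(\vec{p})=\sum_{k=1}^{n} t_{k} \frac{\partial}{\partial x_{k}} f(\vec{p}), \tag{6}\]
where \(\vec{t}=\left(t_{1}, \ldots, t_{n}\right)\).
Proof.
By definition, \(\phi=d f(\vec{p} ; \cdot)\) is a linear map for a fixed \(\vec{p}\).
If \(E^{\prime}=E^{n}\) or \(C^{n}\), we may use formula (3) of §2, replacing \(f\) and \(\vec{x}\) by \(\phi\) and \(\vec{t}\), and get
\[\phi(\vec{t})=d f(\vec{p} ; \vec{t})=\sum_{k=1}^{n} t_{k} d f\left(\vec{p} ; \vec{e}_{k}\right)=\sum_{k=1}^{n} t_{k} D_{k} f(\vec{p})\]
by Note 2. \(\quad \square\)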
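Note 3. In classical notation, one writes \(\Delta x_{k}\) or \(d x_{k}\) for \(t_{k}\) in (6). Thus, omitting \(\vec{p}\) and \(\vec{t}\), formula (6) is often written as
\[d f=\frac{\partial f}{\partial x_{1}} d x_{1}+\frac{\partial f}{\partial x_{2}} d x_{2}+\cdots+\frac{\partial f}{\partial x_{n}} d x_{n}.\]
In particular, if \(n=3\), we write \(x, y, z\) for \(x_{1}, x_{2}, x_{3}\). This yields
\[d f=\frac{\partial f}{\partial x} d x+\frac{\partial f}{\partial y} d y+\frac{\partial f}{\partial z} d z\]
(a familiar calculus formula).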
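To illustrate the classical formula for \(n=3\), the sketch below uses it to estimate a small change in a function of three variables and compares the estimate with the actual increment \(\Delta f\). The function `volume` (the volume of a box with sides \(x, y, z\)), the point `p`, and the increments in `t` are hypothetical choices, not taken from the text.

```python
def volume(x, y, z):
    # Hypothetical example: volume of a box with sides x, y, z.
    return x * y * z

def differential(p, t):
    # df = (yz) dx + (xz) dy + (xy) dz, with the partials taken at p.
    x, y, z = p
    dx, dy, dz = t
    return y * z * dx + x * z * dy + x * y * dz

p = (2.0, 3.0, 4.0)
t = (0.01, -0.02, 0.03)  # small increments dx, dy, dz
df = differential(p, t)
delta = volume(p[0] + t[0], p[1] + t[1], p[2] + t[2]) - volume(*p)
# df is the linear part of the increment; delta - df is the remainder.
print(df, delta, abs(delta - df))
```

The remainder is small compared with \(|\vec{t}|\), which is what differentiability asserts in the limit.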
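Note 4. If the range space \(E\) in Corollary 1 is \(E^{1}\) \((C)\), then the \(D_{k} f(\vec{p})\) form an \(n\)-tuple of scalars, i.e., a vector in \(E^{n}\left(C^{n}\right)\).
In case \(f : E^{n} \rightarrow E^{1}\), we denote it by
\[\nabla f(\vec{p})=\left(D_{1} f(\vec{p}), \ldots, D_{n} f(\vec{p})\right)=\sum_{k=1}^{n} \vec{e}_{k} D_{k} f(\vec{p}).\]
In case \(f : C^{n} \rightarrow C\), we replace the \(D_{k} f(\vec{p})\) by their conjugates \(\overline{D_{k} f(\vec{p})}\) and set
\[\nabla f(\vec{p})=\sum_{k=1}^{n} \vec{e}_{k} \overline{D_{k} f(\vec{p})}.\]
The vector \(\nabla f(\vec{p})\) is called the gradient of \(f\) ("grad \(f\)") at \(\vec{p}\).
From (6) we obtain
\[d f(\vec{p} ; \vec{t})=\sum_{k=1}^{n} t_{k} D_{k} f(\vec{p})=\vec{t} \cdot \nabla f(\vec{p}) \tag{7}\]
(dot product of \(\vec{t}\) by \(\nabla f(\vec{p})\)), provided \(f : E^{n} \rightarrow E^{1}\) (or \(f : C^{n} \rightarrow C\)) is differentiable at \(\vec{p}\).
This leads us to the following result.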
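Theorem 3. A function \(f : E^{n} \rightarrow E^{1}\) (or \(f : C^{n} \rightarrow C\)) is differentiable at \(\vec{p}\) iff
\[\lim _{\vec{t} \rightarrow \overrightarrow{0}} \frac{1}{|\vec{t}|}|f(\vec{p}+\vec{t})-f(\vec{p})-\vec{t} \cdot \vec{v}|=0 \tag{8}\]
for some \(\vec{v} \in E^{n}\left(C^{n}\right)\).
In this case, necessarily \(\vec{v}=\nabla f(\vec{p})\) and \(\vec{t} \cdot \vec{v}=d f(\vec{p} ; \vec{t})\), \(\vec{t} \in E^{n}\left(C^{n}\right)\).
Proof.
If \(f\) is differentiable at \(\vec{p}\), we may set \(\phi=d f(\vec{p} ; \cdot)\) and \(\vec{v}=\nabla f(\vec{p})\).
Then by (7),
\[\phi(\vec{t})=d f(\vec{p} ; \vec{t})=\vec{t} \cdot \vec{v};\]
so by Definition 1, (8) results.
Conversely, if some \(\vec{v}\) satisfies (8), set \(\phi(\vec{t})=\vec{t} \cdot \vec{v}\). Then (8) implies (2), and \(\phi\) is linear and continuous.
Thus by definition, \(f\) is differentiable at \(\vec{p}\); so (7) holds.
Also, \(\phi\) is a linear functional on \(E^{n}\left(C^{n}\right)\). By Theorem 2(ii) in §2, the \(\vec{v}\) in \(\phi(\vec{t})=\vec{t} \cdot \vec{v}\) is unique, as is \(\phi\).
Thus by (7), \(\vec{v}=\nabla f(\vec{p})\) necessarily. \(\quad \square\)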
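The criterion in (8) can be explored numerically. In the sketch below, the quotient \(\frac{1}{|\vec{t}|}|f(\vec{p}+\vec{t})-f(\vec{p})-\vec{t} \cdot \vec{v}|\) is evaluated for shrinking \(\vec{t}\), once with \(\vec{v}\) equal to the gradient and once with a different vector. The function `f`, the point `p`, the direction `d`, and the vector `other` are hypothetical choices for illustration; testing a single direction only suggests the limit and does not prove it.

```python
import math

def f(x, y):
    # Hypothetical sample function on E^2.
    return x * x + 3 * x * y

def ratio(v, s, p=(1.0, 1.0), d=(0.6, 0.8)):
    # |f(p + t) - f(p) - t . v| / |t| for t = s * d, where d is a unit vector.
    t = (s * d[0], s * d[1])
    delta_f = f(p[0] + t[0], p[1] + t[1]) - f(*p)
    dot = t[0] * v[0] + t[1] * v[1]
    return abs(delta_f - dot) / math.hypot(*t)

gradient = (5.0, 3.0)  # (2x + 3y, 3x) at p = (1, 1), computed by hand
other = (5.0, 2.0)     # a different vector, for contrast
for s in (1e-1, 1e-2, 1e-3):
    # First ratio shrinks with s; second stays bounded away from 0.
    print(s, ratio(gradient, s), ratio(other, s))
```

For this `f` the first quotient shrinks in proportion to `s`, while the second stays near a nonzero constant, in line with the uniqueness of \(\vec{v}\).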
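Theorem 4. If \(f : E^{n} \rightarrow E^{1}\) (real) is relatively continuous on a closed segment \(L[\vec{p}, \vec{q}]\), \(\vec{p} \neq \vec{q}\), and differentiable on \(L(\vec{p}, \vec{q})\), then
\[f(\vec{q})-f(\vec{p})=(\vec{q}-\vec{p}) \cdot \nabla f\left(\vec{x}_{0}\right)\]
for some \(\vec{x}_{0} \in L(\vec{p}, \vec{q})\).
Proof.
Let
\[r=|\vec{q}-\vec{p}|, \quad \vec{v}=\frac{1}{r}(\vec{q}-\vec{p}), \quad \text{and} \quad r \vec{v}=(\vec{q}-\vec{p}).\]
By (7) and Theorem 2(ii),
\[D_{\vec{v}} f(\vec{x})=d f(\vec{x} ; \vec{v})=\vec{v} \cdot \nabla f(\vec{x})\]
for \(\vec{x} \in L(\vec{p}, \vec{q})\). Thus by formula (3') of Corollary 2 in §1,
\[f(\vec{q})-f(\vec{p})=r D_{\vec{v}} f\left(\vec{x}_{0}\right)=r \vec{v} \cdot \nabla f\left(\vec{x}_{0}\right)=(\vec{q}-\vec{p}) \cdot \nabla f\left(\vec{x}_{0}\right)\]
for some \(\vec{x}_{0} \in L(\vec{p}, \vec{q})\). \(\quad \square\)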
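The result just proved asserts that an intermediate point \(\vec{x}_{0}\) exists but does not say how to find it. As a rough illustration, the sketch below locates such a point for one hypothetical function by bisection along the segment. The names `f`, `grad_f`, `gap`, and `find_s` are illustrative, and the bisection assumes that the quantity being solved changes sign between the endpoints, which holds for this example but is not guaranteed in general.

```python
def f(x, y):
    # Hypothetical sample function on E^2.
    return x ** 3 + y ** 2

def grad_f(x, y):
    # Gradient of f, computed by hand.
    return (3 * x * x, 2 * y)

def gap(s, p, q):
    # (q - p) . grad f(x(s)) - (f(q) - f(p)), where x(s) = p + s (q - p).
    d = (q[0] - p[0], q[1] - p[1])
    x = (p[0] + s * d[0], p[1] + s * d[1])
    g = grad_f(*x)
    return d[0] * g[0] + d[1] * g[1] - (f(*q) - f(*p))

def find_s(p, q, lo=0.0, hi=1.0, steps=60):
    # Bisection on gap; assumes gap(lo) and gap(hi) have opposite signs.
    g_lo = gap(lo, p, q)
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        g_mid = gap(mid, p, q)
        if (g_mid < 0) == (g_lo < 0):
            lo, g_lo = mid, g_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

p, q = (0.0, 0.0), (1.0, 1.0)
s = find_s(p, q)
x0 = (p[0] + s * (q[0] - p[0]), p[1] + s * (q[1] - p[1]))
print(s, x0, gap(s, p, q))
```

Here the parameter `s` lies strictly between 0 and 1, so `x0` is an interior point of the segment, as the statement requires.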
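As we know, the mere existence of partials does not imply differentiability. But the existence of continuous partials does. Indeed, we have the following theorem.
Theorem 5. Let \(E^{\prime}=E^{n}\left(C^{n}\right)\).
If \(f : E^{\prime} \rightarrow E\) has the partial derivatives \(D_{k} f\) \((k=1, \ldots, n)\) on all of an open set \(A \subseteq E^{\prime}\), and if the \(D_{k} f\) are continuous at some \(\vec{p} \in A\), then \(f\) is differentiable at \(\vec{p}\).
Proof.
With \(\vec{p}\) as above, let
\[\phi(\vec{t})=\sum_{k=1}^{n} t_{k} D_{k} f(\vec{p}) \quad \text{with } \vec{t}=\sum_{k=1}^{n} t_{k} \vec{e}_{k} \in E^{\prime}.\]
Then \(\phi\) is continuous (a polynomial!) and linear (Corollary 2 in §2).
Thus by Definition 1, it remains to show that
\[\lim _{\vec{t} \rightarrow \overrightarrow{0}} \frac{1}{|\vec{t}|}|\Delta f-\phi(\vec{t})|=0;\]
that is,
\[\lim _{\vec{t} \rightarrow \overrightarrow{0}} \frac{1}{|\vec{t}|}\left|f(\vec{p}+\vec{t})-f(\vec{p})-\sum_{k=1}^{n} t_{k} D_{k} f(\vec{p})\right|=0. \tag{10}\]
To do this, fix \(\varepsilon>0\). As \(A\) is open and the \(D_{k} f\) are continuous at \(\vec{p} \in A\), there is a \(\delta>0\) such that \(G_{\vec{p}}(\delta) \subseteq A\) and simultaneously (explain this!)
\[\left(\forall \vec{x} \in G_{\vec{p}}(\delta)\right) \quad\left|D_{k} f(\vec{x})-D_{k} f(\vec{p})\right|<\frac{\varepsilon}{n}, \quad k=1, \ldots, n.\]
Hence for any set \(I \subseteq G_{\vec{p}}(\delta)\),
\[\sup _{\vec{x} \in I}\left|D_{k} f(\vec{x})-D_{k} f(\vec{p})\right| \leq \frac{\varepsilon}{n}. \quad \text{(Why?)} \tag{11}\]
Now fix any \(\vec{t} \in E^{\prime}\), \(0<|\vec{t}|<\delta\), and let \(\vec{p}_{0}=\vec{p}\),
\[\vec{p}_{k}=\vec{p}+\sum_{i=1}^{k} t_{i} \vec{e}_{i}, \quad k=1, \ldots, n.\]
Then
\[\vec{p}_{n}=\vec{p}+\sum_{i=1}^{n} t_{i} \vec{e}_{i}=\vec{p}+\vec{t},\]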
\(\left|\vec{p}_{k}-\vec{p}_{k-1}\right|=\left|t_{k}\right|\), and all \(\vec{p}_{k}\) lie in \(G_{\vec{p}}(\delta)\), for
\[\left|\vec{p}_{k}-\vec{p}\right|=\left|\sum_{i=1}^{k} t_{i} \vec{e}_{i}\right|=\sqrt{\sum_{i=1}^{k}\left|t_{i}\right|^{2}} \leq \sqrt{\sum_{i=1}^{n}\left|t_{i}\right|^{2}}=|\vec{t}|<\delta,\]
as required.
As \(G_{\vec{p}}(\delta)\) is convex (Chapter 4, §9), the segments \(I_{k}=L\left[\vec{p}_{k-1}, \vec{p}_{k}\right]\) all lie in \(G_{\vec{p}}(\delta) \subseteq A\); and by assumption, \(f\) has all partials there.
Hence by Theorem 1 in §1, \(f\) is relatively continuous on all \(I_{k}\).
All this also applies to the functions \(g_{k}\), defined by
\[\left(\forall \vec{x} \in E^{\prime}\right) \quad g_{k}(\vec{x})=f(\vec{x})-x_{k} D_{k} f(\vec{p}), \quad k=1, \ldots, n. \tag{12}\]
(Why?) Here
\[D_{k} g_{k}(\vec{x})=D_{k} f(\vec{x})-D_{k} f(\vec{p}).\]
(Why?)
Thus by Corollary 2 in §1, and (11) above,
\[\begin{aligned}\left|g_{k}\left(\vec{p}_{k}\right)-g_{k}\left(\vec{p}_{k-1}\right)\right| & \leq\left|\vec{p}_{k}-\vec{p}_{k-1}\right| \sup _{\vec{x} \in I_{k}}\left|D_{k} f(\vec{x})-D_{k} f(\vec{p})\right| \\ & \leq \frac{\varepsilon}{n}\left|t_{k}\right| \leq \frac{\varepsilon}{n}|\vec{t}|, \end{aligned}\]
since
\[\left|\vec{p}_{k}-\vec{p}_{k-1}\right|=\left|t_{k} \vec{e}_{k}\right| \leq|\vec{t}|,\]
by construction.
Combine with (12), recalling that the \(k\)th coordinates \(x_{k}\) for \(\vec{p}_{k}\) and \(\vec{p}_{k-1}\) differ by \(t_{k}\); so we obtain
\[\begin{aligned}\left|g_{k}\left(\vec{p}_{k}\right)-g_{k}\left(\vec{p}_{k-1}\right)\right| &=\left|f\left(\vec{p}_{k}\right)-f\left(\vec{p}_{k-1}\right)-t_{k} D_{k} f(\vec{p})\right| \\ & \leq \frac{\varepsilon}{n}|\vec{t}|. \end{aligned}\]
Also,
\[\begin{aligned} \sum_{k=1}^{n}\left[f\left(\vec{p}_{k}\right)-f\left(\vec{p}_{k-1}\right)\right] &=f\left(\vec{p}_{n}\right)-f\left(\vec{p}_{0}\right) \\ &=f(\vec{p}+\vec{t})-f(\vec{p})=\Delta f \quad \text{(see above).} \end{aligned}\]
Thus,
\[\begin{aligned}\left|\Delta f-\sum_{k=1}^{n} t_{k} D_{k} f(\vec{p})\right| &=\left|\sum_{k=1}^{n}\left[f\left(\vec{p}_{k}\right)-f\left(\vec{p}_{k-1}\right)-t_{k} D_{k} f(\vec{p})\right]\right| \\ & \leq n \cdot \frac{\varepsilon}{n}|\vec{t}|=\varepsilon|\vec{t}|. \end{aligned}\]
As \(\varepsilon\) is arbitrary, (10) follows, and all is proved. \(\quad \square\)
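The contrast between mere existence of partials and continuous partials can be seen numerically. The sketch below uses two hypothetical functions: `bad`, a standard example (not necessarily the one used in §1) that has both partials at the origin yet is not continuous there, and `good`, whose partials are continuous everywhere. Both have vanishing partials at the origin, so the only candidate for the differential there is \(\phi=0\), and the remainder quotient reduces to \(|f(\vec{t})-f(\overrightarrow{0})| /|\vec{t}|\).

```python
import math

def bad(x, y):
    # Vanishes on both axes, so D1 and D2 exist (and equal 0) at the origin,
    # but the function is not continuous there.
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y / (x * x + y * y)

def good(x, y):
    # Partials D1 = y and D2 = x are continuous everywhere.
    return x * y

def remainder_ratio(func, s):
    # |func(t) - func(0) - phi(t)| / |t| with t = (s, s) and phi = 0.
    return abs(func(s, s) - func(0.0, 0.0)) / math.hypot(s, s)

for s in (1e-1, 1e-2, 1e-3):
    # The ratio for good tends to 0; the ratio for bad grows without bound.
    print(s, remainder_ratio(good, s), remainder_ratio(bad, s))
```

Along the diagonal, `bad` stays at the constant value 1/2, so its quotient blows up as `s` shrinks, while the quotient for `good` tends to 0.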
Theorem 6. If \(f : E^{n} \rightarrow E^{m}\) (or \(f : C^{n} \rightarrow C^{m}\)) is differentiable at \(\vec{p}\), with \(f=\left(f_{1}, \ldots, f_{m}\right)\), then \(\left[f^{\prime}(\vec{p})\right]\) is an \(m \times n\) matrix,
\[\left[f^{\prime}(\vec{p})\right]=\left[D_{k} f_{i}(\vec{p})\right], \quad i=1, \ldots, m, \quad k=1, \ldots, n. \tag{14}\]
Proof.
By definition, \(\left[f^{\prime}(\vec{p})\right]\) is the matrix of the linear map \(\phi=d f(\vec{p} ; \cdot)\), \(\phi=\left(\phi_{1}, \ldots, \phi_{m}\right)\). Here
\[\phi(\vec{t})=\sum_{k=1}^{n} t_{k} D_{k} f(\vec{p})\]
by Corollary 1.
As \(f=\left(f_{1}, \ldots, f_{m}\right)\), we can compute \(D_{k} f(\vec{p})\) componentwise by Theorem 5 of Chapter 5, §1, and Note 2 in §1 to get
\[\begin{aligned} D_{k} f(\vec{p}) &=\left(D_{k} f_{1}(\vec{p}), \ldots, D_{k} f_{m}(\vec{p})\right) \\ &=\sum_{i=1}^{m} \vec{e}_{i}^{\prime} D_{k} f_{i}(\vec{p}), \quad k=1,2, \ldots, n, \end{aligned}\]
where the \(\vec{e}_{i}^{\prime}\) are the basic vectors in \(E^{m}\left(C^{m}\right)\). (Recall that the \(\vec{e}_{k}\) are the basic vectors in \(E^{n}\left(C^{n}\right)\).)
Thus
\[\phi(\vec{t})=\sum_{i=1}^{m} \vec{e}_{i}^{\prime} \phi_{i}(\vec{t}).\]
Also,
\[\phi(\vec{t})=\sum_{k=1}^{n} t_{k} \sum_{i=1}^{m} \vec{e}_{i}^{\prime} D_{k} f_{i}(\vec{p})=\sum_{i=1}^{m} \vec{e}_{i}^{\prime} \sum_{k=1}^{n} t_{k} D_{k} f_{i}(\vec{p}).\]
The uniqueness of the decomposition (Theorem 2 in Chapter 3, §§1-3) now yields
\[\phi_{i}(\vec{t})=\sum_{k=1}^{n} t_{k} D_{k} f_{i}(\vec{p}), \quad i=1, \ldots, m, \quad \vec{t} \in E^{n}\left(C^{n}\right).\]
If here \(\vec{t}=\vec{e}_{k}\), then \(t_{k}=1\), while \(t_{j}=0\) for \(j \neq k\). Thus we obtain
\[\phi_{i}\left(\vec{e}_{k}\right)=D_{k} f_{i}(\vec{p}), \quad i=1, \ldots, m, \quad k=1, \ldots, n.\]
Hence,
\[\phi\left(\vec{e}_{k}\right)=\left(v_{1 k}, v_{2 k}, \ldots, v_{m k}\right),\]
where
\[v_{i k}=\phi_{i}\left(\vec{e}_{k}\right)=D_{k} f_{i}(\vec{p}).\]
But by Note 3 of §2, \(v_{1 k}, \ldots, v_{m k}\) (written vertically) is the \(k\)th column of the \(m \times n\) matrix \([\phi]=\left[f^{\prime}(\vec{p})\right]\). Thus formula (14) results indeed. \(\quad \square\)
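The column-by-column description of \(\left[f^{\prime}(\vec{p})\right]\) translates directly into a numerical approximation. The sketch below builds an approximate Jacobian matrix for a hypothetical map from \(E^{2}\) to \(E^{3}\) and prints it next to the hand-computed matrix \(\left[D_{k} f_{i}(\vec{p})\right]\). The names `f`, `exact_jacobian`, and `numeric_jacobian` are illustrative, and central difference quotients are used in place of the one-sided quotients of the definition purely for better accuracy.

```python
import math

def f(x, y):
    # Hypothetical map from E^2 to E^3, used only for illustration.
    return (x * x * y, x + math.sin(y), math.exp(x * y))

def exact_jacobian(x, y):
    # Row i holds the partials of the component f_i; column k holds D_k f.
    e = math.exp(x * y)
    return [[2 * x * y, x * x],
            [1.0, math.cos(y)],
            [y * e, x * e]]

def numeric_jacobian(func, p, h=1e-6):
    # Central difference quotients approximate D_k f_i(p), column by column.
    m = len(func(*p))
    n = len(p)
    jac = [[0.0] * n for _ in range(m)]
    for k in range(n):
        plus = list(p)
        plus[k] += h
        minus = list(p)
        minus[k] -= h
        f_plus, f_minus = func(*plus), func(*minus)
        for i in range(m):
            jac[i][k] = (f_plus[i] - f_minus[i]) / (2 * h)
    return jac

p = (1.0, 0.5)
J_exact = exact_jacobian(*p)
J_num = numeric_jacobian(f, p)
for row_exact, row_num in zip(J_exact, J_num):
    print(row_exact, row_num)
```

Agreement of the two matrices only checks the partials; by itself it does not establish that the map is differentiable at `p`.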
In conclusion, let us stress again that while \(D_{\vec{u}} f(\vec{p})\) is a constant, for a fixed \(\vec{p}\), \(d f(\vec{p} ; \cdot)\) is a mapping
\[\phi \in L\left(E^{\prime}, E\right),\]
especially "tailored" for \(\vec{p}\).
The reader should carefully study at least the "arrowed" problems below.