2.4: Directional Derivatives and the Gradient
For a function \(z = f(x, y)\), we learned that the partial derivatives \(\dfrac{\partial f}{\partial x}\) and \(\dfrac{\partial f}{\partial y}\) represent the (instantaneous) rate of change of \(f\) in the positive \(x\) and \(y\) directions, respectively. What about other directions? It turns out that we can find the rate of change in any direction using a more general type of derivative called a directional derivative.
Definition 2.5: directional derivative
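Let \(f(x, y)\) be a real-valued function with domain \(D\) in \(\mathbb{R}^2\), and let \((a, b)\) be a point in \(D\). Let \(\textbf{v}\) be a unit vector in \(\mathbb{R}^2\). Then the directional derivative of \(f\) at \((a, b)\) in the direction of \(\textbf{v}\), denoted by \(D_v f(a, b)\), is defined as
\[D_v f(a, b) = \lim_{h \to 0} \dfrac{f((a, b) + h\textbf{v}) - f(a, b)}{h}\]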
Notice in the definition that we seem to be treating the point \((a, b)\) as a vector, since we are adding the vector \(h\textbf{v}\) to it. But this is just the usual idea of identifying vectors with their terminal points, which the reader should be used to by now. If we were to write the vector \(\textbf{v}\) as \(\textbf{v} = (v_1, v_2)\), then
\[D_v f(a, b) = \lim_{h \to 0} \dfrac{f(a + hv_1, b + hv_2) - f(a, b)}{h}\]
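From this we can immediately recognize that the partial derivatives \(\dfrac{\partial f}{\partial x}\) and \(\dfrac{\partial f}{\partial y}\) are special cases of the directional derivative with \(\textbf{v} = \textbf{i} = (1, 0)\) and \(\textbf{v} = \textbf{j} = (0, 1)\), respectively. That is, \(\dfrac{\partial f}{\partial x} = D_i f\) and \(\dfrac{\partial f}{\partial y} = D_j f\). Since there are many vectors with the same direction, we use a unit vector in the definition, as that represents a “standard” vector for a given direction.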
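To see the limit in the definition at work, the following minimal Python sketch evaluates the difference quotient for shrinking values of \(h\). The helper name `difference_quotient` and the sample function \(f(x, y) = xy^2 + x^3 y\) are illustrative choices, not part of the definition.

```python
def difference_quotient(f, point, v, h):
    """Return (f(point + h*v) - f(point)) / h for a unit vector v."""
    a, b = point
    v1, v2 = v
    return (f(a + h * v1, b + h * v2) - f(a, b)) / h


def f(x, y):
    # Sample function, chosen only for illustration.
    return x * y**2 + x**3 * y


# With v = i = (1, 0) the quotient approximates the partial derivative in x;
# with v = j = (0, 1) it approximates the partial derivative in y.
for h in (1e-2, 1e-4, 1e-6):
    q_i = difference_quotient(f, (1.0, 2.0), (1.0, 0.0), h)
    q_j = difference_quotient(f, (1.0, 2.0), (0.0, 1.0), h)
    print(h, q_i, q_j)
```

As \(h\) shrinks, the two printed quotients approach the partial derivatives of this sample function at \((1, 2)\), namely 10 and 5.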
If \(f(x, y)\) has continuous partial derivatives \(\dfrac{\partial f}{\partial x}\) and \(\dfrac{\partial f}{\partial y}\) (which will always be the case in this text), then there is a simple formula for the directional derivative:
Theorem 2.2
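Let \(f(x, y)\) be a real-valued function with domain \(D\) in \(\mathbb{R}^2\) such that the partial derivatives \(\dfrac{\partial f}{\partial x}\) and \(\dfrac{\partial f}{\partial y}\) exist and are continuous in \(D\). Let \((a, b)\) be a point in \(D\), and let \(\textbf{v} = (v_1, v_2)\) be a unit vector in \(\mathbb{R}^2\). Then
\[D_v f(a, b) = v_1 \dfrac{\partial f}{\partial x}(a, b) + v_2 \dfrac{\partial f}{\partial y}(a, b)\]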
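Proof: Note that if \(\textbf{v} = \textbf{i} = (1, 0)\) then the above formula reduces to \(D_v f(a, b) = \dfrac{\partial f}{\partial x}(a, b)\), which we know is true since \(D_i f = \dfrac{\partial f}{\partial x}\), as we noted earlier. Similarly, for \(\textbf{v} = \textbf{j} = (0, 1)\) the formula reduces to \(D_v f(a, b) = \dfrac{\partial f}{\partial y}(a, b)\), which is true since \(D_j f = \dfrac{\partial f}{\partial y}\). Replacing \(h\) by \(-h\) in the limit shows that \(D_{-i} f = -\dfrac{\partial f}{\partial x}\) and \(D_{-j} f = -\dfrac{\partial f}{\partial y}\), so the formula also holds for \(\textbf{v} = -\textbf{i}\) and \(\textbf{v} = -\textbf{j}\). Since \(\pm\textbf{i}\) and \(\pm\textbf{j}\) are the only unit vectors in \(\mathbb{R}^2\) with a zero component, we need only show that the formula holds for unit vectors \(\textbf{v} = (v_1, v_2)\) with \(v_1 \neq 0\) and \(v_2 \neq 0\). So fix such a vector \(\textbf{v}\) and fix a number \(h \neq 0\).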
Then
\[f(a + hv_1, b + hv_2) - f(a, b) = f(a + hv_1, b + hv_2) - f(a + hv_1, b) + f(a + hv_1, b) - f(a, b)\]
Since \(h \neq 0\) and \(v_2 \neq 0\), we have \(hv_2 \neq 0\), and thus any number \(c\) strictly between \(b\) and \(b + hv_2\) can be written as \(c = b + \alpha hv_2\) for some number \(0 < \alpha < 1\). Since \(a + hv_1\) is a fixed number, \(g(y) = f(a + hv_1, y)\) is a real-valued function of \(y\) alone, so the Mean Value Theorem from single-variable calculus can be applied to \(g\) on the interval \([b, b + hv_2]\) (or \([b + hv_2, b]\) if \(hv_2 < 0\)) to find a number \(0 < \alpha < 1\) such that
\[\dfrac{\partial f}{\partial y}(a + hv_1, b + \alpha hv_2) = g'(b + \alpha hv_2) = \dfrac{g(b + hv_2) - g(b)}{b + hv_2 - b} = \dfrac{f(a + hv_1, b + hv_2) - f(a + hv_1, b)}{hv_2}\]
and so
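\[f(a + hv_1, b + hv_2) - f(a + hv_1, b) = hv_2 \dfrac{\partial f}{\partial y}(a + hv_1, b + \alpha hv_2).\]
By a similar argument, there exists a number \(0 < \beta < 1\) such that
\[f(a + hv_1, b) - f(a, b) = hv_1 \dfrac{\partial f}{\partial x}(a + \beta hv_1, b).\]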
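Thus, substituting these two expressions into the decomposition of \(f(a + hv_1, b + hv_2) - f(a, b)\) above, we have
\[\begin{aligned} \dfrac{f(a + hv_1, b + hv_2) - f(a, b)}{h} &= \dfrac{hv_2 \dfrac{\partial f}{\partial y}(a + hv_1, b + \alpha hv_2) + hv_1 \dfrac{\partial f}{\partial x}(a + \beta hv_1, b)}{h} \\ &= v_2 \dfrac{\partial f}{\partial y}(a + hv_1, b + \alpha hv_2) + v_1 \dfrac{\partial f}{\partial x}(a + \beta hv_1, b) \end{aligned}\]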
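so by the component form of the definition of the directional derivative we have
\[\begin{aligned} D_v f(a, b) &= \lim_{h \to 0} \dfrac{f(a + hv_1, b + hv_2) - f(a, b)}{h} \\ &= \lim_{h \to 0} \left[ v_2 \dfrac{\partial f}{\partial y}(a + hv_1, b + \alpha hv_2) + v_1 \dfrac{\partial f}{\partial x}(a + \beta hv_1, b) \right] \\ &= v_2 \dfrac{\partial f}{\partial y}(a, b) + v_1 \dfrac{\partial f}{\partial x}(a, b) \end{aligned}\]
by the continuity of \(\dfrac{\partial f}{\partial x}\) and \(\dfrac{\partial f}{\partial y}\) (since \(0 < \alpha < 1\) and \(0 < \beta < 1\), the points \((a + hv_1, b + \alpha hv_2)\) and \((a + \beta hv_1, b)\) both approach \((a, b)\) as \(h \to 0\)), so
\[D_v f(a, b) = v_1 \dfrac{\partial f}{\partial x}(a, b) + v_2 \dfrac{\partial f}{\partial y}(a, b)\]
after reversing the order of summation.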
Note that \(D_v f(a, b) = \textbf{v} \cdot \left( \dfrac{\partial f}{\partial x}(a, b), \dfrac{\partial f}{\partial y}(a, b) \right)\). The second vector has a special name:
Definition 2.6
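For a real-valued function \(f(x, y)\), the gradient of \(f\), denoted by \(\nabla f\), is the vector
\[\nabla f = \left( \dfrac{\partial f}{\partial x}, \dfrac{\partial f}{\partial y} \right)\]
in \(\mathbb{R}^2\). For a real-valued function \(f(x, y, z)\), the gradient is the vector
\[\nabla f = \left( \dfrac{\partial f}{\partial x}, \dfrac{\partial f}{\partial y}, \dfrac{\partial f}{\partial z} \right)\]
in \(\mathbb{R}^3\). The symbol \(\nabla\) is pronounced “del”.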
Corollary 2.3
\[D_v f = \textbf{v} \cdot \nabla f\]
Example 2.15
Find the directional derivative of \(f(x, y) = xy^2 + x^3 y\) at the point \((1, 2)\) in the direction of \(\textbf{v} = \left( \dfrac{1}{\sqrt{2}}, \dfrac{1}{\sqrt{2}} \right)\).
Solution
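We see that \(\nabla f = (y^2 + 3x^2 y, 2xy + x^3)\), so
\[D_v f(1, 2) = \textbf{v} \cdot \nabla f(1, 2) = \left( \dfrac{1}{\sqrt{2}}, \dfrac{1}{\sqrt{2}} \right) \cdot \left( 2^2 + 3(1)^2(2), 2(1)(2) + 1^3 \right) = \left( \dfrac{1}{\sqrt{2}}, \dfrac{1}{\sqrt{2}} \right) \cdot (10, 5) = \dfrac{15}{\sqrt{2}}\]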
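The arithmetic in this example can be double-checked numerically. The sketch below, a rough check rather than part of the solution, computes \(\textbf{v} \cdot \nabla f(1, 2)\) from the hand-derived gradient and compares it with a difference quotient for a small step; the function names and the step size `h = 1e-6` are assumptions made for illustration.

```python
import math


def f(x, y):
    return x * y**2 + x**3 * y


def grad_f(x, y):
    # Gradient of f, taken from the worked solution.
    return (y**2 + 3 * x**2 * y, 2 * x * y + x**3)


def dot(u, w):
    return sum(ui * wi for ui, wi in zip(u, w))


v = (1 / math.sqrt(2), 1 / math.sqrt(2))

# Formula from Corollary 2.3: D_v f = v . grad f
by_formula = dot(v, grad_f(1.0, 2.0))

# Difference quotient from the definition, with a small (assumed) step h
h = 1e-6
by_quotient = (f(1.0 + h * v[0], 2.0 + h * v[1]) - f(1.0, 2.0)) / h

print(by_formula, by_quotient, 15 / math.sqrt(2))
```

All three printed numbers should agree to several decimal places, with the difference quotient carrying a small error that depends on `h`.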
A real-valued function \(z = f(x, y)\) whose partial derivatives \(\dfrac{\partial f}{\partial x}\) and \(\dfrac{\partial f}{\partial y}\) exist and are continuous is called continuously differentiable. Assume that \(f(x, y)\) is such a function and that \(\nabla f \neq \textbf{0}\). Let \(c\) be a real number in the range of \(f\) and let \(\textbf{v}\) be a unit vector in \(\mathbb{R}^2\) which is tangent to the level curve \(f(x, y) = c\) (see Figure 2.4.1).

The value of \(f(x, y)\) is constant along a level curve, so since \(\textbf{v}\) is a tangent vector to this curve, the rate of change of \(f\) in the direction of \(\textbf{v}\) is 0, i.e. \(D_v f = 0\). But we know that \(D_v f = \textbf{v} \cdot \nabla f = \left\lVert \textbf{v} \right\rVert \left\lVert \nabla f \right\rVert \cos \theta\), where \(\theta\) is the angle between \(\textbf{v}\) and \(\nabla f\). Since \(\left\lVert \textbf{v} \right\rVert = 1\), this gives \(D_v f = \left\lVert \nabla f \right\rVert \cos \theta\). Since \(\nabla f \neq \textbf{0}\), we get \(D_v f = 0 \Rightarrow \cos \theta = 0 \Rightarrow \theta = 90^\circ\). In other words, \(\nabla f \perp \textbf{v}\), which means that \(\nabla f\) is normal to the level curve.
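As a quick sanity check of this perpendicularity, consider a case where the tangent to the level curve is known independently of the gradient. The sketch below assumes the illustrative function \(g(x, y) = x^2 + y^2\) (not taken from the text), whose level curves are circles with unit tangent \((-\sin t, \cos t)\) at the point \((r\cos t, r\sin t)\).

```python
import math


def grad_g(x, y):
    # Gradient of the illustrative function g(x, y) = x**2 + y**2.
    return (2 * x, 2 * y)


def tangent_dot_gradient(r, t):
    """Dot product of the circle's unit tangent with grad g at angle t."""
    point = (r * math.cos(t), r * math.sin(t))
    tangent = (-math.sin(t), math.cos(t))  # unit tangent to the circle of radius r
    gx, gy = grad_g(*point)
    return tangent[0] * gx + tangent[1] * gy


# Sample a few points on the level curve g = 9 (a circle of radius 3).
for t in (0.3, 1.2, 2.5):
    print(t, tangent_dot_gradient(3.0, t))
```

Each dot product is zero up to floating-point rounding, consistent with the gradient being normal to the level curve.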
In general, for any unit vector \(\textbf{v}\) in \(\mathbb{R}^2\), we still have \(D_v f = \left\lVert \nabla f \right\rVert \cos \theta\), where \(\theta\) is the angle between \(\textbf{v}\) and \(\nabla f\). At a fixed point \((x, y)\) the length \(\left\lVert \nabla f \right\rVert\) is fixed, and the value of \(D_v f\) then varies as \(\theta\) varies. The largest value that \(D_v f\) can take is when \(\cos \theta = 1\) (\(\theta = 0^\circ\)), while the smallest value occurs when \(\cos \theta = -1\) (\(\theta = 180^\circ\)). In other words, the value of the function \(f\) increases the fastest in the direction of \(\nabla f\) (since \(\theta = 0^\circ\) in that case), and the value of \(f\) decreases the fastest in the direction of \(-\nabla f\) (since \(\theta = 180^\circ\) in that case). We have thus proved the following theorem:
Theorem 2.4
Let \(f(x, y)\) be a continuously differentiable real-valued function, with \(\nabla f \neq \textbf{0}\). Then:
- The gradient \(\nabla f\) is normal to any level curve \(f(x, y) = c\).
- The value of \(f(x, y)\) increases the fastest in the direction of \(\nabla f\).
- The value of \(f(x, y)\) decreases the fastest in the direction of \(-\nabla f\).
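The last two statements can be illustrated by brute force: sweep the unit vector \(\textbf{v} = (\cos\theta, \sin\theta)\) over a fine grid of angles and record where \(\textbf{v} \cdot \nabla f\) is largest. The sketch below assumes a sample gradient vector \((3, -4)\) at some fixed point and a grid of 3600 angles; both are arbitrary choices made for illustration.

```python
import math


def steepest_angle(grad, steps=3600):
    """Search angles in [0, 2*pi) for the unit vector maximizing v . grad."""
    gx, gy = grad
    best_theta, best_value = 0.0, -math.inf
    for k in range(steps):
        theta = 2 * math.pi * k / steps
        value = math.cos(theta) * gx + math.sin(theta) * gy
        if value > best_value:
            best_theta, best_value = theta, value
    return best_theta, best_value


grad = (3.0, -4.0)  # hypothetical gradient at a fixed point
theta, value = steepest_angle(grad)

# Angle of the gradient itself, mapped into [0, 2*pi) for comparison.
grad_angle = math.atan2(grad[1], grad[0]) % (2 * math.pi)

print(theta, grad_angle)          # best search angle vs. angle of the gradient
print(value, math.hypot(*grad))   # largest rate found vs. length of the gradient
```

Up to the resolution of the angle grid, the maximizing direction coincides with the direction of the gradient, and the maximal rate matches \(\left\lVert \nabla f \right\rVert\).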
Example 2.16
In which direction does the function \(f(x, y) = xy^2 + x^3 y\) increase the fastest from the point \((1, 2)\)? In which direction does it decrease the fastest?
Solution
Since \(\nabla f = (y^2 + 3x^2 y, 2xy + x^3)\), we have \(\nabla f(1, 2) = (10, 5) \neq \textbf{0}\). A unit vector in that direction is \(\textbf{v} = \dfrac{\nabla f}{\left\lVert \nabla f \right\rVert} = \left( \dfrac{2}{\sqrt{5}}, \dfrac{1}{\sqrt{5}} \right)\). Thus, \(f\) increases the fastest in the direction of \(\left( \dfrac{2}{\sqrt{5}}, \dfrac{1}{\sqrt{5}} \right)\) and decreases the fastest in the direction of \(\left( \dfrac{-2}{\sqrt{5}}, \dfrac{-1}{\sqrt{5}} \right)\).
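The normalization step in this solution is easy to reproduce. The following sketch, with helper names chosen for illustration, divides the gradient at \((1, 2)\) by its length and also reports the corresponding rates of change \(\pm\left\lVert \nabla f \right\rVert\) along the two directions.

```python
import math


def grad_f(x, y):
    # Gradient of f(x, y) = x*y**2 + x**3*y, as computed in the solution.
    return (y**2 + 3 * x**2 * y, 2 * x * y + x**3)


g = grad_f(1.0, 2.0)
length = math.hypot(*g)

increase_dir = (g[0] / length, g[1] / length)      # direction of fastest increase
decrease_dir = (-increase_dir[0], -increase_dir[1])  # direction of fastest decrease

# Rates of change along those directions: v . grad f
rate_up = increase_dir[0] * g[0] + increase_dir[1] * g[1]
rate_down = decrease_dir[0] * g[0] + decrease_dir[1] * g[1]

print(increase_dir, rate_up)
print(decrease_dir, rate_down)
```

The computed direction agrees with \(\left( \dfrac{2}{\sqrt{5}}, \dfrac{1}{\sqrt{5}} \right)\), and the two rates are \(\pm 5\sqrt{5}\), the length of \(\nabla f(1, 2) = (10, 5)\) with either sign.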
Though we proved Theorem 2.4 for functions of two variables, a similar argument can be used to show that it also applies to functions of three or more variables. Likewise, the directional derivative in the three-dimensional case can also be defined by the formula \(D_v f = \textbf{v} \cdot \nabla f\).
Example 2.17
The temperature \(T\) of a solid is given by the function \(T(x, y, z) = e^{-x} + e^{-2y} + e^{4z}\), where \(x, y, z\) are space coordinates relative to the center of the solid. In which direction from the point \((1, 1, 1)\) will the temperature decrease the fastest?
Solution
Since \(\nabla T = (-e^{-x}, -2e^{-2y}, 4e^{4z})\), the temperature will decrease the fastest in the direction of \(-\nabla T(1, 1, 1) = (e^{-1}, 2e^{-2}, -4e^4)\).
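The same check carries over to three variables. The sketch below evaluates \(-\nabla T(1, 1, 1)\), normalizes it to a unit vector (an extra step not taken in the solution, which reports the unnormalized direction), and estimates the rate of change of \(T\) along that unit vector with a central difference; the helper names and the step size `h = 1e-6` are assumptions.

```python
import math


def T(x, y, z):
    return math.exp(-x) + math.exp(-2 * y) + math.exp(4 * z)


def grad_T(x, y, z):
    # Gradient of T, as computed in the solution.
    return (-math.exp(-x), -2 * math.exp(-2 * y), 4 * math.exp(4 * z))


def unit(vec):
    length = math.sqrt(sum(c * c for c in vec))
    return tuple(c / length for c in vec)


p = (1.0, 1.0, 1.0)
descent = tuple(-c for c in grad_T(*p))  # direction of fastest decrease
d = unit(descent)

# Central-difference estimate of the rate of change of T along d.
h = 1e-6
ahead = T(*(pi + h * di for pi, di in zip(p, d)))
behind = T(*(pi - h * di for pi, di in zip(p, d)))
rate = (ahead - behind) / (2 * h)

grad_length = math.sqrt(sum(c * c for c in grad_T(*p)))
print(descent)
print(d)
print(rate, -grad_length)
```

The estimated rate along the unit vector should be close to \(-\left\lVert \nabla T(1, 1, 1) \right\rVert\), the most negative rate of change available at that point.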