Typeset in LaTeX.
Copyright ©2012–2017 Jiří Lebl
This work is dual licensed under the Creative Commons Attribution-Noncommercial-Share Alike 4.0 International License and the Creative Commons Attribution-Share Alike 4.0 International License. To view a copy of these licenses, visit http://creativecommons.org/licenses/by-nc-sa/4.0/ or http://creativecommons.org/licenses/by-sa/4.0/ or send a letter to Creative Commons PO Box 1866, Mountain View, CA 94042, USA.
You can use, print, duplicate, and share this book as much as you want. You can base your own notes on it and reuse parts if you keep the license the same. You can assume the license is either the CC-BY-NC-SA or the CC-BY-SA, whichever is compatible with what you wish to do; your derivative works must use at least one of the licenses.
During the writing of these notes, the author was in part supported by NSF grant DMS-1362337.
The date is the main identifier of version. The major version / edition number is raised only if there have been substantial changes. For example version 1.0 is first edition, 0th update (no updates yet).
See http://www.jirka.org/ra/ for more information (including contact information).
Introduction
About this book
This book is the continuation of “Basic Analysis”. The book is meant to be a seamless continuation, so the chapters are numbered to start where the first volume left off. The book started with my notes for a second-semester undergraduate analysis course at the University of Wisconsin–Madison in 2012, where I used my notes together with Rudin’s book. In 2016, I taught second-semester undergraduate analysis at Oklahoma State University, heavily modifying and cleaning up the notes, this time using them as the main text.
I plan on eventually adding more topics especially at the end. I will try to preserve the current numbering in subsequent editions as always. The new topics I have planned would add sections and chapters onto the end of the book rather than be inserted in the middle.
For the most part, this second volume depends on the non-optional parts of volume I; however, the optional bits such as higher order derivatives are sometimes used, for example in 6, 3, 6. This book is not necessarily the entire second semester course. What I had in mind for a two-semester course is that some bits of the first volume, such as metric spaces, are covered in the second semester, while some of the optional topics of volume I are covered in the first semester. Leaving metric spaces for the second semester makes more sense, as then the second semester is the “multivariable” part of the course.
Several possibilities for the material in this book are:
1) 1–5, (perhaps 1), 1 and 2.
2) 1–6, 1–3, 1 and 2.
3) Everything.
When I ran the course at OSU, I covered the first book minus metric spaces and a couple of optional sections in the first semester. Then, in the second semester, I covered most of what I skipped from volume I, including metric spaces, and took option 2) above.
Several variables and partial derivatives
Vector spaces, linear mappings, and convexity
Note: 2–3 lectures
Vector spaces
The euclidean space Rn has already made an appearance in the metric space chapter. In this chapter, we will extend the differential calculus we created for one variable to several variables. The key idea in differential calculus is to approximate functions by lines and linear functions. In several variables we must introduce a little bit of linear algebra before we can move on. So let us start with vector spaces and linear functions on vector spaces.
While it is common to use \vec{x} or the bold \mathbf{x} for elements of Rn, especially in the applied sciences, we use just plain x, which is common in mathematics. That is, v∈Rn is a vector, which means v=(v1,v2,…,vn) is an n-tuple of real numbers.
It is common to write and treat vectors as column vectors, that is, n×1 matrices: v=(v_1,v_2,\ldots,v_n) = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix} . We will do so when convenient. We call real numbers scalars to distinguish them from vectors.
The set Rn has a so-called vector space structure defined on it. However, even though we will be looking at functions defined on Rn, not all spaces we wish to deal with are equal to Rn. Therefore, let us define the abstract notion of the vector space.
Let X be a set together with operations of addition, +:X×X→X, and multiplication, ⋅:R×X→X, (we usually write ax instead of a⋅x). X is called a vector space (or a real vector space) if the following conditions are satisfied:
- (Addition is associative) If u,v,w∈X, then u+(v+w)=(u+v)+w.
- (Addition is commutative) If u,v∈X, then u+v=v+u.
- (Additive identity) There is a 0∈X such that v+0=v for all v∈X.
- (Additive inverse) For every v∈X, there is a −v∈X, such that v+(−v)=0.
- (Distributive law) If a∈R, u,v∈X, then a(u+v)=au+av.
- (Distributive law) If a,b∈R, v∈X, then (a+b)v=av+bv.
- (Multiplication is associative) If a,b∈R, v∈X, then (ab)v=a(bv).
- (Multiplicative identity) 1v=v for all v∈X.
Elements of a vector space are usually called vectors, even if they are not elements of Rn (vectors in the “traditional” sense).
If Y⊂X is a subset that is a vector space itself with the same operations, then Y is called a subspace or vector subspace of X.
An example vector space is Rn, where addition and multiplication by a scalar is done componentwise: if a∈R, v=(v1,v2,…,vn)∈Rn, and w=(w1,w2,…,wn)∈Rn, then v+w:=(v1,v2,…,vn)+(w1,w2,…,wn)=(v1+w1,v2+w2,…,vn+wn),av:=a(v1,v2,…,vn)=(av1,av2,…,avn).
In this book we mostly deal with vector spaces that can often be regarded as subsets of Rn, but there are other vector spaces useful in analysis. Let us give a couple of examples.
A trivial example of a vector space (the smallest one in fact) is just X={0}. The operations are defined in the obvious way. You always need a zero vector to exist, so all vector spaces are nonempty sets.
The space C([0,1],R) of continuous functions on the interval [0,1] is a vector space. For two functions f and g in C([0,1],R) and a∈R, we make the obvious definitions of f+g and af: (f+g)(x):=f(x)+g(x),(af)(x):=a(f(x)). The 0 is the function that is identically zero. We leave it as an exercise to check that all the vector space conditions are satisfied.
The space of polynomials c0+c1t+c2t2+⋯+cmtm is a vector space, let us denote it by R[t] (coefficients are real and the variable is t). The operations are defined in the same way as for functions above. Suppose there are two polynomials, one of degree m and one of degree n. Assume n≥m for simplicity. Then (c0+c1t+c2t2+⋯+cmtm)+(d0+d1t+d2t2+⋯+dntn)=(c0+d0)+(c1+d1)t+(c2+d2)t2+⋯+(cm+dm)tm+dm+1tm+1+⋯+dntn and a(c0+c1t+c2t2+⋯+cmtm)=(ac0)+(ac1)t+(ac2)t2+⋯+(acm)tm. Despite what it looks like, R[t] is not equivalent to Rn for any n. In particular, it is not “finite dimensional”, we will make this notion precise in just a little bit. One can make a finite dimensional vector subspace by restricting the degree. For example, if we say Pn is the set of polynomials of degree n or less, then Pn is a finite dimensional vector space.
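To make the operations on R[t] concrete, here is a minimal Python sketch (the coefficient lists and sample polynomials are just assumed examples) that represents a polynomial c0+c1t+⋯+cmtm by its list of coefficients and implements the addition and scalar multiplication described above:

```python
# Represent c0 + c1*t + ... + cm*t^m by the coefficient list [c0, c1, ..., cm].

def poly_add(p, q):
    """Add two polynomials; pad the shorter coefficient list with zeros."""
    n = max(len(p), len(q))
    p = p + [0] * (n - len(p))
    q = q + [0] * (n - len(q))
    return [a + b for a, b in zip(p, q)]

def poly_scale(a, p):
    """Multiply a polynomial by the scalar a."""
    return [a * c for c in p]

# (1 + 2t) + (3 + 0t + 5t^2) = 4 + 2t + 5t^2
print(poly_add([1, 2], [3, 0, 5]))   # [4, 2, 5]
# 3 * (1 + 2t) = 3 + 6t
print(poly_scale(3, [1, 2]))         # [3, 6]
```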
The space R[t] can be thought of as a subspace of C(R,R). If we restrict the range of t to [0,1], R[t] can be identified with a subspace of C([0,1],R).
It is often better to think of even simpler “finite dimensional” vector spaces using the abstract notion rather than always as Rn. It is possible to use other fields than R in the definition (for example it is common to use the complex numbers C), but let us stick with the real numbers.
Linear combinations and dimension
Suppose X is a vector space, x1,x2,…,xk∈X are vectors, and a1,a2,…,ak∈R are scalars. Then a1x1+a2x2+⋯+akxk is called a linear combination of the vectors x1,x2,…,xk.
If Y⊂X is a set, then the span of Y, or in notation span(Y), is the set of all linear combinations of all finite subsets of Y. We also say Y spans span(Y).
Let Y:={(1,1)}⊂R2. Then span(Y)={(x,x)∈R2:x∈R}. That is, span(Y) is the line through the origin and the point (1,1).
[example:vecspr2span] Let Y:={(1,1),(0,1)}⊂R2. Then span(Y)=R2, as any point (x,y)∈R2 can be written as a linear combination (x,y)=x(1,1)+(y−x)(0,1).
A sum of two linear combinations is again a linear combination, and a scalar multiple of a linear combination is a linear combination, which proves the following proposition.
Let X be a vector space. For any Y⊂X, the set span(Y) is a vector space itself. That is, span(Y) is a subspace of X.
If Y is already a vector space, then span(Y)=Y.
A set of vectors {x1,x2,…,xk}⊂X is linearly independent if the only solution to a1x1+a2x2+⋯+akxk=0 is the trivial solution a1=a2=⋯=ak=0. A set that is not linearly independent is linearly dependent.
A linearly independent set B of vectors such that span(B)=X is called a basis of X. For example, the set Y of the two vectors in the example above is a basis of R2.
If a vector space X contains a linearly independent set of d vectors, but no linearly independent set of d+1 vectors, then we say X has dimension d and write dimX:=d. If for all d∈N the vector space X contains a set of d linearly independent vectors, we say X is infinite dimensional and write dimX:=∞.
Clearly for the trivial vector space, dim{0}=0. We will see in a moment that any vector subspace of Rn has a finite dimension, and that dimension is less than or equal to n.
If a set is linearly dependent, then one of the vectors is a linear combination of the others. In other words, if in the equation a1x1+a2x2+⋯+akxk=0 we have aj≠0, then we can solve for xj: x_j = \frac{-a_1}{a_j} x_1 + \cdots + \frac{-a_{j-1}}{a_j} x_{j-1} + \frac{-a_{j+1}}{a_j} x_{j+1} + \cdots + \frac{-a_k}{a_j} x_k . The vector xj then has at least two different representations as a linear combination of {x1,x2,…,xk}: the one above and xj itself.
If B={x1,x2,…,xk} is a basis of a vector space X, then every point y∈X has a unique representation of the form y = \sum_{j=1}^k a_j x_j for some scalars a1,a2,…,ak.
Every y∈X is a linear combination of elements of B since X is the span of B. For uniqueness suppose y = \sum_{j=1}^k a_j x_j = \sum_{j=1}^k b_j x_j , then \sum_{j=1}^k (a_j - b_j) x_j = 0 . By linear independence of the basis, aj=bj for all j.
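Concretely, finding the coordinates of a point with respect to a basis amounts to solving a linear system. A minimal sketch, assuming numpy and using the basis {(1,1),(0,1)} from the example above:

```python
import numpy as np

# Basis vectors of R^2 as the columns of a matrix.
basis = np.array([[1.0, 0.0],
                  [1.0, 1.0]])   # columns are (1,1) and (0,1)

y = np.array([3.0, 5.0])

# Solve basis @ a = y for the unique coordinates a.
a = np.linalg.solve(basis, y)
print(a)                         # [3. 2.]  since (3,5) = 3*(1,1) + 2*(0,1)
np.testing.assert_allclose(basis @ a, y)
```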
For Rn we define e1:=(1,0,0,…,0),e2:=(0,1,0,…,0),…,en:=(0,0,0,…,1), and call this the standard basis of Rn. We use the same letters ej for any Rn, and which space Rn we are working in is understood from context. A direct computation shows that {e1,e2,…,en} is really a basis of Rn; it spans Rn and is linearly independent. In fact, x = (x_1,x_2,\ldots,x_n) = \sum_{j=1}^n x_j e_j .
[mv:dimprop] Let X be a vector space and d a nonnegative integer.
- [mv:dimprop:i] If X is spanned by d vectors, then dimX≤d.
- [mv:dimprop:ii] dimX=d if and only if X has a basis of d vectors (and so every basis has d vectors).
- [mv:dimprop:iii] In particular, dimRn=n.
- [mv:dimprop:iv] If Y⊂X is a vector subspace and dimX=d, then dimY≤d.
- [mv:dimprop:v] If dimX=d and a set T of d vectors spans X, then T is linearly independent.
- [mv:dimprop:vi] If dimX=d and a set T of m vectors is linearly independent, then there is a set S of d−m vectors such that T∪S is a basis of X.
Let us start with [mv:dimprop:i]. Suppose S={x1,x2,…,xd} spans X, and T={y1,y2,…,ym} is a set of linearly independent vectors of X. We wish to show that m≤d. Write y_1 = \sum_{k=1}^d a_{k,1} x_k for some numbers a_{1,1},a_{2,1},\ldots,a_{d,1}, which we can do as S spans X. One of the a_{k,1} is nonzero (otherwise y1 would be zero), so suppose without loss of generality that this is a_{1,1}. Then we solve x_1 = \frac{1}{a_{1,1}} y_1 - \sum_{k=2}^d \frac{a_{k,1}}{a_{1,1}} x_k . In particular, {y1,x2,…,xd} spans X, since x1 can be obtained from {y1,x2,…,xd}. Therefore, there are some numbers a_{1,2},a_{2,2},\ldots,a_{d,2} such that y_2 = a_{1,2} y_1 + \sum_{k=2}^d a_{k,2} x_k . As T is linearly independent, one of the a_{k,2} for k≥2 must be nonzero. Without loss of generality suppose a_{2,2}≠0. Proceed to solve for x_2 = \frac{1}{a_{2,2}} y_2 - \frac{a_{1,2}}{a_{2,2}} y_1 - \sum_{k=3}^d \frac{a_{k,2}}{a_{2,2}} x_k . In particular, {y1,y2,x3,…,xd} spans X.
We continue this procedure. If m<d, then we are done. So suppose m≥d. After d steps we obtain that {y1,y2,…,yd} spans X. Any other vector v in X is a linear combination of {y1,y2,…,yd}, and hence cannot be in T as T is linearly independent. So m=d.
Let us look at [mv:dimprop:ii]. First, if T is a set of k linearly independent vectors that do not span X, that is X∖span(T)≠∅, then choose a vector v∈X∖span(T). The set T∪{v} is linearly independent (exercise). If dimX=d, then there must exist some linearly independent set of d vectors T, and it must span X, otherwise we could choose a larger set of linearly independent vectors. So we have a basis of d vectors. On the other hand if we have a basis of d vectors, it is linearly independent and spans X by definition. By [mv:dimprop:i] we know there is no set of d+1 linearly independent vectors, so dimension must be d.
For [mv:dimprop:iii] notice that {e1,e2,…,en} is a basis of Rn.
To see [mv:dimprop:iv], suppose Y is a vector space and Y⊂X, where dimX=d. As X cannot contain d+1 linearly independent vectors, neither can Y.
For [mv:dimprop:v] suppose T is a set of d vectors that is linearly dependent and spans X. Then one of the vectors is a linear combination of the others. Therefore, if we remove it from T, we obtain a set of d−1 vectors that still spans X, and hence dimX≤d−1 by [mv:dimprop:i], a contradiction.
For [mv:dimprop:vi] suppose T={x1,x2,…,xm} is a linearly independent set. We follow the procedure above in the proof of [mv:dimprop:ii] to keep adding vectors while keeping the set linearly independent. As the dimension is d we can add a vector exactly d−m times.
Linear mappings
A function f:X→Y, when Y is not R, is often called a mapping or a map rather than a function.
A mapping A:X→Y of vector spaces X and Y is linear (or a linear transformation) if for every a∈R and every x,y∈X, A(ax)=aA(x),andA(x+y)=A(x)+A(y). We usually write Ax instead of A(x) if A is linear. If A is one-to-one and onto, then we say A is invertible, and we denote the inverse by A−1. If A:X→X is linear, then we say A is a linear operator on X.
We write L(X,Y) for the set of all linear transformations from X to Y, and just L(X) for the set of linear operators on X. If a∈R and A,B∈L(X,Y), define the transformations aA and A+B by (aA)(x):=aAx,(A+B)(x):=Ax+Bx.
If A∈L(Y,Z) and B∈L(X,Y), define the transformation AB as the composition A∘B, that is, ABx:=A(Bx).
Finally denote by I∈L(X) the identity: the linear operator such that Ix=x for all x.
It is not hard to see that aA∈L(X,Y) and A+B∈L(X,Y), and that AB∈L(X,Z). In particular, L(X,Y) is a vector space. As the set L(X) is not only a vector space, but also admits a product, it is often called an algebra.
An immediate consequence of the definition of a linear mapping is: if A is linear, then A0=0.
If A∈L(X,Y) is invertible, then A−1 is linear.
Let a∈R and y∈Y. As A is onto, then there is an x such that y=Ax, and further as it is also one-to-one A−1(Az)=z for all z∈X. So A−1(ay)=A−1(aAx)=A−1(A(ax))=ax=aA−1(y). Similarly let y1,y2∈Y, and x1,x2∈X such that Ax1=y1 and Ax2=y2, then A−1(y1+y2)=A−1(Ax1+Ax2)=A−1(A(x1+x2))=x1+x2=A−1(y1)+A−1(y2).\qedhere
[mv:lindefonbasis] If A∈L(X,Y) is linear, then it is completely determined by its values on a basis of X. Furthermore, if B is a basis of X, then any function ˜A:B→Y extends to a linear function on X.
We will only prove this proposition for finite dimensional spaces, as we do not need infinite dimensional spaces. For infinite dimensional spaces, the proof is essentially the same, but a little trickier to write, so let us stick with finitely many dimensions.
Let {x1,x2,…,xn} be a basis and suppose Axj=yj. Every x∈X has a unique representation x=n∑j=1bjxj for some numbers b1,b2,…,bn. By linearity Ax=An∑j=1bjxj=n∑j=1bjAxj=n∑j=1bjyj. The “furthermore” follows by setting yj:=˜A(xj), and defining the extension as Ax:=∑nj=1bjyj. The function is well defined by uniqueness of the representation of x. We leave it to the reader to check that A is linear.
The next proposition only works for finite dimensional vector spaces. It is a special case of the so-called rank-nullity theorem from linear algebra.
[mv:prop:lin11onto] If X is a finite dimensional vector space and A∈L(X), then A is one-to-one if and only if it is onto.
Let {x1,x2,…,xn} be a basis for X. Suppose A is one-to-one. Now suppose \sum_{j=1}^n c_j Ax_j = A \sum_{j=1}^n c_j x_j = 0 . As A is one-to-one, the only vector that is taken to 0 is 0 itself. Hence, 0 = \sum_{j=1}^n c_j x_j and cj=0 for all j. So {Ax1,Ax2,…,Axn} is a linearly independent set. By [mv:dimprop] and the fact that the dimension is n, we conclude {Ax1,Ax2,…,Axn} spans X. Any point x∈X can be written as x = \sum_{j=1}^n a_j Ax_j = A \sum_{j=1}^n a_j x_j , so A is onto.
Now suppose A is onto. As A is determined by its action on the basis, we see that every element of X has to be in the span of {Ax1,Ax2,…,Axn}. Suppose A \sum_{j=1}^n c_j x_j = \sum_{j=1}^n c_j Ax_j = 0 . By [mv:dimprop], as {Ax1,Ax2,…,Axn} spans X, the set is linearly independent, and hence cj=0 for all j. In other words, if Ax=0, then x=0. This means that A is one-to-one: if Ax=Ay, then A(x−y)=0 and so x=y.
We leave the proof of the next proposition as an exercise.
[prop:LXYfinitedim] If X and Y are finite dimensional vector spaces, then L(X,Y) is also finite dimensional.
Finally let us note that we often identify a finite dimensional vector space X of dimension n with Rn, provided we fix a basis {x1,x2,…,xn} in X. That is, we define a bijective linear map A∈L(X,Rn) by Axj=ej, where {e1,e2,…,en} is the standard basis of Rn. Then we have the correspondence \sum_{j=1}^n c_j x_j \in X \quad \overset{A}{\mapsto} \quad (c_1,c_2,\ldots,c_n) \in {\mathbb{R}}^n .
Convexity
A subset U of a vector space is convex if whenever x,y∈U, the line segment from x to y lies in U. That is, if the convex combination (1−t)x+ty is in U for all t∈[0,1].
Note that in R, every connected interval is convex. In R2 (or higher dimensions) there are lots of nonconvex connected sets. For example, the set R2∖{0} is not convex but it is connected. To see this, simply take any x∈R2∖{0} and let y:=−x. Then (\nicefrac{1}{2})x+(\nicefrac{1}{2})y=0, which is not in the set. On the other hand, the ball B(x,r)⊂Rn (using the standard metric on Rn) is convex by the triangle inequality.
Show that in Rn any ball B(x,r) for x∈Rn and r>0 is convex.
Any subspace V of a vector space X is convex.
A somewhat more complicated example is given by the following. Let C([0,1],R) be the vector space of continuous real valued functions on [0,1]. Let X⊂C([0,1],R) be the set of those f such that \int_0^1 f(x)\,dx \leq 1 \quad\text{and}\quad f(x) \geq 0 for all x∈[0,1]. Then X is convex. Take t∈[0,1], and note that if f,g∈X, then tf(x)+(1−t)g(x)≥0 for all x. Furthermore \int_0^1 \bigl(tf(x)+(1-t)g(x)\bigr)\,dx = t\int_0^1 f(x)\,dx + (1-t)\int_0^1 g(x)\,dx \leq 1 . Note that X is not a subspace of C([0,1],R).
The intersection of two convex sets is convex. In fact, if {Cλ}λ∈I is an arbitrary collection of convex sets, then C:=⋂λ∈ICλ is convex.
If x,y∈C, then x,y∈Cλ for all λ∈I, and hence if t∈[0,1], then tx+(1−t)y∈Cλ for all λ∈I. Therefore tx+(1−t)y∈C and C is convex.
Let T:V→W be a linear mapping between two vector spaces and let C⊂V be a convex set. Then T(C) is convex.
Take any two points p,q∈T(C). Pick x,y∈C such that Tx=p and Ty=q. As C is convex, then tx+(1−t)y∈C for all t∈[0,1], so tp+(1−t)q=tTx+(1−t)Ty=T(tx+(1−t)y)∈T(C).\qedhere
For completeness, a very useful construction is the convex hull. Given any subset S⊂V of a vector space V, define the convex hull of S by co(S):=⋂{C⊂V:S⊂C, and C is convex}. That is, the convex hull is the smallest convex set containing S. By the proposition above, the intersection of convex sets is convex and hence the convex hull is convex.
The convex hull of 0 and 1 in R is [0,1]. Proof: Any convex set containing 0 and 1 must contain [0,1]. The set [0,1] is convex, therefore it must be the convex hull.
Exercises
Verify that Rn is a vector space.
Let X be a vector space. Prove that a finite set of vectors {x1,…,xn}⊂X is linearly independent if and only if for every j=1,2,…,n, span({x1,…,xj−1,xj+1,…,xn}) ⊊ span({x1,…,xn}). That is, the span of the set with one vector removed is strictly smaller.
Show that the set X \subset C([0,1],{\mathbb{R}}) of those functions such that \int_0^1 f = 0 is a vector subspace.
Prove C([0,1],{\mathbb{R}}) is an infinite dimensional vector space where the operations are defined in the obvious way: s=f+g and m=fg are defined as s(x) := f(x)+g(x) and m(x) := f(x)g(x). Hint: for the dimension, think of functions that are only nonzero on the interval (\nicefrac{1}{n+1},\nicefrac{1}{n}).
Let k \colon [0,1]^2 \to {\mathbb{R}} be continuous. Show that L \colon C([0,1],{\mathbb{R}}) \to C([0,1],{\mathbb{R}}) defined by Lf(y) := \int_0^1 k(x,y)f(x)~dx is a linear operator. That is, show that L is well defined (that Lf is continuous), and that L is linear.
Let {\mathcal{P}}_n be the vector space of polynomials in one variable of degree n or less. Show that {\mathcal{P}}_n is a vector space of dimension n+1.
Let {\mathbb{R}}[t] be the vector space of polynomials in one variable t. Let D \colon {\mathbb{R}}[t] \to {\mathbb{R}}[t] be the derivative operator (derivative in t). Show that D is a linear operator.
Let us show that [mv:prop:lin11onto] only works in finite dimensions. Take {\mathbb{R}}[t] and define the operator A \colon {\mathbb{R}}[t] \to {\mathbb{R}}[t] by A\bigl(P(t)\bigr) = tP(t). Show that A is linear and one-to-one, but show that it is not onto.
Finish the proof of [mv:lindefonbasis] in the finite dimensional case. That is, suppose \{ x_1, x_2,\ldots, x_n \} is a basis of X, \{ y_1, y_2,\ldots, y_n \} \subset Y and we define a function Ax := \sum_{j=1}^n b_j y_j, \qquad \text{if} \quad x=\sum_{j=1}^n b_j x_j . Then prove that A \colon X \to Y is linear.
Prove [prop:LXYfinitedim]. Hint: A linear operator is determined by its action on a basis. So given two bases \{ x_1,\ldots,x_n \} and \{ y_1,\ldots,y_m \} for X and Y respectively, consider the linear operators A_{jk} that send A_{jk} x_j = y_k, and A_{jk} x_\ell = 0 if \ell \not= j.
Suppose X and Y are vector spaces and A \in L(X,Y) is a linear operator.
a) Show that the nullspace N := \{ x \in X : Ax = 0 \} is a vector space.
b) Show that the range R := \{ y \in Y : Ax = y \text{ for some } x \in X \} is a vector space.
Show by example that a union of convex sets need not be convex.
Compute the convex hull of the set of 3 points \{ (0,0), (0,1), (1,1) \} in {\mathbb{R}}^2.
Show that the set \{ (x,y) \in {\mathbb{R}}^2 : y > x^2 \} is a convex set.
Show that the set X \subset C([0,1],{\mathbb{R}}) of those functions such that \int_0^1 f = 1 is a convex set, but not a vector subspace.
Show that every convex set in {\mathbb{R}}^n is connected using the standard topology on {\mathbb{R}}^n.
Suppose K \subset {\mathbb{R}}^2 is a convex set such that the only point of the form (x,0) in K is the point (0,0). Further suppose that (0,1) \in K and (1,1) \in K. Then show that if (x,y) \in K, then y > 0 unless x=0.
Analysis with vector spaces
Note: 2–3 lectures
Norms
Let us start measuring distance.
If X is a vector space, then we say a function \lVert {\cdot} \rVert \colon X \to {\mathbb{R}} is a norm if:
- [defn:norm:i] \lVert {x} \rVert \geq 0, with \lVert {x} \rVert=0 if and only if x=0.
- [defn:norm:ii] \lVert {cx} \rVert = \left\lvert {c} \right\rvert\lVert {x} \rVert for all c \in {\mathbb{R}} and x \in X.
- [defn:norm:iii] \lVert {x+y} \rVert \leq \lVert {x} \rVert+\lVert {y} \rVert for all x,y \in X (Triangle inequality).
Before defining the standard norm on {\mathbb{R}}^n, let us define the standard scalar dot product on {\mathbb{R}}^n. For two vectors x=(x_1,x_2,\ldots,x_n) \in {\mathbb{R}}^n and y=(y_1,y_2,\ldots,y_n) \in {\mathbb{R}}^n, define x \cdot y := \sum_{j=1}^n x_j y_j . It is easy to see that the dot product is linear in each variable separately, that is, it is a linear mapping when you keep one of the variables constant. The Euclidean norm is defined as \lVert {x} \rVert := \lVert {x} \rVert_{{\mathbb{R}}^n} := \sqrt{x \cdot x} = \sqrt{(x_1)^2+(x_2)^2 + \cdots + (x_n)^2}. We normally just use \lVert {x} \rVert, but sometimes it will be necessary to emphasize that we are talking about the euclidean norm and use \lVert {x} \rVert_{{\mathbb{R}}^n}. It is easy to see that the Euclidean norm satisfies [defn:norm:i] and [defn:norm:ii]. To prove that [defn:norm:iii] holds, the key inequality is the so-called Cauchy-Schwarz inequality we saw before. As this inequality is so important let us restate and reprove it using the notation of this chapter.
Let x, y \in {\mathbb{R}}^n, then \left\lvert {x \cdot y} \right\rvert \leq \lVert {x} \rVert\lVert {y} \rVert = \sqrt{x\cdot x}\, \sqrt{y\cdot y}, with equality if and only if the vectors are scalar multiples of each other.
If x=0 or y = 0, then the theorem holds trivially. So assume x\not= 0 and y \not= 0.
If x is a scalar multiple of y, that is x = \lambda y for some \lambda \in {\mathbb{R}}, then the theorem holds with equality: \left\lvert {\lambda y \cdot y} \right\rvert = \left\lvert {\lambda} \right\rvert \, \left\lvert {y\cdot y} \right\rvert = \left\lvert {\lambda} \right\rvert \, \lVert {y} \rVert^2 = \lVert {\lambda y} \rVert \lVert {y} \rVert .
Next take x+ty, we find that \lVert {x+ty} \rVert^2 is a quadratic polynomial in t: \lVert {x+ty} \rVert^2 = (x+ty) \cdot (x+ty) = x \cdot x + x \cdot ty + ty \cdot x + ty \cdot ty = \lVert {x} \rVert^2 + 2t(x \cdot y) + t^2 \lVert {y} \rVert^2 . If x is not a scalar multiple of y, then \lVert {x+ty} \rVert^2 > 0 for all t. So the polynomial \lVert {x+ty} \rVert^2 is never zero. Elementary algebra says that the discriminant must be negative: 4 {(x \cdot y)}^2 - 4 \lVert {x} \rVert^2\lVert {y} \rVert^2 < 0, or in other words {(x \cdot y)}^2 < \lVert {x} \rVert^2\lVert {y} \rVert^2.
Item [defn:norm:iii], the triangle inequality, follows via a simple computation: \lVert {x+y} \rVert^2 = x \cdot x + y \cdot y + 2 (x \cdot y) \leq \lVert {x} \rVert^2 + \lVert {y} \rVert^2 + 2 (\lVert {x} \rVert\lVert {y} \rVert) = {(\lVert {x} \rVert + \lVert {y} \rVert)}^2 .
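A quick numerical sanity check of the Cauchy-Schwarz and triangle inequalities, assuming numpy and randomly sampled vectors (an illustration only, not a proof):

```python
import numpy as np

rng = np.random.default_rng(0)
for _ in range(1000):
    x = rng.standard_normal(5)
    y = rng.standard_normal(5)
    # Cauchy-Schwarz: |x . y| <= ||x|| ||y||
    assert abs(x @ y) <= np.linalg.norm(x) * np.linalg.norm(y) + 1e-12
    # Triangle inequality: ||x + y|| <= ||x|| + ||y||
    assert np.linalg.norm(x + y) <= np.linalg.norm(x) + np.linalg.norm(y) + 1e-12
print("Cauchy-Schwarz and triangle inequality hold on all samples.")
```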
The distance d(x,y) := \lVert {x-y} \rVert is the standard distance function on {\mathbb{R}}^n that we used when we talked about metric spaces.
In fact, on any vector space X, once we have a norm (any norm), we define a distance d(x,y) := \lVert {x-y} \rVert that makes X into a metric space (an easy exercise).
Let A \in L(X,Y). Define \lVert {A} \rVert := \sup \{ \lVert {Ax} \rVert : x \in X ~ \text{with} ~ \lVert {x} \rVert = 1 \} . The number \lVert {A} \rVert is called the operator norm. We will see below that indeed it is a norm (at least for finite dimensional spaces). Again, when necessary to emphasize which norm we are talking about, we may write it as \lVert {A} \rVert_{L(X,Y)}.
By linearity, \left\lVert {A \frac{x}{\lVert {x} \rVert}} \right\rVert = \frac{\lVert {Ax} \rVert}{\lVert {x} \rVert}, for any nonzero x \in X. The vector \frac{x}{\lVert {x} \rVert} is of norm 1. Therefore, \lVert {A} \rVert = \sup \{ \lVert {Ax} \rVert : x \in X ~ \text{with} ~ \lVert {x} \rVert = 1 \} = \sup_{\substack{x \in X\\x\neq 0}} \frac{\lVert {Ax} \rVert}{\lVert {x} \rVert} . This implies that \lVert {Ax} \rVert \leq \lVert {A} \rVert \lVert {x} \rVert .
It is not hard to see from the definition that \lVert {A} \rVert = 0 if and only if A = 0, that is, if A takes every vector to the zero vector.
It is also not difficult to see the norm of the identity operator: \lVert {I} \rVert = \sup_{\substack{x \in X\\x\neq 0}} \frac{\lVert {Ix} \rVert}{\lVert {x} \rVert} = \sup_{\substack{x \in X\\x\neq 0}} \frac{\lVert {x} \rVert}{\lVert {x} \rVert} = 1.
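For matrices with the euclidean norm, the operator norm equals the largest singular value, which numpy can compute directly; it can also be estimated by sampling unit vectors as in the definition. A minimal sketch with an assumed example matrix:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [0.0, 3.0]])

rng = np.random.default_rng(1)
best = 0.0
for _ in range(10000):
    x = rng.standard_normal(2)
    x /= np.linalg.norm(x)              # a unit vector
    best = max(best, np.linalg.norm(A @ x))

print("sampled sup of ||Ax|| over ||x||=1:", best)
print("largest singular value of A:      ", np.linalg.norm(A, 2))
```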
For finite dimensional spaces, \lVert {A} \rVert is always finite as we prove below. This also implies that A is continuous. For infinite dimensional spaces neither statement needs to be true. For a simple example, take the vector space of continuously differentiable functions on [0,1] and as the norm use the uniform norm. The functions \sin(nx) have norm 1, but the derivatives have norm n. So differentiation (which is a linear operator) has unbounded norm on this space. But let us stick to finite dimensional spaces now.
When we talk about a finite dimensional vector space, one often thinks of {\mathbb{R}}^n, although if we have a norm, the norm might perhaps not be the standard euclidean norm. In the exercises, you can prove that every norm is “equivalent” to the euclidean norm in that the topology it generates is the same. For simplicity, we only prove the following proposition for the euclidean space, and the proof for a general finite dimensional space is left as an exercise.
[prop:finitedimpropnormfin] Let X and Y be finite dimensional vector spaces with a norm. If A \in L(X,Y), then \lVert {A} \rVert < \infty, and A is uniformly continuous (Lipschitz with constant \lVert {A} \rVert).
As we said we only prove the proposition for euclidean space so suppose that X = {\mathbb{R}}^n and Y={\mathbb{R}}^m and the norm is the standard euclidean norm. The general case is left as an exercise.
Let \{ e_1,e_2,\ldots,e_n \} be the standard basis of {\mathbb{R}}^n. Write x \in {\mathbb{R}}^n, with \lVert {x} \rVert = 1, as x = \sum_{j=1}^n c_j e_j . Since e_j \cdot e_\ell = 0 whenever j\not=\ell and e_j \cdot e_j = 1, then c_j = x \cdot e_j and \left\lvert {c_j} \right\rvert = \left\lvert { x \cdot e_j } \right\rvert \leq \lVert {x} \rVert \lVert {e_j} \rVert = 1 . Then \lVert {Ax} \rVert = \left\lVert {\sum_{j=1}^n c_j Ae_j} \right\rVert \leq \sum_{j=1}^n \left\lvert {c_j} \right\rvert \lVert {Ae_j} \rVert \leq \sum_{j=1}^n \lVert {Ae_j} \rVert . The right hand side does not depend on x. We found a finite upper bound independent of x, so \lVert {A} \rVert < \infty.
Now for any vector spaces X and Y, and A \in L(X,Y), suppose that \lVert {A} \rVert < \infty. For v,w \in X, \lVert {A(v-w)} \rVert \leq \lVert {A} \rVert \lVert {v-w} \rVert . As \lVert {A} \rVert < \infty, then this says A is Lipschitz with constant \lVert {A} \rVert.
[prop:finitedimpropnorm] Let X, Y, and Z be finite dimensional vector spaces with a norm.
- [item:finitedimpropnorm:i] If A,B \in L(X,Y) and c \in {\mathbb{R}}, then \lVert {A+B} \rVert \leq \lVert {A} \rVert+\lVert {B} \rVert, \qquad \lVert {cA} \rVert = \left\lvert {c} \right\rvert\lVert {A} \rVert . In particular, the operator norm is a norm on the vector space L(X,Y).
- [item:finitedimpropnorm:ii] If A \in L(X,Y) and B \in L(Y,Z), then \lVert {BA} \rVert \leq \lVert {B} \rVert \lVert {A} \rVert .
For [item:finitedimpropnorm:i], \lVert {(A+B)x} \rVert = \lVert {Ax+Bx} \rVert \leq \lVert {Ax} \rVert+\lVert {Bx} \rVert \leq \lVert {A} \rVert \lVert {x} \rVert+\lVert {B} \rVert\lVert {x} \rVert = (\lVert {A} \rVert+\lVert {B} \rVert) \lVert {x} \rVert . So \lVert {A+B} \rVert \leq \lVert {A} \rVert+\lVert {B} \rVert.
Similarly, \lVert {(cA)x} \rVert = \left\lvert {c} \right\rvert \lVert {Ax} \rVert \leq (\left\lvert {c} \right\rvert\lVert {A} \rVert) \lVert {x} \rVert . Thus \lVert {cA} \rVert \leq \left\lvert {c} \right\rvert\lVert {A} \rVert. Next, \left\lvert {c} \right\rvert \lVert {Ax} \rVert = \lVert {cAx} \rVert \leq \lVert {cA} \rVert \lVert {x} \rVert . Hence \left\lvert {c} \right\rvert\lVert {A} \rVert \leq \lVert {cA} \rVert.
For [item:finitedimpropnorm:ii] write \lVert {BAx} \rVert \leq \lVert {B} \rVert \lVert {Ax} \rVert \leq \lVert {B} \rVert \lVert {A} \rVert \lVert {x} \rVert . \qedhere
As a norm defines a metric, there is a metric space topology on L(X,Y), so we can talk about open/closed sets, continuity, and convergence.
[prop:finitedimpropinv] Let X be a finite dimensional vector space with a norm. Let U \subset L(X) be the set of invertible linear operators.
[finitedimpropinv:i] If A \in U and B \in L(X), and \label{eqcontineq} \lVert {A-B} \rVert < \frac{1}{\lVert {A^{-1}} \rVert}, then B is invertible.
- [finitedimpropinv:ii] U is open and A \mapsto A^{-1} is a continuous function on U.
Let us make sense of this on a simple example. Think back to {\mathbb{R}}^1, where linear operators are just numbers a and the operator norm of a is simply \left\lvert {a} \right\rvert. The operator a is invertible (a^{-1} = \nicefrac{1}{a}) whenever a \not=0. The condition \left\lvert {a-b} \right\rvert < \frac{1}{\left\lvert {a^{-1}} \right\rvert} does indeed imply that b is not zero. And a \mapsto \nicefrac{1}{a} is a continuous map. When n > 1, then there are other noninvertible operators than just zero, and in general things are a bit more difficult.
Let us prove [finitedimpropinv:i]. We know something about A^{-1} and something about A-B. These are linear operators so let us apply them to a vector. A^{-1}(A-B)x = x-A^{-1}Bx . Therefore, \begin{split} \lVert {x} \rVert & = \lVert {A^{-1} (A-B)x + A^{-1}Bx} \rVert \\ & \leq \lVert {A^{-1}} \rVert\lVert {A-B} \rVert \lVert {x} \rVert + \lVert {A^{-1}} \rVert\lVert {Bx} \rVert . \end{split} Now assume x \neq 0 and so \lVert {x} \rVert \neq 0. Using [eqcontineq] we obtain \lVert {x} \rVert < \lVert {x} \rVert + \lVert {A^{-1}} \rVert\lVert {Bx} \rVert , or in other words \lVert {Bx} \rVert \not= 0 for all nonzero x, and hence Bx \not= 0 for all nonzero x. This is enough to see that B is one-to-one (if Bx = By, then B(x-y) = 0, so x=y). As B is a one-to-one linear operator from X to X, which is finite dimensional, it is onto by [mv:prop:lin11onto] and hence invertible.
Let us look at [finitedimpropinv:ii]. Fix some A \in U. Let B be invertible and near A, that is \lVert {A-B} \rVert \lVert {A^{-1}} \rVert < \nicefrac{1}{2}. Then [eqcontineq] is satisfied. We have shown above (using B^{-1}y instead of x) \lVert {B^{-1}y} \rVert \leq \lVert {A^{-1}} \rVert\lVert {A-B} \rVert \lVert {B^{-1}y} \rVert + \lVert {A^{-1}} \rVert\lVert {y} \rVert \leq \nicefrac{1}{2} \lVert {B^{-1}y} \rVert + \lVert {A^{-1}} \rVert\lVert {y} \rVert , or \lVert {B^{-1}y} \rVert \leq 2\lVert {A^{-1}} \rVert\lVert {y} \rVert . So \lVert {B^{-1}} \rVert \leq 2 \lVert {A^{-1}} \rVert.
Now A^{-1}(A-B)B^{-1} = A^{-1}(AB^{-1}-I) = B^{-1}-A^{-1} , and \lVert {B^{-1}-A^{-1}} \rVert = \lVert {A^{-1}(A-B)B^{-1}} \rVert \leq \lVert {A^{-1}} \rVert\lVert {A-B} \rVert\lVert {B^{-1}} \rVert \leq 2\lVert {A^{-1}} \rVert^2 \lVert {A-B} \rVert . Therefore, as B tends to A, \lVert {B^{-1}-A^{-1}} \rVert tends to 0, and so the inverse operation is a continuous function at A.
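A small numerical illustration of [finitedimpropinv:i] and of the bound \lVert {B^{-1}} \rVert \leq 2 \lVert {A^{-1}} \rVert, assuming numpy and an arbitrarily chosen matrix A: perturb A by something of operator norm well below 1/\lVert {A^{-1}} \rVert and check that the perturbed matrix is still invertible.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 1.0]])
Ainv = np.linalg.inv(A)
bound = 1.0 / np.linalg.norm(Ainv, 2)      # 1 / ||A^{-1}|| in the operator norm

# Perturb A by a matrix of operator norm bound/4, well below the bound.
E = np.array([[0.0, 1.0],
              [1.0, 0.0]]) * (bound / 4)
B = A - E
print("||A - B|| =", np.linalg.norm(A - B, 2), "< 1/||A^{-1}|| =", bound)
print("det(B) =", np.linalg.det(B))        # nonzero, so B is invertible
Binv = np.linalg.inv(B)
print("||B^{-1}|| <= 2 ||A^{-1}|| ?",
      np.linalg.norm(Binv, 2) <= 2 * np.linalg.norm(Ainv, 2))
```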
Matrices
As we previously noted, once we fix a basis in a finite dimensional vector space X, we can represent a vector of X as an n-tuple of numbers, that is a vector in {\mathbb{R}}^n. The same thing can be done with L(X,Y), which brings us to matrices, which are a convenient way to represent finite-dimensional linear transformations. Suppose \{ x_1, x_2, \ldots, x_n \} and \{ y_1, y_2, \ldots, y_m \} are bases for vector spaces X and Y respectively. A linear operator is determined by its values on the basis. Given A \in L(X,Y), A x_j is an element of Y. Therefore, define the numbers \{ a_{i,j} \} as follows A x_j = \sum_{i=1}^m a_{i,j} \, y_i , and write them as a matrix A = \begin{bmatrix} a_{1,1} & a_{1,2} & \cdots & a_{1,n} \\ a_{2,1} & a_{2,2} & \cdots & a_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m,1} & a_{m,2} & \cdots & a_{m,n} \end{bmatrix} . And we say A is an m-by-n matrix. The columns of the matrix are precisely the coefficients that represent A x_j. Let us derive the familiar rule for matrix multiplication.
When z = \sum_{j=1}^n c_j \, x_j , then A z = \sum_{j=1}^n c_j \, A x_j = \sum_{j=1}^n c_j \left( \sum_{i=1}^m a_{i,j}\, y_i \right) = \sum_{i=1}^m \left(\sum_{j=1}^n a_{i,j}\, c_j \right) y_i , which gives rise to the familiar rule for matrix multiplication.
There is a one-to-one correspondence between matrices and linear operators in L(X,Y). That is, once we fix a basis in X and in Y. If we would choose a different basis, we would get different matrices. This is important, the operator A acts on elements of X, the matrix is something that works with n-tuples of numbers, that is, vectors of {\mathbb{R}}^n.
If B is an n-by-r matrix with entries b_{j,k}, then the matrix for C = AB is an m-by-r matrix whose i,kth entry c_{i,k} is c_{i,k} = \sum_{j=1}^n a_{i,j}\,b_{j,k} . A way to remember it is if you order the indices as we do, that is row,column, and put the elements in the same order as the matrices, then it is the “middle index” that is “summed-out.”
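The rule c_{i,k} = \sum_j a_{i,j} b_{j,k} translates directly into code. A minimal sketch, assuming numpy and made-up matrices, comparing the explicit triple loop with numpy's built-in product:

```python
import numpy as np

def matmul(A, B):
    """Multiply an m-by-n matrix A and an n-by-r matrix B entry by entry."""
    m, n = A.shape
    n2, r = B.shape
    assert n == n2, "inner dimensions must agree"
    C = np.zeros((m, r))
    for i in range(m):
        for k in range(r):
            C[i, k] = sum(A[i, j] * B[j, k] for j in range(n))
    return C

A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 3.0]])      # 2-by-3
B = np.array([[1.0, 0.0],
              [2.0, 1.0],
              [0.0, 4.0]])           # 3-by-2
np.testing.assert_allclose(matmul(A, B), A @ B)
print(matmul(A, B))                  # [[ 5.  2.] [ 2. 13.]]
```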
The matrix of a linear mapping that changes one basis to another is a square matrix in which the columns represent basis elements of the second basis in terms of the first basis. We call such a linear mapping a change of basis.
Suppose all the bases are just the standard bases and X={\mathbb{R}}^n and Y={\mathbb{R}}^m. Recall the Cauchy-Schwarz inequality and compute \lVert {Az} \rVert^2 = \sum_{i=1}^m { \left(\sum_{j=1}^n a_{i,j} c_j \right)}^2 \leq \sum_{i=1}^m { \left(\sum_{j=1}^n {(c_j)}^2 \right) \left(\sum_{j=1}^n {(a_{i,j})}^2 \right) } = \sum_{i=1}^m \left(\sum_{j=1}^n {(a_{i,j})}^2 \right) \lVert {z} \rVert^2 . In other words, we have a bound on the operator norm (note that equality rarely happens) \lVert {A} \rVert \leq \sqrt{\sum_{i=1}^m \sum_{j=1}^n {(a_{i,j})}^2} . If the entries go to zero, then \lVert {A} \rVert goes to zero. In particular, if A is fixed and B is changing such that the entries of A-B go to zero, then B goes to A in operator norm. That is, B goes to A in the metric space topology induced by the operator norm. We proved the first part of:
If f \colon S \to {\mathbb{R}}^{nm} is a continuous function for a metric space S, then taking the components of f as the entries of a matrix, f is a continuous mapping from S to L({\mathbb{R}}^n,{\mathbb{R}}^m). Conversely, if f \colon S \to L({\mathbb{R}}^n,{\mathbb{R}}^m) is a continuous function, then the entries of the matrix are continuous functions.
The proof of the second part is rather easy. Take f(x) e_j and note that it is a continuous function to {\mathbb{R}}^m with standard Euclidean norm: \lVert {f(x) e_j - f(y) e_j} \rVert = \lVert {\bigl(f(x)- f(y) \bigr) e_j} \rVert \leq \lVert {f(x)- f(y)} \rVert, so as x \to y, then \lVert {f(x)- f(y)} \rVert \to 0 and so \lVert {f(x) e_j - f(y) e_j} \rVert \to 0. Such a function is continuous if and only if its components are continuous and these are the components of the jth column of the matrix f(x).
Determinants
A certain number can be assigned to square matrices that measures how the corresponding linear mapping stretches space. In particular, this number, called the determinant, can be used to test for invertibility of a matrix.
First, for a number x, define the symbol \operatorname{sgn}(x) by \operatorname{sgn}(x) := \begin{cases} -1 & \text{ if $x < 0$} , \\ 0 & \text{ if $x = 0$} , \\ 1 & \text{ if $x > 0$} . \end{cases} Suppose \sigma = (\sigma_1,\sigma_2,\ldots,\sigma_n) is a permutation of the integers (1,2,\ldots,n), that is, a reordering of (1,2,\ldots,n). Any permutation can be obtained by a sequence of transpositions (switchings of two elements). Call a permutation even (resp. odd) if it takes an even (resp. odd) number of transpositions to get from \sigma to (1,2,\ldots,n). It can be shown that this is well defined (exercise). In fact, define \label{eq:sgndef} \operatorname{sgn}(\sigma) := \operatorname{sgn}(\sigma_1,\ldots,\sigma_n) = \prod_{p < q} \operatorname{sgn}(\sigma_q-\sigma_p) . Then it can be shown that \operatorname{sgn}(\sigma) is 1 if \sigma is even and -1 if \sigma is odd. This fact can be proved by noting that applying a transposition changes the sign. Then note that the sign of (1,2,\ldots,n) is 1.
Let S_n be the set of all permutations on n elements (the symmetric group). Let A= [a_{i,j}] be a square n \times n matrix. Define the determinant of A as \det(A) := \sum_{\sigma \in S_n} \operatorname{sgn} (\sigma) \prod_{i=1}^n a_{i,\sigma_i} .
- [prop:det:i] \det(I) = 1.
- [prop:det:ii] \det([x_1 ~~ x_2 ~~ \cdots ~~ x_n ]) as a function of column vectors x_j is linear in each variable x_j separately.
- [prop:det:iii] If two columns of a matrix are interchanged, then the determinant changes sign.
- [prop:det:iv] If two columns of A are equal, then \det(A) = 0.
- [prop:det:v] If a column is zero, then \det(A) = 0.
- [prop:det:vi] A \mapsto \det(A) is a continuous function.
[prop:det:vii] \det\left[\begin{smallmatrix} a & b \\ c &d \end{smallmatrix}\right] = ad-bc, and \det [a] = a.
In fact, the determinant is the unique function that satisfies [prop:det:i], [prop:det:ii], and [prop:det:iii]. But we digress. By [prop:det:ii], we mean that if we fix all the vectors x_1,\ldots,x_n except for x_j and think of the determinant as function of x_j, it is a linear function. That is, if v,w \in {\mathbb{R}}^n are two vectors, and a,b \in {\mathbb{R}} are scalars, then \begin{gathered} \det([x_1 ~~ \cdots ~~ x_{j-1} ~~ (av+bw) ~~ x_{j+1} ~~ \cdots ~~ x_n]) = \\ a \det([x_1 ~~ \cdots ~~ x_{j-1} ~~ v ~~ x_{j+1} ~~ \cdots ~~ x_n]) + b \det([x_1 ~~ \cdots ~~ x_{j-1} ~~ w ~~ x_{j+1} ~~ \cdots ~~ x_n]) .\end{gathered}
We go through the proof quickly, as you have likely seen this before.
[prop:det:i] is trivial. For [prop:det:ii], notice that each term in the definition of the determinant contains exactly one factor from each column.
Part [prop:det:iii] follows by noting that switching two columns is like switching the two corresponding numbers in every element in S_n. Hence all the signs are changed. Part [prop:det:iv] follows because if two columns are equal and we switch them we get the same matrix back and so part [prop:det:iii] says the determinant must have been 0.
Part [prop:det:v] follows because the product in each term in the definition includes one element from the zero column. Part [prop:det:vi] follows as \det is a polynomial in the entries of the matrix and hence continuous. We have seen that a function defined on matrices is continuous in the operator norm if it is continuous in the entries. Finally, part [prop:det:vii] is a direct computation.
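Both \operatorname{sgn}(\sigma) and the defining sum for \det(A) can be computed directly, though only for small n, since the sum runs over all n! permutations. A minimal sketch, assuming numpy and a made-up matrix:

```python
import numpy as np
from itertools import permutations

def sgn(sigma):
    """Sign of a permutation via the product of sgn(sigma_q - sigma_p) over p < q."""
    s = 1
    for p in range(len(sigma)):
        for q in range(p + 1, len(sigma)):
            s *= 1 if sigma[q] > sigma[p] else -1
    return s

def det(A):
    """Determinant via the sum over all permutations (the defining formula)."""
    n = A.shape[0]
    return sum(sgn(sigma) * np.prod([A[i, sigma[i]] for i in range(n)])
               for sigma in permutations(range(n)))

A = np.array([[1.0, 2.0, 0.0],
              [3.0, 1.0, 1.0],
              [0.0, 4.0, 2.0]])
print(det(A), np.linalg.det(A))      # both approximately -14
```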
The determinant tells us about areas and volumes, and how they change. For example, in the 1 \times 1 case, a matrix is just a number, and the determinant is exactly this number. It says how the linear mapping “stretches” the space. Similarly for {\mathbb{R}}^2 (and in fact for {\mathbb{R}}^n). Suppose A \in L({\mathbb{R}}^2) is a linear transformation. It can be checked directly that the area of the image of the unit square A([0,1]^2) is precisely \left\lvert {\det(A)} \right\rvert. The sign of the determinant tells us if the image is flipped or not. This works with arbitrary figures, not just the unit square. The determinant tells us the stretch in the area. In {\mathbb{R}}^3 it will tell us about the 3 dimensional volume, and in n-dimensions about the n-dimensional volume. We claim this without proof.
If A and B are n\times n matrices, then \det(AB) = \det(A)\det(B). In particular, A is invertible if and only if \det(A) \not= 0 and in this case, \det(A^{-1}) = \frac{1}{\det(A)}.
Let b_1,b_2,\ldots,b_n be the columns of B. Then AB = [ Ab_1 \quad Ab_2 \quad \cdots \quad Ab_n ] . That is, the columns of AB are Ab_1,Ab_2,\ldots,Ab_n.
Let b_{j,k} denote the elements of B and a_j the columns of A. Note that Ae_j = a_j. By linearity of the determinant as proved above we have \begin{split} \det(AB) & = \det ([ Ab_1 \quad Ab_2 \quad \cdots \quad Ab_n ]) = \det \left(\left[ \sum_{j=1}^n b_{j,1} a_j \quad Ab_2 \quad \cdots \quad Ab_n \right]\right) \\ & = \sum_{j=1}^n b_{j,1} \det ([ a_j \quad Ab_2 \quad \cdots \quad Ab_n ]) \\ & = \sum_{1 \leq j_1,j_2,\ldots,j_n \leq n} b_{j_1,1} b_{j_2,2} \cdots b_{j_n,n} \det ([ a_{j_1} \quad a_{j_2} \quad \cdots \quad a_{j_n} ]) \\ & = \left( \sum_{(j_1,j_2,\ldots,j_n) \in S_n} b_{j_1,1} b_{j_2,2} \cdots b_{j_n,n} \operatorname{sgn}(j_1,j_2,\ldots,j_n) \right) \det ([ a_{1} \quad a_{2} \quad \cdots \quad a_{n} ]) . \end{split} In the above, go from all integers between 1 and n, to just elements of S_n by noting that when two columns in the determinant are the same, then the determinant is zero. We then reorder the columns to the original ordering and obtain the sgn.
The conclusion that \det(AB) = \det(A)\det(B) follows by recognizing the determinant of B. We obtain this by plugging in A=I. The expression we got for the determinant of B has rows and columns swapped, so as a side note, we have also just proved that the determinant of a matrix and its transpose are equal.
To prove the second part of the theorem, suppose A is invertible. Then A^{-1}A = I and consequently \det(A^{-1})\det(A) = \det(A^{-1}A) = \det(I) = 1. If A is not invertible, then the columns are linearly dependent. That is, suppose \sum_{j=1}^n \gamma_j a_j = 0 , where not all \gamma_j are equal to 0. Without loss of generality suppose \gamma_1\neq 0. Take B := \begin{bmatrix} \gamma_1 & 0 & 0 & \cdots & 0 \\ \gamma_2 & 1 & 0 & \cdots & 0 \\ \gamma_3 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \gamma_n & 0 & 0 & \cdots & 1 \end{bmatrix} . Applying the definition of the determinant we see \det(B) = \gamma_1 \not= 0. Then \det(AB) = \det(A)\det(B) = \gamma_1\det(A). The first column of AB is zero, and hence \det(AB) = 0. Thus \det(A) = 0.
The determinant is independent of the basis. In other words, if B is invertible, then \det(A) = \det(B^{-1}AB) .
Proof follows by noting \det(B^{-1}AB) = \frac{1}{\det(B)}\det(A)\det(B) = \det(A). If in one basis A is the matrix representing a linear operator, then for another basis we can find a matrix B such that the matrix B^{-1}AB takes us to the first basis, applies A in the first basis, and takes us back to the basis we started with. We choose a basis on X, and we represent a linear mapping using a matrix with respect to this basis. We obtain the same determinant as if we had used any other basis. It follows that \det \colon L(X) \to {\mathbb{R}} is a well-defined function (not just on matrices).
There are three types of so-called elementary matrices. Recall again that e_j are the standard basis of {\mathbb{R}}^n. First, for some j = 1,2,\ldots,n and some \lambda \in {\mathbb{R}}, \lambda \neq 0, an n \times n matrix E defined by Ee_i = \begin{cases} e_i & \text{if $i \neq j$} , \\ \lambda e_i & \text{if $i = j$} . \end{cases} Given any n \times m matrix M the matrix EM is the same matrix as M except with the jth row multiplied by \lambda. It is an easy computation (exercise) that \det(E) = \lambda.
Second, for some j and k with j\neq k, and \lambda \in {\mathbb{R}} an n \times n matrix E defined by Ee_i = \begin{cases} e_i & \text{if $i \neq j$} , \\ e_i + \lambda e_k & \text{if $i = j$} . \end{cases} Given any n \times m matrix M the matrix EM is the same matrix as M except with \lambda times the kth row added to the jth row. It is an easy computation (exercise) that \det(E) = 1.
Finally, for some j and k with j\neq k an n \times n matrix E defined by Ee_i = \begin{cases} e_i & \text{if $i \neq j$ and $i \neq k$} , \\ e_k & \text{if $i = j$} , \\ e_j & \text{if $i = k$} . \end{cases} Given any n \times m matrix M the matrix EM is the same matrix with jth and kth rows swapped. It is an easy computation (exercise) that \det(E) = -1.
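The three types of elementary matrices and their determinants are easy to check numerically. A minimal sketch, assuming numpy and arbitrarily chosen indices j, k and scalar \lambda, where each matrix is built directly as the matrix performing the stated row operation:

```python
import numpy as np

n, lam, j, k = 3, 5.0, 0, 2        # 0-based indices for rows j and k

E1 = np.eye(n); E1[j, j] = lam           # multiplies the j-th row by lambda
E2 = np.eye(n); E2[j, k] = lam           # adds lambda times the k-th row to the j-th row
E3 = np.eye(n); E3[[j, k]] = E3[[k, j]]  # swaps the j-th and k-th rows

print(np.linalg.det(E1))   # lambda = 5
print(np.linalg.det(E2))   # 1
print(np.linalg.det(E3))   # -1

M = np.arange(12.0).reshape(3, 4)
print(E3 @ M)              # M with rows j and k swapped
```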
Elementary matrices are useful for computing the determinant. The proof of the following proposition is left as an exercise.
[prop:elemmatrixdecomp] Let T be an n \times n invertible matrix. Then there exists a finite sequence of elementary matrices E_1, E_2, \ldots, E_k such that T = E_1 E_2 \cdots E_k , and \det(T) = \det(E_1)\det(E_2)\cdots \det(E_k) .
Exercises
If X is a vector space with a norm \lVert {\cdot} \rVert, then show that d(x,y) := \lVert {x-y} \rVert makes X a metric space.
Show that for square matrices A and B, \det(AB) = \det(BA).
For {\mathbb{R}}^n define \lVert {x} \rVert_{\infty} := \max \{ \left\lvert {x_1} \right\rvert, \left\lvert {x_2} \right\rvert, \ldots, \left\lvert {x_n} \right\rvert \} , sometimes called the sup or the max norm.
a) Show that \lVert {\cdot} \rVert_\infty is a norm on {\mathbb{R}}^n (defining a different distance).
b) What is the unit ball B(0,1) in this norm?
For {\mathbb{R}}^n define \lVert {x} \rVert_{1} := \sum_{j=1}^n \lvert {x_j} \rvert, sometimes called the 1-norm (or L^1 norm).
a) Show that \lVert {\cdot} \rVert_1 is a norm on {\mathbb{R}}^n (defining a different distance, sometimes called the taxicab distance).
b) What is the unit ball B(0,1) in this norm?
Using the euclidean norm on {\mathbb{R}}^2, compute the operator norm of the operators in L({\mathbb{R}}^2) given by the matrices:
a) \left[ \begin{smallmatrix} 1 & 0 \\ 0 & 2 \end{smallmatrix} \right]
b) \left[ \begin{smallmatrix} 0 & 1 \\ -1 & 0 \end{smallmatrix} \right]
c) \left[ \begin{smallmatrix} 1 & 1 \\ 0 & 1 \end{smallmatrix} \right]
d) \left[ \begin{smallmatrix} 0 & 1 \\ 0 & 0 \end{smallmatrix} \right]
[exercise:normonedim] Using the standard euclidean norm on {\mathbb{R}}^n, show the following.
a) Suppose A \in L({\mathbb{R}},{\mathbb{R}}^n) is defined for x \in {\mathbb{R}} by Ax = xa for a vector a \in {\mathbb{R}}^n. Then the operator norm \lVert {A} \rVert_{L({\mathbb{R}},{\mathbb{R}}^n)} = \lVert {a} \rVert_{{\mathbb{R}}^n}.
Suppose \sigma = (\sigma_1,\sigma_2,\ldots,\sigma_n) is a permutation of (1,2,\ldots,n).
a) Show that we can make a finite number of transpositions (switching of two elements) to get to (1,2,\ldots,n).
b) Using the definition [eq:sgndef] show that \sigma is even if \operatorname{sgn}(\sigma) = 1 and \sigma is odd if \operatorname{sgn}(\sigma) = -1. In particular, this shows that being odd or even is well defined.
Verify the computation of the determinant for the three types of elementary matrices.
Prove [prop:elemmatrixdecomp].
a) Suppose D = [d_{i,j}] is an n-by-n diagonal matrix, that is, d_{i,j} = 0 whenever i
\not= j. Show that \det(D) = d_{1,1}d_{2,2} \cdots d_{n,n}.
b) Suppose A is a diagonalizable matrix. That is, there exists a matrix B such that B^{-1}AB = D for a diagonal matrix D = [d_{i,j}]. Show that \det(A) = d_{1,1}d_{2,2} \cdots d_{n,n}.
Take the vector space of polynomials {\mathbb{R}}[t] and the linear operator D \in
L({\mathbb{R}}[t]) that is the differentiation (we proved in an earlier exercise that D is a linear operator). Define the norm on P(t) = c_0 + c_1 t + \cdots + c_n
t^n as \lVert {P} \rVert := \sup \{ \left\lvert {c_j} \right\rvert : j = 0,1,\ldots,n \}.
a) Show that \lVert {P} \rVert is a norm on {\mathbb{R}}[t].
b) Show that D does not have bounded operator norm, that is \lVert {D} \rVert =
\infty. Hint: consider the polynomials t^n as n tends to infinity.
In this exercise we finish the proof of [prop:finitedimpropnormfin]. Let X be any finite dimensional vector space with a norm. Let \{ x_1,x_2,\ldots,x_n
\} be a basis for X.
a) Show that the function f \colon {\mathbb{R}}^n \to {\mathbb{R}} defined by f(c_1,c_2,\ldots,c_n) =
\lVert {c_1 x_1 + c_2 x_2 + \cdots + c_n x_n} \rVert is continuous.
b) Show that there exist numbers m and M such that if c = (c_1,c_2,\ldots,c_n) \in {\mathbb{R}}^n with \lVert {c} \rVert = 1 (standard euclidean norm), then m \leq \lVert {c_1 x_1 + c_2 x_2 + \cdots + c_n x_n} \rVert \leq M (here the norm is on X).
c) Show that there exists a number B such that if \lVert {c_1 x_1 + c_2 x_2 + \cdots + c_n x_n} \rVert=1, then \left\lvert {c_j} \right\rvert \leq B.
d) Use part (c) to show that if X and Y are finite dimensional vector spaces and A \in L(X,Y), then \lVert {A} \rVert < \infty.
Let X be any finite dimensional vector space with a norm \lVert {\cdot} \rVert and basis \{ x_1,x_2,\ldots,x_n
\}. Let c = (c_1,\ldots,c_n) \in {\mathbb{R}}^n and \lVert {c} \rVert be the standard euclidean norm on {\mathbb{R}}^n.
a) Show that there exist positive numbers m,M > 0 such that m \lVert {c} \rVert \leq \lVert {c_1 x_1 + c_2 x_2 + \cdots + c_n x_n} \rVert \leq M \lVert {c} \rVert . Hint: See previous exercise.
b) Use part (a) to show that if \lVert {\cdot} \rVert_1 and \lVert {\cdot} \rVert_2 are two norms on X, then there exist positive numbers m,M > 0 (perhaps different than above) such that for all x \in X we have m \lVert {x} \rVert_1 \leq \lVert {x} \rVert_2 \leq M \lVert {x} \rVert_1 .
c) Now show that U \subset X is open in the metric defined by \left\lVert {x-y} \right\rVert_1 if and only if it is open in the metric defined by \left\lVert {x-y} \right\rVert_2. In other words, convergence of sequences and continuity of functions are the same in either norm.
The derivative
Note: 2–3 lectures
The derivative
Recall that for a function f \colon {\mathbb{R}}\to {\mathbb{R}}, we defined the derivative at x as \lim_{h \to 0} \frac{f(x+h)-f(x)}{h} . In other words, there was a number a (the derivative of f at x) such that \lim_{h \to 0} \left\lvert {\frac{f(x+h)-f(x)}{h} - a} \right\rvert = \lim_{h \to 0} \left\lvert {\frac{f(x+h)-f(x) - ah}{h}} \right\rvert = \lim_{h \to 0} \frac{\left\lvert {f(x+h)-f(x) - ah} \right\rvert}{\left\lvert {h} \right\rvert} = 0.
Multiplying by a is a linear map in one dimension. That is, we think of a \in L({\mathbb{R}}^1,{\mathbb{R}}^1) which is the best linear approximation of f near x. We use this definition to extend differentiation to more variables.
Let U \subset {\mathbb{R}}^n be an open subset and f \colon U \to {\mathbb{R}}^m. We say f is differentiable at x \in U if there exists an A \in L({\mathbb{R}}^n,{\mathbb{R}}^m) such that \lim_{\substack{h \to 0\\h\in {\mathbb{R}}^n}} \frac{\lVert {f(x+h)-f(x) - Ah} \rVert}{\lVert {h} \rVert} = 0 . We write Df(x) := A, or f'(x) := A, and we say A is the derivative of f at x. When f is differentiable at all x \in U, we say simply that f is differentiable.
For a differentiable function, the derivative of f is a function from U to L({\mathbb{R}}^n,{\mathbb{R}}^m). Compare to the one dimensional case, where the derivative is a function from U to {\mathbb{R}}, but we really want to think of {\mathbb{R}} here as L({\mathbb{R}}^1,{\mathbb{R}}^1).
The norms above must be on the right spaces of course. The norm in the numerator is on {\mathbb{R}}^m, and the norm in the denominator is on {\mathbb{R}}^n where h lives. Normally it is understood that h \in {\mathbb{R}}^n from context. We will not explicitly say so from now on.
We have again cheated somewhat and said that A is the derivative. We have not shown yet that there is only one, let us do that now.
Let U \subset {\mathbb{R}}^n be an open subset and f \colon U \to {\mathbb{R}}^m. Suppose x \in U and there exist A,B \in L({\mathbb{R}}^n,{\mathbb{R}}^m) such that \lim_{h \to 0} \frac{\lVert {f(x+h)-f(x) - Ah} \rVert}{\lVert {h} \rVert} = 0 \qquad \text{and} \qquad \lim_{h \to 0} \frac{\lVert {f(x+h)-f(x) - Bh} \rVert}{\lVert {h} \rVert} = 0 . Then A=B.
\begin{split} \frac{\lVert {(A-B)h} \rVert}{\lVert {h} \rVert} & = \frac{\lVert {f(x+h)-f(x) - Ah - (f(x+h)-f(x) - Bh)} \rVert}{\lVert {h} \rVert} \\ & \leq \frac{\lVert {f(x+h)-f(x) - Ah} \rVert}{\lVert {h} \rVert} + \frac{\lVert {f(x+h)-f(x) - Bh} \rVert}{\lVert {h} \rVert} . \end{split}
So \frac{\lVert {(A-B)h} \rVert}{\lVert {h} \rVert} \to 0 as h \to 0. That is, given \epsilon > 0, then for all h in some \delta-ball around the origin \epsilon > \frac{\lVert {(A-B)h} \rVert}{\lVert {h} \rVert} = \left\lVert {(A-B)\frac{h}{\lVert {h} \rVert}} \right\rVert . For any x with \lVert {x} \rVert=1, let h = (\nicefrac{\delta}{2}) \, x, then \lVert {h} \rVert < \delta and \frac{h}{\lVert {h} \rVert} = x. So \lVert {(A-B)x} \rVert < \epsilon. Taking the supremum over all x with \lVert {x} \rVert = 1 we get the operator norm \lVert {A-B} \rVert \leq \epsilon. As \epsilon > 0 was arbitrary \lVert {A-B} \rVert = 0 or in other words A = B.
If f(x) = Ax for a linear mapping A, then f'(x) = A. This is easily seen: \frac{\lVert {f(x+h)-f(x) - Ah} \rVert}{\lVert {h} \rVert} = \frac{\lVert {A(x+h)-Ax - Ah} \rVert}{\lVert {h} \rVert} = \frac{0}{\lVert {h} \rVert} = 0 .
Let f \colon {\mathbb{R}}^2 \to {\mathbb{R}}^2 be defined by f(x,y) = \bigl(f_1(x,y),f_2(x,y)\bigr) := (1+x+2y+x^2,2x+3y+xy). Let us show that f is differentiable at the origin and let us compute the derivative, directly using the definition. The derivative is in L({\mathbb{R}}^2,{\mathbb{R}}^2) so it can be represented by a 2\times 2 matrix \left[\begin{smallmatrix}a&b\\c&d\end{smallmatrix}\right]. Suppose h = (h_1,h_2). We need the following expression to go to zero. \begin{gathered} \frac{\lVert { f(h_1,h_2)-f(0,0) - (ah_1 +bh_2 , ch_1+dh_2)} \rVert }{\lVert {(h_1,h_2)} \rVert} = \\ \frac{\sqrt{ {\bigl((1-a)h_1 + (2-b)h_2 + h_1^2\bigr)}^2 + {\bigl((2-c)h_1 + (3-d)h_2 + h_1h_2\bigr)}^2}}{\sqrt{h_1^2+h_2^2}} .\end{gathered} If we choose a=1, b=2, c=2, d=3, the expression becomes \frac{\sqrt{ h_1^4 + h_1^2h_2^2}}{\sqrt{h_1^2+h_2^2}} = \left\lvert {h_1} \right\rvert \frac{\sqrt{ h_1^2 + h_2^2}}{\sqrt{h_1^2+h_2^2}} = \left\lvert {h_1} \right\rvert . And this expression does indeed go to zero as h \to 0. Therefore the function is differentiable at the origin and the derivative can be represented by the matrix \left[\begin{smallmatrix}1&2\\2&3\end{smallmatrix}\right].
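The limit in this example can also be examined numerically: evaluate the quotient \lVert f(h)-f(0) - Ah \rVert / \lVert h \rVert for the candidate derivative and shrinking h. A minimal sketch, assuming numpy and an arbitrary fixed direction for h:

```python
import numpy as np

def f(x, y):
    return np.array([1 + x + 2*y + x**2, 2*x + 3*y + x*y])

A = np.array([[1.0, 2.0],
              [2.0, 3.0]])            # candidate derivative at the origin

for t in [1e-1, 1e-2, 1e-3, 1e-4]:
    h = t * np.array([1.0, -0.5])     # shrink h along a fixed direction
    err = np.linalg.norm(f(*h) - f(0, 0) - A @ h) / np.linalg.norm(h)
    print(t, err)                     # err equals |h_1| here, so it shrinks like t
```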
Let U \subset {\mathbb{R}}^n be open and f \colon U \to {\mathbb{R}}^m be differentiable at p \in U. Then f is continuous at p.
Another way to write the differentiability of f at p is to first write r(h) := f(p+h)-f(p) - f'(p) h , and \frac{\lVert {r(h)} \rVert}{\lVert {h} \rVert} must go to zero as h \to 0. So r(h) itself must go to zero. The mapping h \mapsto f'(p) h is a linear mapping between finite dimensional spaces; it is therefore continuous, and f'(p)h goes to zero as h \to 0. Therefore, f(p+h) must go to f(p) as h \to 0. That is, f is continuous at p.
Let U \subset {\mathbb{R}}^n be open and let f \colon U \to {\mathbb{R}}^m be differentiable at p \in U. Let V \subset {\mathbb{R}}^m be open, f(U) \subset V and let g \colon V \to {\mathbb{R}}^\ell be differentiable at f(p). Then F(x) = g\bigl(f(x)\bigr) is differentiable at p and F'(p) = g'\bigl(f(p)\bigr) f'(p) .
Without the points where things are evaluated, this is sometimes written as F' = {(g \circ f)}' = g' f'. The way to understand it is that the derivative of the composition g \circ f is the composition of the derivatives of g and f. That is, if f'(p) = A and g'\bigl(f(p)\bigr) = B, then F'(p) = BA.
Let A := f'(p) and B := g'\bigl(f(p)\bigr). Take h \in {\mathbb{R}}^n and write q = f(p), k = f(p+h)-f(p). Let r(h) := f(p+h)-f(p) - A h . Then r(h) = k-Ah or Ah = k-r(h). We look at the quantity we need to go to zero: \begin{split} \frac{\lVert {F(p+h)-F(p) - BAh} \rVert}{\lVert {h} \rVert} & = \frac{\lVert {g\bigl(f(p+h)\bigr)-g\bigl(f(p)\bigr) - BAh} \rVert}{\lVert {h} \rVert} \\ & = \frac{\lVert {g(q+k)-g(q) - B\bigl(k-r(h)\bigr)} \rVert}{\lVert {h} \rVert} \\ & \leq \frac {\lVert {g(q+k)-g(q) - Bk} \rVert} {\lVert {h} \rVert} + \lVert {B} \rVert \frac {\lVert {r(h)} \rVert} {\lVert {h} \rVert} \\ & = \frac {\lVert {g(q+k)-g(q) - Bk} \rVert} {\lVert {k} \rVert} \frac {\lVert {f(p+h)-f(p)} \rVert} {\lVert {h} \rVert} + \lVert {B} \rVert \frac {\lVert {r(h)} \rVert} {\lVert {h} \rVert} . \end{split} First, \lVert {B} \rVert is constant and f is differentiable at p, so the term \lVert {B} \rVert\frac{\lVert {r(h)} \rVert}{\lVert {h} \rVert} goes to 0. Next, as f is continuous at p, we have that as h goes to 0, then k goes to 0. Therefore \frac {\lVert {g(q+k)-g(q) - Bk} \rVert} {\lVert {k} \rVert} goes to 0 because g is differentiable at q. Finally \frac {\lVert {f(p+h)-f(p)} \rVert} {\lVert {h} \rVert} \leq \frac {\lVert {f(p+h)-f(p)-Ah} \rVert} {\lVert {h} \rVert} + \frac {\lVert {Ah} \rVert} {\lVert {h} \rVert} \leq \frac {\lVert {f(p+h)-f(p)-Ah} \rVert} {\lVert {h} \rVert} + \lVert {A} \rVert . As f is differentiable at p, for small enough h the quantity \frac{\lVert {f(p+h)-f(p)-Ah} \rVert}{\lVert {h} \rVert} is bounded. Therefore the term \frac {\lVert {f(p+h)-f(p)} \rVert} {\lVert {h} \rVert} stays bounded as h goes to 0. Therefore, \frac{\lVert {F(p+h)-F(p) - BAh} \rVert}{\lVert {h} \rVert} goes to zero, and F'(p) = BA, which is what was claimed.
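To illustrate the chain rule, take the map f(x,y) = (1+x+2y+x^2,2x+3y+xy) from the earlier example, so that f(0,0) = (1,0) and f'(0,0) is represented by \left[\begin{smallmatrix}1&2\\2&3\end{smallmatrix}\right], and take, just for the sake of an example, g(u,v) := uv, so that g'(u,v) is represented by the 1 \times 2 matrix \left[\begin{smallmatrix}v&u\end{smallmatrix}\right] and g'(1,0) = \left[\begin{smallmatrix}0&1\end{smallmatrix}\right]. The chain rule then says that F = g \circ f satisfies F'(0,0) = g'\bigl(f(0,0)\bigr) f'(0,0) = \left[\begin{smallmatrix}0&1\end{smallmatrix}\right] \left[\begin{smallmatrix}1&2\\2&3\end{smallmatrix}\right] = \left[\begin{smallmatrix}2&3\end{smallmatrix}\right] . As a check, expanding F(x,y) = (1+x+2y+x^2)(2x+3y+xy) = 2x+3y+\text{(higher order terms)} gives the same answer.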
Partial derivatives
There is another way to generalize the derivative from one dimension. We hold all but one variable constant and take the regular derivative.
Let f \colon U \to {\mathbb{R}} be a function on an open set U \subset {\mathbb{R}}^n. If the following limit exists we write \frac{\partial f}{\partial x_j} (x) := \lim_{h\to 0}\frac{f(x_1,\ldots,x_{j-1},x_j+h,x_{j+1},\ldots,x_n)-f(x)}{h} = \lim_{h\to 0}\frac{f(x+h e_j)-f(x)}{h} . We call \frac{\partial f}{\partial x_j} (x) the partial derivative of f with respect to x_j. Sometimes we write D_j f instead.
For a mapping f \colon U \to {\mathbb{R}}^m we write f = (f_1,f_2,\ldots,f_m), where f_k are real-valued functions. Then we define \frac{\partial f_k}{\partial x_j} (or write it as D_j f_k).
Partial derivatives are easier to compute with all the machinery of calculus, and they provide a way to compute the derivative of a function.
[mv:prop:jacobianmatrix] Let U \subset {\mathbb{R}}^n be open and let f \colon U \to {\mathbb{R}}^m be differentiable at p \in U. Then all the partial derivatives at p exist and in terms of the standard basis of {\mathbb{R}}^n and {\mathbb{R}}^m, f'(p) is represented by the matrix \begin{bmatrix} \frac{\partial f_1}{\partial x_1}(p) & \frac{\partial f_1}{\partial x_2}(p) & \ldots & \frac{\partial f_1}{\partial x_n}(p) \\ \frac{\partial f_2}{\partial x_1}(p) & \frac{\partial f_2}{\partial x_2}(p) & \ldots & \frac{\partial f_2}{\partial x_n}(p) \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial f_m}{\partial x_1}(p) & \frac{\partial f_m}{\partial x_2}(p) & \ldots & \frac{\partial f_m}{\partial x_n}(p) \end{bmatrix} .
In other words f'(p) \, e_j = \sum_{k=1}^m \frac{\partial f_k}{\partial x_j}(p) \,e_k . If v = \sum_{j=1}^n c_j e_j = (c_1,c_2,\ldots,c_n), then f'(p) \, v = \sum_{j=1}^n \sum_{k=1}^m c_j \frac{\partial f_k}{\partial x_j}(p) \,e_k = \sum_{k=1}^m \left( \sum_{j=1}^n c_j \frac{\partial f_k}{\partial x_j}(p) \right) \,e_k .
Fix a j and note that \begin{split} \left\lVert {\frac{f(p+h e_j)-f(p)}{h} - f'(p) e_j} \right\rVert & = \left\lVert {\frac{f(p+h e_j)-f(p) - f'(p) h e_j}{h}} \right\rVert \\ & = \frac{\lVert {f(p+h e_j)-f(p) - f'(p) h e_j} \rVert}{\lVert {h e_j} \rVert} . \end{split} As h goes to 0, the right hand side goes to zero by differentiability of f, and hence \lim_{h \to 0} \frac{f(p+h e_j)-f(p)}{h} = f'(p) e_j . Note that f is vector valued. So represent f by components f = (f_1,f_2,\ldots,f_m), and note that taking a limit in {\mathbb{R}}^m is the same as taking the limit in each component separately. Therefore for any k the partial derivative \frac{\partial f_k}{\partial x_j} (p) = \lim_{h \to 0} \frac{f_k(p+h e_j)-f_k(p)}{h} exists and is equal to the kth component of f'(p) e_j, and we are done.
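For instance, for the map f(x,y) = (1+x+2y+x^2,2x+3y+xy) from the example above, the partial derivatives are \frac{\partial f_1}{\partial x} = 1+2x, \frac{\partial f_1}{\partial y} = 2, \frac{\partial f_2}{\partial x} = 2+y, and \frac{\partial f_2}{\partial y} = 3+x. At the origin the matrix of partial derivatives is therefore \left[\begin{smallmatrix}1&2\\2&3\end{smallmatrix}\right], which is exactly the matrix we obtained directly from the definition of the derivative.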
The converse of the proposition is not true. Just because the partial derivatives exist does not mean that the function is differentiable. See the exercises. However, when the partial derivatives are continuous, we will prove that the converse holds. One of the consequences of the proposition is that if f is differentiable on U, then f' \colon U \to L({\mathbb{R}}^n,{\mathbb{R}}^m) is a continuous function if and only if all the \frac{\partial f_k}{\partial x_j} are continuous functions.
Gradient and directional derivatives
Let U \subset {\mathbb{R}}^n be open and let f \colon U \to {\mathbb{R}} be a differentiable function. We define the gradient as \nabla f (x) := \sum_{j=1}^n \frac{\partial f}{\partial x_j} (x)\, e_j . Notice that the gradient gives us a way to represent the action of the derivative as a dot product: f'(x)v = \nabla f(x) \cdot v.
Suppose \gamma \colon (a,b) \subset {\mathbb{R}}\to {\mathbb{R}}^n is a differentiable function and the image \gamma\bigl((a,b)\bigr) \subset U. Such a function and its image is sometimes called a curve, or a differentiable curve. Write \gamma = (\gamma_1,\gamma_2,\ldots,\gamma_n). Let g(t) := f\bigl(\gamma(t)\bigr) . The function g is differentiable. For purposes of computation we identify L({\mathbb{R}}^1) with {\mathbb{R}}, and hence g'(t) can be computed as a number: g'(t) = f'\bigl(\gamma(t)\bigr) \gamma^{\:\prime}(t) = \sum_{j=1}^n \frac{\partial f}{\partial x_j} \bigl(\gamma(t)\bigr) \frac{d\gamma_j}{dt} (t) = \sum_{j=1}^n \frac{\partial f}{\partial x_j} \frac{d\gamma_j}{dt} . For convenience, we sometimes leave out the points where we are evaluating as on the right hand side above. Let us rewrite this with the notation of the gradient and the dot product: g'(t) = (\nabla f) \bigl(\gamma(t)\bigr) \cdot \gamma^{\:\prime}(t) = \nabla f \cdot \gamma^{\:\prime} .
We use this idea to define derivatives in a specific direction. A direction is simply a vector pointing in that direction. So pick a vector u \in {\mathbb{R}}^n such that \lVert {u} \rVert = 1. Fix x \in U. Then define a curve \gamma(t) := x + tu . It is easy to compute that \gamma^{\:\prime}(t) = u for all t. By chain rule \frac{d}{dt}\Big|_{t=0} \bigl[ f(x+tu) \bigr] = (\nabla f) (x) \cdot u , where the notation \frac{d}{dt}\big|_{t=0} represents the derivative evaluated at t=0. We also compute directly \frac{d}{dt}\Big|_{t=0} \bigl[ f(x+tu) \bigr] = \lim_{h\to 0} \frac{f(x+hu)-f(x)}{h} . We obtain the directional derivative, denoted by D_u f (x) := \frac{d}{dt}\Big|_{t=0} \bigl[ f(x+tu) \bigr] , which can be computed by one of the methods above.
Let us suppose (\nabla f)(x) \neq 0. By Cauchy-Schwarz inequality we have \left\lvert {D_u f(x)} \right\rvert \leq \lVert {(\nabla f)(x)} \rVert . Equality is achieved when u is a scalar multiple of (\nabla f)(x). That is, when u = \frac{(\nabla f)(x)}{\lVert {(\nabla f)(x)} \rVert} , we get D_u f(x) = \lVert {(\nabla f)(x)} \rVert. The gradient points in the direction in which the function grows fastest, in other words, in the direction in which D_u f(x) is maximal.
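For example, let f(x,y) := x^2+y^2, so that \nabla f(x,y) = (2x,2y). At the point (1,2) we have \nabla f(1,2) = (2,4) and \lVert {\nabla f(1,2)} \rVert = 2\sqrt{5}. In the direction u = \bigl(\nicefrac{1}{\sqrt{2}},\nicefrac{1}{\sqrt{2}}\bigr) we get D_u f(1,2) = (2,4) \cdot u = \nicefrac{6}{\sqrt{2}} = 3\sqrt{2}, which is indeed at most 2\sqrt{5}, while in the direction u = \bigl(\nicefrac{1}{\sqrt{5}},\nicefrac{2}{\sqrt{5}}\bigr) of the gradient we get the maximal value D_u f(1,2) = 2\sqrt{5}.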
The Jacobian
Let U \subset {\mathbb{R}}^n be open and f \colon U \to {\mathbb{R}}^n be a differentiable mapping. Then define the Jacobian, or Jacobian determinant, of f at x as J_f(x) := \det\bigl( f'(x) \bigr) . Sometimes this is written as \frac{\partial(f_1,f_2,\ldots,f_n)}{\partial(x_1,x_2,\ldots,x_n)} .
This last piece of notation may seem somewhat confusing, but it is useful when you need to specify the exact variables and function components used.
The Jacobian J_f is a real valued function, and when n=1 it is simply the derivative. From the chain rule and the fact that \det(AB) = \det(A)\det(B), it follows that: J_{f \circ g} (x) = J_f\bigl(g(x)\bigr) J_g(x) .
As we mentioned, the determinant tells us what happens to area/volume. Similarly, the Jacobian measures how much a differentiable mapping stretches things locally, and whether it flips orientation. In particular, if the Jacobian is non-zero, then we would expect that locally the mapping is invertible (and we would be correct, as we will later see).
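For example, consider the polar coordinate mapping F(r,\theta) := \bigl(r\cos(\theta),r\sin(\theta)\bigr). A direct computation of the partial derivatives gives that F'(r,\theta) is represented by \left[\begin{smallmatrix}\cos(\theta)&-r\sin(\theta)\\ \sin(\theta)&r\cos(\theta)\end{smallmatrix}\right], and therefore J_F(r,\theta) = r\cos^2(\theta)+r\sin^2(\theta) = r . So away from r=0 the mapping stretches area locally by a factor of \left\lvert {r} \right\rvert, and it flips orientation exactly when r < 0; when r = 0, the Jacobian vanishes and the entire line \{ (0,\theta) : \theta \in {\mathbb{R}}\} is collapsed to the origin.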
Exercises
Suppose \gamma \colon (-1,1) \to {\mathbb{R}}^n and \alpha \colon (-1,1) \to {\mathbb{R}}^n are two differentiable curves such that \gamma(0) = \alpha(0) and \gamma^{\:\prime}(0) = \alpha'(0). Suppose F \colon {\mathbb{R}}^n \to {\mathbb{R}} is a differentiable function. Show that \frac{d}{dt}\Big|_{t=0} F\bigl(\gamma(t)\bigr) = \frac{d}{dt}\Big|_{t=0} F\bigl(\alpha(t)\bigr) .
Let f \colon {\mathbb{R}}^2 \to {\mathbb{R}} be given by f(x,y) = \sqrt{x^2+y^2}. Show that f is not differentiable at the origin.
Using only the definition of the derivative, show that the following f \colon {\mathbb{R}}^2 \to {\mathbb{R}}^2 are differentiable at the origin and find their derivative.
a) f(x,y) := (1+x+xy,x),
b) f(x,y) := (y-y^{10},x),
c) f(x,y) := \bigl( (x+y+1)^2 , (x-y+2)^2 \bigr).
Suppose f \colon {\mathbb{R}}\to {\mathbb{R}} and g \colon {\mathbb{R}}\to {\mathbb{R}} are differentiable functions. Using only the definition of the derivative, show that h \colon {\mathbb{R}}^2 \to {\mathbb{R}}^2 defined by h(x,y) := \bigl(f(x),g(y)\bigr) is a differentiable function and find the derivative at any point (x,y).
[exercise:noncontpartialsexist] Define a function f \colon {\mathbb{R}}^2 \to {\mathbb{R}} by f(x,y) := \begin{cases} \frac{xy}{x^2+y^2} & \text{ if $(x,y) \not= (0,0)$}, \\ 0 & \text{ if $(x,y) = (0,0)$}. \end{cases}
a) Show that partial derivatives \frac{\partial f}{\partial x} and \frac{\partial f}{\partial y} exist at all points (including the origin).
b) Show that f is not continuous at the origin (and hence not differentiable).
Define a function f \colon {\mathbb{R}}^2 \to {\mathbb{R}} by f(x,y) := \begin{cases} \frac{x^2y}{x^2+y^2} & \text{ if $(x,y) \not= (0,0)$}, \\ 0 & \text{ if $(x,y) = (0,0)$}. \end{cases}
a) Show that partial derivatives \frac{\partial f}{\partial x} and \frac{\partial f}{\partial y} exist at all points.
b) Show that for all u \in {\mathbb{R}}^2 with \lVert {u} \rVert=1, the directional derivative D_u f exists at all points.
c) Show that f is continuous at the origin.
d) Show that f is not differentiable at the origin.
Suppose f \colon {\mathbb{R}}^n \to {\mathbb{R}}^n is one-to-one, onto, differentiable at all points, and such that f^{-1} is also differentiable at all points.
a) Show that f'(p) is invertible at all points p and compute {(f^{-1})}'\bigl(f(p)\bigr). Hint: consider p = f^{-1}\bigl(f(p)\bigr).
b) Let g \colon {\mathbb{R}}^n \to {\mathbb{R}}^n be a function differentiable at q \in {\mathbb{R}}^n and such that g(q)=q. Suppose f(p) = q for some p \in {\mathbb{R}}^n. Show J_g(q) = J_{f^{-1} \circ g \circ f}(p) where J_g is the Jacobian determinant.
Suppose f \colon {\mathbb{R}}^2 \to {\mathbb{R}} is differentiable and such that f(x,y) = 0 if and only if y=0 and such that \nabla f(0,0) = (1,1). Prove that f(x,y) > 0 whenever y > 0, and f(x,y) < 0 whenever y < 0.
[exercise:mv:maximumcritical] Suppose U \subset {\mathbb{R}}^n is open and f \colon U \to {\mathbb{R}} is differentiable. Suppose f has a local maximum at p \in U. Show that f'(p) = 0, that is, the zero mapping in L({\mathbb{R}}^n,{\mathbb{R}}). That is, p is a critical point of f.
Suppose f \colon {\mathbb{R}}^2 \to {\mathbb{R}} is differentiable and suppose that whenever x^2+y^2 = 1, then f(x,y) = 0. Prove that there exists at least one point (x_0,y_0) such that \frac{\partial f}{\partial x}(x_0,y_0) = \frac{\partial f}{\partial y}(x_0,y_0) = 0.
Define f(x,y) := ( x-y^2 ) ( 2 y^2 - x). Show
a) (0,0) is a critical point, that is, f'(0,0) = 0, the zero linear map in L({\mathbb{R}}^2,{\mathbb{R}}).
b) For every direction, that is, every (x,y) such that x^2+y^2=1, the “restriction of f to the line containing the points (0,0) and (x,y)”, that is, the function g(t) := f(tx,ty), has a local maximum at t=0.
c) f does not have a local maximum at (0,0).
Suppose f \colon {\mathbb{R}}\to {\mathbb{R}}^n is differentiable and \lVert {f(t)} \rVert = 1 for all t (that is, we have a curve on the unit sphere). Then show that for all t, treating f'(t) as a vector, we have f'(t) \cdot f(t) = 0.
Define f \colon {\mathbb{R}}^2 \to {\mathbb{R}}^2 by f(x,y) := \bigl(x,y+\varphi(x)\bigr) for some differentiable function \varphi of one variable. Show f is differentiable and find f'.
Continuity and the derivative
Note: 1–2 lectures
Bounding the derivative
Let us prove a “mean value theorem” for vector valued functions.
If \varphi \colon [a,b] \to {\mathbb{R}}^n is differentiable on (a,b) and continuous on [a,b], then there exists a t_0 \in (a,b) such that \lVert {\varphi(b)-\varphi(a)} \rVert \leq (b-a) \lVert {\varphi'(t_0)} \rVert .
By the mean value theorem applied to the function \bigl(\varphi(b)-\varphi(a) \bigr) \cdot \varphi(t) (the dot is the scalar dot product again) we obtain there is a t_0 \in (a,b) such that \bigl(\varphi(b)-\varphi(a) \bigr) \cdot \varphi(b) - \bigl(\varphi(b)-\varphi(a) \bigr) \cdot \varphi(a) = \lVert {\varphi(b)-\varphi(a)} \rVert^2 = (b-a) \bigl(\varphi(b)-\varphi(a) \bigr) \cdot \varphi'(t_0) where we treat \varphi' as simply a column vector of numbers by abuse of notation. Note that in this case, if we think of \varphi'(t) as simply a vector, then by an earlier exercise, \lVert {\varphi'(t)} \rVert_{L({\mathbb{R}},{\mathbb{R}}^n)} = \lVert {\varphi'(t)} \rVert_{{\mathbb{R}}^n}. That is, the euclidean norm of the vector is the same as the operator norm of \varphi'(t).
By Cauchy-Schwarz inequality \lVert {\varphi(b)-\varphi(a)} \rVert^2 = (b-a)\bigl(\varphi(b)-\varphi(a) \bigr) \cdot \varphi'(t_0) \leq (b-a) \lVert {\varphi(b)-\varphi(a)} \rVert \lVert {\varphi'(t_0)} \rVert . \qedhere
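For example, take \varphi(t) := \bigl(\cos(t),\sin(t)\bigr) on [0,\pi]. Then \varphi(\pi)-\varphi(0) = (-2,0), so \lVert {\varphi(\pi)-\varphi(0)} \rVert = 2, while \lVert {\varphi'(t)} \rVert = 1 for every t, so the right hand side equals (\pi-0)\cdot 1 = \pi. The inequality 2 \leq \pi holds, but no choice of t_0 gives equality; for vector valued functions only the inequality survives, not the equality of the one variable mean value theorem.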
Recall that a set U is convex if whenever x,y \in U, the line segment from x to y lies in U.
[mv:prop:convexlip] Let U \subset {\mathbb{R}}^n be a convex open set, f \colon U \to {\mathbb{R}}^m a differentiable function, and an M such that \lVert {f'(x)} \rVert \leq M for all x \in U. Then f is Lipschitz with constant M, that is \lVert {f(x)-f(y)} \rVert \leq M \lVert {x-y} \rVert for all x,y \in U.
Fix x and y in U and note that (1-t)x+ty \in U for all t \in [0,1] by convexity. Next \frac{d}{dt} \Bigl[f\bigl((1-t)x+ty\bigr)\Bigr] = f'\bigl((1-t)x+ty\bigr) (y-x) . By the mean value theorem above we get for some t_0 \in (0,1) \lVert {f(x)-f(y)} \rVert \leq \left\lVert {\frac{d}{dt} \Big|_{t=t_0} \Bigl[ f\bigl((1-t)x+ty\bigr) \Bigr] } \right\rVert \leq \lVert {f'\bigl((1-t_0)x+t_0y\bigr)} \rVert \lVert {y-x} \rVert \leq M \lVert {y-x} \rVert . \qedhere
If U is not convex the proposition is not true. To see this fact, take the set U = \{ (x,y) : 0.9 < x^2+y^2 < 1.1 \} \setminus \{ (x,0) : x < 0 \} . Let f(x,y) be the angle that the line from the origin to (x,y) makes with the positive x axis. You can even write the formula for f: f(x,y) = 2 \operatorname{arctan}\left( \frac{y}{x+\sqrt{x^2+y^2}}\right) . Think spiral staircase with room in the middle.
The function is differentiable, and the derivative is bounded on U, which is not hard to see. Thinking of what happens near where the negative x-axis cuts the annulus in half, we see that the conclusion of the proposition cannot hold.
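A quick computation verifies both claims. Away from the deleted ray one finds \frac{\partial f}{\partial x} = \frac{-y}{x^2+y^2} and \frac{\partial f}{\partial y} = \frac{x}{x^2+y^2}, so \lVert {f'(x,y)} \rVert = \frac{1}{\sqrt{x^2+y^2}} < \frac{1}{\sqrt{0.9}} on U. On the other hand, for small \delta > 0 the points (-1,\delta) and (-1,-\delta) both lie in U, and f(-1,\delta)-f(-1,-\delta) is close to 2\pi, while \lVert {(-1,\delta)-(-1,-\delta)} \rVert = 2\delta. As \delta can be made arbitrarily small, no constant M can satisfy the conclusion of the proposition on this U.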
Let us solve the differential equation f' = 0.
If U \subset {\mathbb{R}}^n is connected and f \colon U \to {\mathbb{R}}^m is differentiable and f'(x) = 0, for all x \in U, then f is constant.
For any x \in U, there is a ball B(x,\delta) \subset U. The ball B(x,\delta) is convex. Since \lVert {f'(y)} \rVert \leq 0 for all y \in B(x,\delta), then by the proposition above, \lVert {f(x)-f(y)} \rVert \leq 0 \lVert {x-y} \rVert = 0. So f(x) = f(y) for all y \in B(x,\delta).
This means that f^{-1}(c) is open for any c \in {\mathbb{R}}^m. Suppose f^{-1}(c) is nonempty. The two sets U' = f^{-1}(c), \qquad U'' = f^{-1}({\mathbb{R}}^m\setminus\{c\}) = \bigcup_{\substack{a \in {\mathbb{R}}^m\\a\neq c}} f^{-1}(a) are open and disjoint, and furthermore U = U' \cup U''. So as U' is nonempty and U is connected, we have that U'' = \emptyset. So f(x) = c for all x \in U.
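Connectedness is really needed here. For example, if U \subset {\mathbb{R}}^n is the union of two disjoint open balls and f is equal to 0 on one ball and equal to 1 on the other, then f'(x) = 0 for every x \in U, yet f is not constant on U; it is only constant on each connected piece.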
Continuously differentiable functions
We say f \colon U \subset {\mathbb{R}}^n \to {\mathbb{R}}^m is continuously differentiable, or C^1(U) if f is differentiable and f' \colon U \to L({\mathbb{R}}^n,{\mathbb{R}}^m) is continuous.
[mv:prop:contdiffpartials] Let U \subset {\mathbb{R}}^n be open and f \colon U \to {\mathbb{R}}^m. The function f is continuously differentiable if and only if all the partial derivatives exist and are continuous.
Without continuity the theorem does not hold. Just because partial derivatives exist does not mean that f is differentiable, in fact, f may not even be continuous. See the exercises for the last section and also for this section.
We have seen that if f is differentiable, then the partial derivatives exist. Furthermore, the partial derivatives are the entries of the matrix of f'(x). So if f' \colon U \to L({\mathbb{R}}^n,{\mathbb{R}}^m) is continuous, then the entries are continuous, hence the partial derivatives are continuous.
To prove the opposite direction, suppose the partial derivatives exist and are continuous. Fix x \in U. If we show that f'(x) exists we are done, because the entries of the matrix f'(x) are then the partial derivatives and if the entries are continuous functions, the matrix valued function f' is continuous.
Let us do induction on dimension. First let us note that the conclusion is true when n=1. In this case the derivative is just the regular derivative (exercise: you should check that the fact that the function is vector valued is not a problem).
Suppose the conclusion is true for {\mathbb{R}}^{n-1}, that is, if we restrict to the first n-1 variables, the conclusion is true. It is easy to see that the first n-1 partial derivatives of f restricted to the set where the last coordinate is fixed are the same as those for f. In the following we think of {\mathbb{R}}^{n-1} as a subset of {\mathbb{R}}^n, that is the set in {\mathbb{R}}^n where x_n = 0. Let A = \begin{bmatrix} \frac{\partial f_1}{\partial x_1}(x) & \ldots & \frac{\partial f_1}{\partial x_n}(x) \\ \vdots & \ddots & \vdots \\ \frac{\partial f_m}{\partial x_1}(x) & \ldots & \frac{\partial f_m}{\partial x_n}(x) \end{bmatrix} , \qquad A_1 = \begin{bmatrix} \frac{\partial f_1}{\partial x_1}(x) & \ldots & \frac{\partial f_1}{\partial x_{n-1}}(x) \\ \vdots & \ddots & \vdots \\ \frac{\partial f_m}{\partial x_1}(x) & \ldots & \frac{\partial f_m}{\partial x_{n-1}}(x) \end{bmatrix} , \qquad v = \begin{bmatrix} \frac{\partial f_1}{\partial x_n}(x) \\ \vdots \\ \frac{\partial f_m}{\partial x_n}(x) \end{bmatrix} . Let \epsilon > 0 be given. Let \delta > 0 be such that for any k \in {\mathbb{R}}^{n-1} with \lVert {k} \rVert < \delta we have \frac{\lVert {f(x+k) - f(x) - A_1k} \rVert}{\lVert {k} \rVert} < \epsilon . By continuity of the partial derivatives, suppose \delta is small enough so that \left\lvert {\frac{\partial f_j}{\partial x_n}(x+h) - \frac{\partial f_j}{\partial x_n}(x)} \right\rvert < \epsilon , for all j and all h with \lVert {h} \rVert < \delta.
Let h = h_1 + t e_n be a vector in {\mathbb{R}}^n where h_1 \in {\mathbb{R}}^{n-1} such that \lVert {h} \rVert < \delta. Then \lVert {h_1} \rVert \leq \lVert {h} \rVert < \delta. Note that Ah = A_1h_1 + tv. \begin{split} \lVert {f(x+h) - f(x) - Ah} \rVert & = \lVert {f(x+h_1 + t e_n) - f(x+h_1) - tv + f(x+h_1) - f(x) - A_1h_1} \rVert \\ & \leq \lVert {f(x+h_1 + t e_n) - f(x+h_1) -tv} \rVert + \lVert {f(x+h_1) - f(x) - A_1h_1} \rVert \\ & \leq \lVert {f(x+h_1 + t e_n) - f(x+h_1) -tv} \rVert + \epsilon \lVert {h_1} \rVert . \end{split} As all the partial derivatives exist, by the mean value theorem, for each j there is some \theta_j \in [0,t] (or [t,0] if t < 0), such that f_j(x+h_1 + t e_n) - f_j(x+h_1) = t \frac{\partial f_j}{\partial x_n}(x+h_1+\theta_j e_n). Note that if \lVert {h} \rVert < \delta, then \lVert {h_1+\theta_j e_n} \rVert \leq \lVert {h} \rVert < \delta. So to finish the estimate \begin{split} \lVert {f(x+h) - f(x) - Ah} \rVert & \leq \lVert {f(x+h_1 + t e_n) - f(x+h_1) -tv} \rVert + \epsilon \lVert {h_1} \rVert \\ & \leq \sqrt{\sum_{j=1}^m {\left(t\frac{\partial f_j}{\partial x_n}(x+h_1+\theta_j e_n) - t \frac{\partial f_j}{\partial x_n}(x)\right)}^2} + \epsilon \lVert {h_1} \rVert \\ & \leq \sqrt{m}\, \epsilon \left\lvert {t} \right\rvert + \epsilon \lVert {h_1} \rVert \\ & \leq (\sqrt{m}+1)\epsilon \lVert {h} \rVert . \end{split} In other words, \frac{\lVert {f(x+h) - f(x) - Ah} \rVert}{\lVert {h} \rVert} \leq (\sqrt{m}+1)\epsilon whenever 0 < \lVert {h} \rVert < \delta. As \epsilon > 0 was arbitrary, f is differentiable at x, and its derivative is represented by the matrix A.
Exercises
Define f \colon {\mathbb{R}}^2 \to {\mathbb{R}} as f(x,y) := \begin{cases} (x^2+y^2)\sin\bigl({(x^2+y^2)}^{-1}\bigr) & \text{if $(x,y) \not= (0,0)$,} \\ 0 & \text{else.} \end{cases} Show that f is differentiable at the origin, but that it is not continuously differentiable.
Let f \colon {\mathbb{R}}^2 \to {\mathbb{R}} be the function from the exercises of the previous section, that is, f(x,y) := \begin{cases} \frac{xy}{x^2+y^2} & \text{ if $(x,y) \not= (0,0)$}, \\ 0 & \text{ if $(x,y) = (0,0)$}. \end{cases} Compute the partial derivatives \frac{\partial f}{\partial x} and \frac{\partial f}{\partial y} at all points and show that these are not continuous functions.
Let B(0,1) \subset {\mathbb{R}}^2 be the unit ball (disc), that is, the set given by x^2 + y^2 < 1. Suppose f \colon B(0,1) \to {\mathbb{R}} is a differentiable function such that \left\lvert {f(0,0)} \right\rvert \leq 1, and \left\lvert {\frac{\partial f}{\partial x}} \right\rvert \leq 1 and \left\lvert {\frac{\partial f}{\partial y}} \right\rvert \leq 1 for all points in B(0,1).
a) Find an M \in {\mathbb{R}} such that \lVert {f'(x,y)} \rVert \leq M for all (x,y) \in B(0,1).
b) Find a B \in {\mathbb{R}} such that \left\lvert {f(x,y)} \right\rvert \leq B for all (x,y) \in B(0,1).
Define \varphi \colon [0,2\pi] \to {\mathbb{R}}^2 by \varphi(t) = \bigl(\sin(t),\cos(t)\bigr). Compute \varphi'(t) for all t. Compute \lVert {\varphi'(t)} \rVert for all t. Notice that \varphi'(t) is never zero, yet \varphi(0) = \varphi(2\pi); therefore, Rolle’s theorem is not true in more than one dimension.
Let f \colon {\mathbb{R}}^2 \to {\mathbb{R}} be a function such that \frac{\partial f}{\partial x} and \frac{\partial f}{\partial y} exist at all points and there exists an M \in {\mathbb{R}} such that \left\lvert {\frac{\partial f}{\partial x}} \right\rvert \leq M and \left\lvert {\frac{\partial f}{\partial y}} \right\rvert \leq M at all points. Show that f is continuous.
Let f \colon {\mathbb{R}}^2 \to {\mathbb{R}} be a function and M \in {\mathbb{R}}, such that for every (x,y) \in {\mathbb{R}}^2, the function g(t) := f(xt,yt) is differentiable and \left\lvert {g'(t)} \right\rvert \leq M.
a) Show that f is continuous at (0,0).
b) Find an example of such an f which is not continuous at every other point of {\mathbb{R}}^2 (Hint: Think back to how we constructed a nowhere continuous function on [0,1]).
Inverse and implicit function theorem
Note: 2–3 lectures
To prove the inverse function theorem we use the contraction mapping principle, which we have seen earlier and which we used to prove Picard’s theorem. Recall that a mapping f \colon X \to X' between two metric spaces (X,d) and (X',d') is called a contraction if there exists a k < 1 such that d'\bigl(f(x),f(y)\bigr) \leq k d(x,y) \ \ \ \ \text{for all } x,y \in X. The contraction mapping principle says that if f \colon X \to X is a contraction and X is a complete metric space, then there exists a unique fixed point, that is, there exists a unique x \in X such that f(x) = x.
Intuitively if a function is differentiable, then it locally “behaves like” the derivative (which is a linear function). The idea of the inverse function theorem is that if a function is differentiable and the derivative is invertible, the function is (locally) invertible.
[thm:inverse] Let U \subset {\mathbb{R}}^n be an open set and let f \colon U \to {\mathbb{R}}^n be a continuously differentiable function. Also suppose p \in U, f(p) = q, and f'(p) is invertible (that is, J_f(p) \not=0). Then there exist open sets V, W \subset {\mathbb{R}}^n such that p \in V \subset U, f(V) = W and f|_V is one-to-one and onto. Furthermore, the inverse g(y) = (f|_V)^{-1}(y) is continuously differentiable and g'(y) = {\bigl(f'(x)\bigr)}^{-1}, \qquad \text{ for all $x \in V$, $y = f(x)$.}
Write A = f'(p). As f' is continuous, there exists an open ball V around p such that \lVert {A-f'(x)} \rVert < \frac{1}{2\lVert {A^{-1}} \rVert} \qquad \text{for all $x \in V$.} Note that f'(x) is invertible for all x \in V.
Given y \in {\mathbb{R}}^n we define \varphi_y \colon V \to {\mathbb{R}}^n by \varphi_y (x) = x + A^{-1}\bigl(y-f(x)\bigr) . As A^{-1} is one-to-one, then \varphi_y(x) = x (x is a fixed point) if and only if y-f(x) = 0, or in other words f(x)=y. Using the chain rule we obtain \varphi_y'(x) = I - A^{-1} f'(x) = A^{-1} \bigl( A-f'(x) \bigr) . So for x \in V we have \lVert {\varphi_y'(x)} \rVert \leq \lVert {A^{-1}} \rVert \lVert {A-f'(x)} \rVert < \nicefrac{1}{2} . As V is a ball it is convex, and hence \lVert {\varphi_y(x_1)-\varphi_y(x_2)} \rVert \leq \frac{1}{2} \lVert {x_1-x_2} \rVert \qquad \text{for all $x_1,x_2 \in V$}. In other words \varphi_y is a contraction defined on V, though we so far do not know what the range of \varphi_y is. We cannot apply the fixed point theorem, but we can say that \varphi_y has at most one fixed point (note the proof of uniqueness in the contraction mapping principle). That is, there exists at most one x \in V such that f(x) = y, and so f|_V is one-to-one.
Let W = f(V). We need to show that W is open. Take a y_1 \in W, then there is a unique x_1 \in V such that f(x_1) = y_1. Let r > 0 be small enough such that the closed ball C(x_1,r) \subset V (such r > 0 exists as V is open).
Suppose y is such that \lVert {y-y_1} \rVert < \frac{r}{2\lVert {A^{-1}} \rVert} . If we show that y \in W, then we have shown that W is open. Define \varphi_y(x) = x+A^{-1}\bigl(y-f(x)\bigr) as before. If x \in C(x_1,r), then \begin{split} \lVert {\varphi_y(x)-x_1} \rVert & \leq \lVert {\varphi_y(x)-\varphi_y(x_1)} \rVert + \lVert {\varphi_y(x_1)-x_1} \rVert \\ & \leq \frac{1}{2}\lVert {x-x_1} \rVert + \lVert {A^{-1}(y-y_1)} \rVert \\ & \leq \frac{1}{2}r + \lVert {A^{-1}} \rVert\lVert {y-y_1} \rVert \\ & < \frac{1}{2}r + \lVert {A^{-1}} \rVert \frac{r}{2\lVert {A^{-1}} \rVert} = r . \end{split} So \varphi_y takes C(x_1,r) into B(x_1,r) \subset C(x_1,r). It is a contraction on C(x_1,r) and C(x_1,r) is complete (closed subset of {\mathbb{R}}^n is complete). Apply the contraction mapping principle to obtain a fixed point x, i.e. \varphi_y(x) = x. That is f(x) = y. So y \in f\bigl(C(x_1,r)\bigr) \subset f(V) = W. Therefore W is open.
Next we need to show that g is continuously differentiable and compute its derivative. First let us show that it is differentiable. Let y \in W and k \in {\mathbb{R}}^n, k\not= 0, such that y+k \in W. Then there are unique x \in V and h \in {\mathbb{R}}^n, h \not= 0 and x+h \in V, such that f(x) = y and f(x+h) = y+k as f|_V is a one-to-one and onto mapping of V onto W. In other words, g(y) = x and g(y+k) = x+h. We can still squeeze some information from the fact that \varphi_y is a contraction. \varphi_y(x+h)-\varphi_y(x) = h + A^{-1} \bigl( f(x)-f(x+h) \bigr) = h - A^{-1} k . So \lVert {h-A^{-1}k} \rVert = \lVert {\varphi_y(x+h)-\varphi_y(x)} \rVert \leq \frac{1}{2}\lVert {x+h-x} \rVert = \frac{\lVert {h} \rVert}{2}. By the inverse triangle inequality \lVert {h} \rVert - \lVert {A^{-1}k} \rVert \leq \frac{1}{2}\lVert {h} \rVert so \lVert {h} \rVert \leq 2 \lVert {A^{-1}k} \rVert \leq 2 \lVert {A^{-1}} \rVert \lVert {k} \rVert. In particular, as k goes to 0, so does h.
As x \in V, then f'(x) is invertible. Let B = \bigl(f'(x)\bigr)^{-1}, which is what we think the derivative of g at y is. Then \begin{split} \frac{\lVert {g(y+k)-g(y)-Bk} \rVert}{\lVert {k} \rVert} & = \frac{\lVert {h-Bk} \rVert}{\lVert {k} \rVert} \\ & = \frac{\lVert {h-B\bigl(f(x+h)-f(x)\bigr)} \rVert}{\lVert {k} \rVert} \\ & = \frac{\lVert {B\bigl(f(x+h)-f(x)-f'(x)h\bigr)} \rVert}{\lVert {k} \rVert} \\ & \leq \lVert {B} \rVert \frac{\lVert {h} \rVert}{\lVert {k} \rVert}\, \frac{\lVert {f(x+h)-f(x)-f'(x)h} \rVert}{\lVert {h} \rVert} \\ & \leq 2\lVert {B} \rVert\lVert {A^{-1}} \rVert \frac{\lVert {f(x+h)-f(x)-f'(x)h} \rVert}{\lVert {h} \rVert} . \end{split} As k goes to 0, so does h. So the right hand side goes to 0 as f is differentiable, and hence the left hand side also goes to 0. And B is precisely what we wanted g'(y) to be.
We have shown that g is differentiable; let us show it is C^1(W). Now, g \colon W \to V is continuous (it is differentiable), f' is a continuous function from V to L({\mathbb{R}}^n), and X \mapsto X^{-1} is a continuous function on the set of invertible elements of L({\mathbb{R}}^n). As g'(y) = {\bigl( f'\bigl(g(y)\bigr)\bigr)}^{-1} is the composition of these three continuous functions, it is continuous.
Suppose U \subset {\mathbb{R}}^n is open and f \colon U \to {\mathbb{R}}^n is a continuously differentiable mapping such that f'(x) is invertible for all x \in U. Then given any open set V \subset U, f(V) is open. (f is an open mapping).
Without loss of generality, suppose U=V. For each point y \in f(V), we pick x \in f^{-1}(y) (there could be more than one such point); then by the inverse function theorem there is a neighborhood of x in V that maps onto a neighborhood of y. Hence f(V) is open.
The theorem and the corollary fail if f'(x) is not invertible for some x. For example, the map f(x,y) = (x,xy) maps {\mathbb{R}}^2 onto the set {\mathbb{R}}^2 \setminus \{ (0,y) : y \neq 0 \}, which is neither open nor closed. In fact, f^{-1}(0,0) = \{ (0,y) : y \in {\mathbb{R}}\}. This bad behavior only occurs on the y-axis; everywhere else the function is locally invertible. If we avoid the y-axis, f is even one-to-one.
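Indeed, for this map a direct computation gives that f'(x,y) is represented by \left[\begin{smallmatrix}1&0\\y&x\end{smallmatrix}\right], so J_f(x,y) = x, which vanishes exactly on the y-axis, matching where the bad behavior occurs.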
Also note that just because f'(x) is invertible everywhere does not mean that f is one-to-one globally. It is “locally” one-to-one but perhaps not “globally.” For an example, take the map f \colon {\mathbb{R}}^2 \setminus \{ 0 \} \to {\mathbb{R}}^2 defined by f(x,y) = (x^2-y^2,2xy). It is left to the student to show that f is differentiable and that the derivative is invertible at every point of the domain.
On the other hand, the mapping is 2-to-1 globally. For every (a,b) that is not the origin, there are exactly two solutions to x^2-y^2=a and 2xy=b. We leave it to the student to show that there is at least one solution, and then notice that replacing x and y with -x and -y we obtain another solution.
The invertibility of the derivative is not a necessary condition, just sufficient, for having a continuous inverse and being an open mapping. For example the function f(x) = x^3 is an open mapping from {\mathbb{R}} to {\mathbb{R}} and is globally one-to-one with a continuous inverse, although the inverse is not differentiable at x=0.
Implicit function theorem
The inverse function theorem is really a special case of the implicit function theorem, which we prove next. Somewhat ironically, we prove the implicit function theorem using the inverse function theorem. What we were showing in the inverse function theorem was that the equation x-f(y) = 0 was solvable for y in terms of x if the derivative in terms of y was invertible, that is, if f'(y) was invertible. That is, there was locally a function g such that x-f\bigl(g(x)\bigr) = 0.
OK, so how about we look at the equation f(x,y) = 0. Obviously this is not solvable for y in terms of x in every case. For example, when f(x,y) does not actually depend on y. For a slightly more complicated example, notice that x^2+y^2-1 = 0 defines the unit circle, and we can locally solve for y in terms of x when 1) we are near a point which lies on the unit circle and 2) when we are not at a point where the circle has a vertical tangency, or in other words where \frac{\partial f}{\partial y} = 0.
To make things simple we fix some notation. We let (x,y) \in {\mathbb{R}}^{n+m} denote the coordinates (x_1,\ldots,x_n,y_1,\ldots,y_m). A linear transformation A \in L({\mathbb{R}}^{n+m},{\mathbb{R}}^m) can then be written as A = [ A_x ~ A_y ] so that A(x,y) = A_x x + A_y y, where A_x \in L({\mathbb{R}}^n,{\mathbb{R}}^m) and A_y \in L({\mathbb{R}}^m).
Let A = [A_x~A_y] \in L({\mathbb{R}}^{n+m},{\mathbb{R}}^m) and suppose A_y is invertible. If B = - {(A_y)}^{-1} A_x, then 0 = A ( x, Bx) = A_x x + A_y Bx .
The proof is obvious. We simply solve and obtain y = Bx. Let us show that the same can be done for C^1 functions.
[thm:implicit] Let U \subset {\mathbb{R}}^{n+m} be an open set and let f \colon U \to {\mathbb{R}}^m be a C^1(U) mapping. Let (p,q) \in U be a point such that f(p,q) = 0 and such that \frac{\partial(f_1,\ldots,f_m)}{\partial(y_1,\ldots,y_m)} (p,q) \neq 0 . Then there exists an open set W \subset {\mathbb{R}}^n with p \in W, an open set W' \subset {\mathbb{R}}^m with q \in W', with W \times W' \subset U, and a C^1(W) mapping g \colon W \to W', with g(p) = q, and for all x \in W, the point g(x) is the unique point in W' such that f\bigl(x,g(x)\bigr) = 0 . Furthermore, if [ A_x ~ A_y ] = f'(p,q), then g'(p) = -{(A_y)}^{-1}A_x .
The condition \frac{\partial(f_1,\ldots,f_m)}{\partial(y_1,\ldots,y_m)} (p,q) = \det(A_y) \neq 0 simply means that A_y is invertible.
Define F \colon U \to {\mathbb{R}}^{n+m} by F(x,y) := \bigl(x,f(x,y)\bigr). It is clear that F is C^1, and we want to show that the derivative at (p,q) is invertible.
Let us compute the derivative. We know that \frac{\lVert {f(p+h,q+k) - f(p,q) - A_x h - A_y k} \rVert}{\lVert {(h,k)} \rVert} goes to zero as \lVert {(h,k)} \rVert = \sqrt{\lVert {h} \rVert^2+\lVert {k} \rVert^2} goes to zero. But then so does \frac{\lVert {\bigl(h,f(p+h,q+k)-f(p,q)\bigr) - (h,A_x h+A_y k)} \rVert}{\lVert {(h,k)} \rVert} = \frac{\lVert {f(p+h,q+k) - f(p,q) - A_x h - A_y k} \rVert}{\lVert {(h,k)} \rVert} . So the derivative of F at (p,q) takes (h,k) to (h,A_x h+A_y k). If (h,A_x h+A_y k) = (0,0), then h=0, and so A_y k = 0. As A_y is one-to-one, then k=0. Therefore F'(p,q) is one-to-one or in other words invertible and we apply the inverse function theorem.
That is, there exists some open set V \subset {\mathbb{R}}^{n+m} with (p,0) \in V, and an inverse mapping G \colon V \to {\mathbb{R}}^{n+m}, that is F\bigl(G(x,s)\bigr) = (x,s) for all (x,s) \in V (where x \in {\mathbb{R}}^n and s \in {\mathbb{R}}^m). Write G = (G_1,G_2) (the first n and the second m components of G). Then F\bigl(G_1(x,s),G_2(x,s)\bigr) = \bigl(G_1(x,s),f(G_1(x,s),G_2(x,s))\bigr) = (x,s) . So x = G_1(x,s) and f\bigl(G_1(x,s),G_2(x,s)\bigr) = f\bigl(x,G_2(x,s)\bigr) = s. Plugging in s=0 we obtain f\bigl(x,G_2(x,0)\bigr) = 0 . The set G(V) is an open set containing the point (p,q), and therefore there exist open sets \widetilde{W} and W' such that \widetilde{W} \times W' \subset G(V) with p \in \widetilde{W} and q \in W'. Then take W = \{ x \in \widetilde{W} : G_2(x,0) \in W' \}. The function that takes x to G_2(x,0) is continuous and therefore W is open. We define g \colon W \to {\mathbb{R}}^m by g(x) := G_2(x,0), which is the g in the theorem. The fact that g(x) is the unique point in W' follows because W \times W' \subset G(V) and G is one-to-one and onto G(V).
Next differentiate x\mapsto f\bigl(x,g(x)\bigr) at p; since this map is identically zero on W, its derivative is the zero map. The derivative is computed in the same way as above. We get that for all h \in {\mathbb{R}}^{n} 0 = A\bigl(h,g'(p)h\bigr) = A_xh + A_yg'(p)h , and we obtain the desired derivative for g as well.
In other words, in the context of the theorem we have m equations in n+m unknowns. \begin{aligned} & f_1 (x_1,\ldots,x_n,y_1,\ldots,y_m) = 0 \\ & \qquad \qquad \qquad \vdots \\ & f_m (x_1,\ldots,x_n,y_1,\ldots,y_m) = 0\end{aligned} And the condition guaranteeing a solution is that this is a C^1 mapping (that all the components are C^1, or in other words all the partial derivatives exist and are continuous), and the matrix \begin{bmatrix} \frac{\partial f_1}{\partial y_1} & \ldots & \frac{\partial f_1}{\partial y_m} \\ \vdots & \ddots & \vdots \\ \frac{\partial f_m}{\partial y_1} & \ldots & \frac{\partial f_m}{\partial y_m} \end{bmatrix} is invertible at (p,q).
Consider the set x^2+y^2-{(z+1)}^3 = -1, e^x+e^y+e^z = 3 near the point (0,0,0). The function we are looking at is f(x,y,z) = (x^2+y^2-{(z+1)}^3+1,e^x+e^y+e^z-3) . We find that f' = \begin{bmatrix} 2x & 2y & -3{(z+1)}^2 \\ e^x & e^y & e^z \end{bmatrix} . The matrix \begin{bmatrix} 2(0) & -3{(0+1)}^2 \\ e^0 & e^0 \end{bmatrix} = \begin{bmatrix} 0 & -3 \\ 1 & 1 \end{bmatrix} is invertible. Hence near (0,0,0) we can find y and z as C^1 functions of x such that for x near 0 we have x^2+y(x)^2-{\bigl(z(x)+1\bigr)}^3 = -1, \qquad e^x+e^{y(x)}+e^{z(x)} = 3 . The theorem does not tell us how to find y(x) and z(x) explicitly, it just tells us they exist. In other words, near the origin the set of solutions is a smooth curve in {\mathbb{R}}^3 that goes through the origin.
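The theorem does, however, give the derivative of the solution at the origin. Writing [ A_x ~ A_y ] = f'(0,0,0), we have A_x = \left[\begin{smallmatrix}0\\1\end{smallmatrix}\right] and A_y = \left[\begin{smallmatrix}0&-3\\1&1\end{smallmatrix}\right], so \bigl(y'(0),z'(0)\bigr) = g'(0) = -{(A_y)}^{-1}A_x = -\frac{1}{3}\left[\begin{smallmatrix}1&3\\-1&0\end{smallmatrix}\right] \left[\begin{smallmatrix}0\\1\end{smallmatrix}\right] = \left[\begin{smallmatrix}-1\\0\end{smallmatrix}\right] . That is, y'(0) = -1 and z'(0) = 0, which one can also check by differentiating the two defining equations with respect to x at the origin.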
We remark that there are versions of the theorem for arbitrarily many derivatives. If f has k continuous derivatives, then the solution also has k continuous derivatives.
Exercises
Let C = \{ (x,y) \in {\mathbb{R}}^2 : x^2+y^2 = 1 \}.
a) Solve for y in terms of x near (0,1).
b) Solve for y in terms of x near (0,-1).
c) Solve for x in terms of y near (-1,0).
Define f \colon {\mathbb{R}}^2 \to {\mathbb{R}}^2 by f(x,y) := \bigl(x,y+h(x)\bigr) for some continuously differentiable function h of one variable.
a) Show that f is one-to-one and onto.
b) Compute f'.
c) Show that f' is invertible at all points, and compute its inverse.
Define f \colon {\mathbb{R}}^2 \to {\mathbb{R}}^2 \setminus \{ (0,0) \} by f(x,y) := \bigl(e^x\cos(y),e^x\sin(y)\bigr).
a) Show that f is onto.
b) Show that f' is invertible at all points.
c) Show that f is not one-to-one, in fact for every (a,b) \in {\mathbb{R}}^2 \setminus \{ (0,0) \}, there exist infinitely many different points (x,y) \in {\mathbb{R}}^2 such that f(x,y) = (a,b).
Therefore, invertible derivative at every point does not mean that f is invertible globally.
Find a map f \colon {\mathbb{R}}^n \to {\mathbb{R}}^n that is one-to-one, onto, continuously differentiable, but f'(0) = 0. Hint: Generalize f(x) = x^3 from one to n dimensions.
Consider z^2 + xz + y =0 in {\mathbb{R}}^3. Find an equation D(x,y)=0, such that if D(x_0,y_0) \not= 0 and z^2+x_0z+y_0 = 0 for some z \in {\mathbb{R}}, then for points near (x_0,y_0) there exist exactly two distinct continuously differentiable functions r_1(x,y) and r_2(x,y) such that z=r_1(x,y) and z=r_2(x,y) solve z^2 + xz + y =0. Do you recognize the expression D from algebra?
Suppose f \colon (a,b) \to {\mathbb{R}}^2 is continuously differentiable and \frac{\partial f}{\partial x}(t) \not= 0 for all t \in (a,b). Prove that there exists an interval (c,d) and a continuously differentiable function g \colon (c,d) \to {\mathbb{R}} such that (x,y) \in f\bigl((a,b)\bigr) if and only if x \in (c,d) and y=g(x). In other words, the set f\bigl((a,b)\bigr) is a graph of g.
Define f \colon {\mathbb{R}}^2 \to {\mathbb{R}}^2 by f(x,y) := \begin{cases} (x^2 \sin \bigl(\nicefrac{1}{x}\bigr) + \frac{x}{2} , y ) & \text{if $x \not= 0$,} \\ (0,y) & \text{if $x=0$.} \end{cases}
a) Show that f is differentiable everywhere.
b) Show that f'(0,0) is invertible.
c) Show that f is not one-to-one in any neighborhood of the origin (it is not locally invertible, that is, the conclusion of the inverse function theorem fails at the origin).
d) Show that f is not continuously differentiable.
[mv:exercise:polarcoordinates] Define a mapping F(r,\theta) := \bigl(r \cos(\theta), r \sin(\theta) \bigr).
a) Show that F is continuously differentiable (for all (r,\theta) \in {\mathbb{R}}^2).
b) Compute F'(0,\theta) for any \theta.
c) Show that if r \not= 0, then F'(r,\theta) is invertible, therefore an inverse of F exists locally as long as r \not= 0.
d) Show that F \colon {\mathbb{R}}^2 \to {\mathbb{R}}^2 is onto, and for each point (x,y) \in {\mathbb{R}}^2, the set F^{-1}(x,y) is infinite.
e) Show that F \colon {\mathbb{R}}^2 \to {\mathbb{R}}^2 is an open map, despite not satisfying the condition of the inverse function theorem.
f) Show that F|_{(0,\infty) \times [0,2\pi)} is one to one and onto {\mathbb{R}}^2 \setminus \{ (0,0) \}.
Higher order derivatives
Note: less than 1 lecture, depends on the optional §4.3 of volume I
Let U \subset {\mathbb{R}}^n be an open set and f \colon U \to {\mathbb{R}} a function. Denote by x = (x_1,x_2,\ldots,x_n) \in {\mathbb{R}}^n our coordinates. If \frac{\partial f}{\partial x_j} exists everywhere in U, then it is itself a function \frac{\partial f}{\partial x_j} \colon U \to {\mathbb{R}}. Therefore it makes sense to talk about its partial derivatives. We denote the partial derivative of \frac{\partial f}{\partial x_j} with respect to x_k by \frac{\partial^2 f}{\partial x_k \partial x_j} := \frac{\partial \bigl( \frac{\partial f}{\partial x_j} \bigr)}{\partial x_k} . If k=j, then we write \frac{\partial^2 f}{\partial x_j^2} for simplicity.
We define higher order derivatives inductively. Suppose j_1,j_2,\ldots,j_\ell are integers between 1 and n, and suppose \frac{\partial^{\ell-1} f}{\partial x_{j_{\ell-1}} \partial x_{j_{\ell-2}} \cdots \partial x_{j_1}} exists and is differentiable in the variable x_{j_{\ell}}, then the partial derivative with respect to that variable is denoted by \frac{\partial^{\ell} f}{\partial x_{j_{\ell}} \partial x_{j_{\ell-1}} \cdots \partial x_{j_1}} := \frac{\partial \bigl( \frac{\partial^{\ell-1} f}{\partial x_{j_{\ell-1}} \partial x_{j_{\ell-2}} \cdots \partial x_{j_1}} \bigr)}{\partial x_{j_{\ell}}} . Such a derivative is called a partial derivative of order \ell.
Remark that sometimes the notation f_{x_j x_k} is used for \frac{\partial^2 f}{\partial x_k \partial x_j}. This notation swaps the order of derivatives, which may be important.
Let U \subset {\mathbb{R}}^n be an open set and f \colon U \to {\mathbb{R}} a function. We say f is a k-times continuously differentiable function, or a C^k function, if all partial derivatives of all orders up to and including order k exist and are continuous.
So a continuously differentiable, or C^1, function is one where all partial derivatives exist and are continuous, which agrees with our previous definition by the proposition of the previous section. We could have required only that the kth order partial derivatives exist and are continuous, as the existence of lower order derivatives is clearly necessary to even define kth order partial derivatives, and these lower order derivatives will be continuous as they will be differentiable functions.
When the partial derivatives are continuous, we can swap their order.
[mv:prop:swapders] Suppose U \subset {\mathbb{R}}^n is open and f \colon U \to {\mathbb{R}} is a C^2 function, and j and k are two integers between 1 and n. Then \frac{\partial^2 f}{\partial x_k \partial x_j} = \frac{\partial^2 f}{\partial x_j \partial x_k} .
Fix a point p \in U, and let e_j and e_k be the standard basis vectors and let s and t be two small nonzero real numbers. We pick s and t small enough so that p+s_0e_j +t_0e_k \in U for all s_0 and t_0 with \left\lvert {s_0} \right\rvert \leq \left\lvert {s} \right\rvert and \left\lvert {t_0} \right\rvert \leq \left\lvert {t} \right\rvert. This is possible since U is open and so contains a small ball (or a box if you wish).
Using the mean value theorem on the partial derivative in x_k of the function f(p+se_j)-f(p), we find a t_0 between 0 and t such that \frac{f(p+se_j + te_k)- f(p+t e_k) - f(p+s e_j)+f(p)}{t} = \frac{\partial f}{\partial x_k}(p + s e_j + t_0 e_k) - \frac{\partial f}{\partial x_k}(p + t_0 e_k) . Next there exists a number s_0 between 0 and s such that \frac{\frac{\partial f}{\partial x_k}(p + s e_j + t_0 e_k) - \frac{\partial f}{\partial x_k}(p + t_0 e_k)}{s} = \frac{\partial^2 f}{\partial x_j \partial x_k}(p + s_0 e_j + t_0 e_k) . In other words g(s,t) := \frac{f(p+se_j + te_k)- f(p+t e_k) - f(p+s e_j)+f(p)}{st} = \frac{\partial^2 f}{\partial x_j \partial x_k}(p + s_0 e_j + t_0 e_k) . Taking a limit as (s,t) \in {\mathbb{R}}^2 goes to zero we find that (s_0,t_0) also goes to zero and by continuity of the second partial derivatives we find that \lim_{(s,t) \to 0} g(s,t) = \frac{\partial^2 f}{\partial x_j \partial x_k}(p) . We now reverse the ordering, starting with the function f(p+te_k)-f(p) we find an s_1 between 0 and s such that \frac{f(p+te_k + se_j)- f(p+s e_j) - f(p+t e_k)+f(p)}{s} = \frac{\partial f}{\partial x_j}(p + t e_k + s_1 e_j) - \frac{\partial f}{\partial x_j}(p + s_1 e_j) . And we find a t_1 between 0 and t \frac{\frac{\partial f}{\partial x_j}(p + t e_k + s_1 e_j) - \frac{\partial f}{\partial x_j}(p + s_1 e_j)}{t} = \frac{\partial^2 f}{\partial x_k \partial x_j}(p + t_1 e_k + s_1 e_j) . Again we find that g(s,t) = \frac{\partial^2 f}{\partial x_k \partial x_j}(p + t_1 e_k + s_1 e_j) and therefore \lim_{(s,t) \to 0} g(s,t) = \frac{\partial^2 f}{\partial x_k \partial x_j}(p) . And therefore the two partial derivatives are equal.
The proposition does not hold if the derivatives are not continuous. See the exercises. Notice also that we did not really need a C^2 function; we only needed the two second order partial derivatives involved to be continuous functions.
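For example, let f(x,y) := x^3y^2. Then \frac{\partial f}{\partial y} = 2x^3y and so \frac{\partial^2 f}{\partial x \partial y} = 6x^2y, while \frac{\partial f}{\partial x} = 3x^2y^2 and so \frac{\partial^2 f}{\partial y \partial x} = 6x^2y. The two mixed partial derivatives indeed agree, as all the partial derivatives of f are continuous.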
Exercises
Suppose f \colon U \to {\mathbb{R}} is a C^2 function for some open U \subset {\mathbb{R}}^n and p \in U. Use the proof of the proposition above to find an expression in terms of just the values of f (analogue of the difference quotient for the first derivative), whose limit is \frac{\partial^2 f}{ \partial x_j \partial x_k}(p).
Define f(x,y) := \begin{cases} \frac{xy(x^2-y^2)}{x^2+y^2} & \text{ if $(x,y) \not= (0,0)$,}\\ 0 & \text{ if $(x,y) = (0,0)$.} \end{cases} Show that
a) The first order partial derivatives exist and are continuous.
b) The partial derivatives \frac{\partial^2 f}{\partial x \partial y} and \frac{\partial^2 f}{\partial y \partial x} exist, but are not continuous at the origin, and \frac{\partial^2 f}{\partial x \partial y}(0,0) \not= \frac{\partial^2 f}{\partial y \partial x}(0,0).
Suppose f \colon U \to {\mathbb{R}} is a C^k function for some open U \subset {\mathbb{R}}^n and p \in U. Suppose j_1,j_2,\ldots,j_k are integers between 1 and n, and suppose \sigma=(\sigma_1,\sigma_2,\ldots,\sigma_k) is a permutation of (1,2,\ldots,k). Prove \frac{\partial^{k} f}{\partial x_{j_{k}} \partial x_{j_{k-1}} \cdots \partial x_{j_1}} (p) = \frac{\partial^{k} f}{\partial x_{j_{\sigma_k}} \partial x_{j_{\sigma_{k-1}}} \cdots \partial x_{j_{\sigma_1}}} (p) .
Suppose \varphi \colon {\mathbb{R}}^2 \to {\mathbb{R}} is a C^k function such that \varphi(0,\theta) = \varphi(0,\psi) for all \theta,\psi \in {\mathbb{R}} and \varphi(r,\theta) = \varphi(r,\theta+2\pi) for all r,\theta \in {\mathbb{R}}. Let F(r,\theta) = \bigl(r \cos(\theta), r \sin(\theta) \bigr) be the polar coordinates mapping from the exercises of the previous section. Show that the function g \colon {\mathbb{R}}^2 \to {\mathbb{R}}, given by g(x,y) := \varphi \bigl(F^{-1}(x,y)\bigr), is well defined (notice that F^{-1}(x,y) can only be defined locally), and when restricted to {\mathbb{R}}^2 \setminus \{ 0 \} it is a C^k function.
One dimensional integrals in several variables
Differentiation under the integral
Note: less than 1 lecture
Let f(x,y) be a function of two variables and define g(y) := \int_a^b f(x,y) ~dx . Suppose f is differentiable in y. The question we ask is when we can “differentiate under the integral”, that is, when is it true that g is differentiable and its derivative g'(y) \overset{?}{=} \int_a^b \frac{\partial f}{\partial y}(x,y) ~dx . Differentiation is a limit, and therefore we are really asking when the two limiting operations of integration and differentiation commute. As we have seen, this is not always possible; some sort of uniformity is necessary. In particular, the first question we would face is the integrability of \frac{\partial f}{\partial y}, but the formula can fail even if \frac{\partial f}{\partial y} is integrable for all y.
Let us prove a simple, but the most useful version of this theorem.
Suppose f \colon [a,b] \times [c,d] \to {\mathbb{R}} is a continuous function, such that \frac{\partial f}{\partial y} exists for all (x,y) \in [a,b] \times [c,d] and is continuous. Define g(y) := \int_a^b f(x,y) ~dx . Then g \colon [c,d] \to {\mathbb{R}} is differentiable and g'(y) = \int_a^b \frac{\partial f}{\partial y}(x,y) ~dx .
The continuity requirements for f and \frac{\partial f}{\partial y} can be weakened, but not dropped outright. The main point is for \frac{\partial f}{\partial y} to exist and be continuous for a small interval in the y direction. In applications, the [c,d] can be a small interval around the point where you need to differentiate.
Fix y \in [c,d] and let \epsilon > 0 be given. As \frac{\partial f}{\partial y} is continuous on [a,b] \times [c,d] it is uniformly continuous. In particular, there exists \delta > 0 such that whenever y_1 \in [c,d] with \left\lvert {y_1-y} \right\rvert < \delta and all x \in [a,b] we have \left\lvert {\frac{\partial f}{\partial y}(x,y_1)-\frac{\partial f}{\partial y}(x,y)} \right\rvert < \epsilon .
Suppose h is such that y+h \in [c,d] and \left\lvert {h} \right\rvert < \delta. Fix x for a moment and apply mean value theorem to find a y_1 between y and y+h such that \frac{f(x,y+h)-f(x,y)}{h} = \frac{\partial f}{\partial y}(x,y_1) . If \left\lvert {h} \right\rvert < \delta, then \left\lvert { \frac{f(x,y+h)-f(x,y)}{h} - \frac{\partial f}{\partial y}(x,y) } \right\rvert = \left\lvert { \frac{\partial f}{\partial y}(x,y_1) - \frac{\partial f}{\partial y}(x,y) } \right\rvert < \epsilon . This argument worked for every x \in [a,b]. Therefore, as a function of x x \mapsto \frac{f(x,y+h)-f(x,y)}{h} \qquad \text{converges uniformly to} \qquad x \mapsto \frac{\partial f}{\partial y}(x,y) \qquad \text{as $h \to 0$} . We only defined uniform convergence for sequences although the idea is the same. If you wish you can replace h with \nicefrac{1}{n} above and let n \to \infty.
Now consider the difference quotient \frac{g(y+h)-g(y)}{h} = \frac{\int_a^b f(x,y+h) ~dx - \int_a^b f(x,y) ~dx }{h} = \int_a^b \frac{f(x,y+h)-f(x,y)}{h} ~dx . Uniform convergence can be taken underneath the integral and therefore \lim_{h\to 0} \frac{g(y+h)-g(y)}{h} = \int_a^b \lim_{h\to 0} \frac{f(x,y+h)-f(x,y)}{h} ~dx = \int_a^b \frac{\partial f}{\partial y}(x,y) ~dx . \qedhere
Let f(y) = \int_0^1 \sin(x^2-y^2) ~dx . Then f'(y) = \int_0^1 -2y\cos(x^2-y^2) ~dx .
Suppose we start with \int_0^{1} \frac{x-1}{\ln(x)} ~dx . The function under the integral extends to be continuous on [0,1], and hence the integral exists; see the exercise below. Trouble is finding it. Introduce a parameter y and define a function: g(y) := \int_0^{1} \frac{x^y-1}{\ln(x)} ~dx . The function \frac{x^y-1}{\ln(x)} also extends to a continuous function of x and y for (x,y) \in [0,1] \times [0,1]. Therefore g is a continuous function on [0,1]. In particular, g(0) = 0. For any \epsilon > 0, the y derivative of the integrand, x^y, is continuous on [0,1] \times [\epsilon,1]. Therefore, for y >0 we may differentiate under the integral sign g'(y) = \int_0^{1} \frac{\ln(x) x^y}{\ln(x)} ~dx = \int_0^{1} x^y ~dx = \frac{1}{y+1} . We need to figure out g(1), knowing g'(y) = \frac{1}{y+1} and g(0) = 0. By elementary calculus we find g(1) = \int_0^1 g'(y)~dy = \ln(2). Therefore \int_0^{1} \frac{x-1}{\ln(x)} ~dx = \ln(2).
Prove the two statements that were asserted in the example.
a) Prove \frac{x-1}{\ln(x)} extends to a continuous function of [0,1].
b) Prove \frac{x^y-1}{\ln(x)} extends to be a continuous function on [0,1] \times [0,1].
Exercises
Suppose h \colon {\mathbb{R}}\to {\mathbb{R}} is a continuous function. Suppose g \colon {\mathbb{R}}\to {\mathbb{R}} is continuously differentiable and compactly supported. That is, there exists some M > 0 such that g(x) = 0 whenever \left\lvert {x} \right\rvert \geq M. Define f(x) := \int_{-\infty}^\infty h(y)g(x-y)~dy . Show that f is differentiable.
Suppose f \colon {\mathbb{R}}\to {\mathbb{R}} is an infinitely differentiable function (all derivatives exist) such that f(0) = 0. Then show that there exists another infinitely differentiable function g(x) such that f(x) = xg(x). Finally show that if f'(0) \not= 0, then g(0) \not= 0. Hint: first write f(x) = \int_0^x f'(s) ds and then rewrite the integral to go from 0 to 1.
Compute \int_0^1 e^{tx} ~dx. Derive the formula for \int_0^1 x^n e^{x} ~dx not using integration by parts, but by differentiation underneath the integral.
Let U \subset {\mathbb{R}}^n be an open set and suppose f(x,y_1,y_2,\ldots,y_n) is a continuous function defined on [0,1] \times U \subset {\mathbb{R}}^{n+1}. Suppose \frac{\partial f}{\partial y_1}, \frac{\partial f}{\partial y_2},\ldots, \frac{\partial f}{\partial y_n} exist and are continuous on [0,1] \times U. Then prove that F \colon U \to {\mathbb{R}} defined by F(y_1,y_2,\ldots,y_n) := \int_0^1 f(x,y_1,y_2,\ldots,y_n) \, dx is continuously differentiable.
Work out the following counterexample: Let f(x,y) := \begin{cases} \frac{xy^3}{{(x^2+y^2)}^2} & \text{if $x\not=0$ or $y\not= 0$,}\\ 0 & \text{if $x=0$ and $y=0$.} \end{cases}
a) Prove that for any fixed y the function x \mapsto f(x,y) is Riemann integrable on [0,1] and g(y) = \int_0^1 f(x,y) \, dx = \frac{y}{2y^2+2} . Therefore g'(y) exists and we get the continuous function g'(y) = \frac{1-y^2}{2{(y^2+1)}^2} .
b) Prove \frac{\partial f}{\partial y} exists at all x and y and compute it.
c) Show that for all y \int_0^1 \frac{\partial f}{\partial y} (x,y) \, dx exists but g'(0) \not= \int_0^1 \frac{\partial f}{\partial y} (x,0) \, dx .
Work out the following counterexample: Let f(x,y) := \begin{cases} xy^2 \sin\bigl(\frac{1}{x^3y}\bigr) & \text{if $x\not=0$ and $y\not= 0$,}\\ 0 & \text{if $x=0$ or $y=0$.} \end{cases}
a) Prove f is continuous on [0,1] \times [a,b] for any interval [a,b]. Therefore the following function is well defined on [a,b]: g(y) = \int_0^1 f(x,y) \, dx .
b) Prove \frac{\partial f}{\partial y} exists for all (x,y) in [0,1] \times [a,b], but is not continuous.
c) Show that \int_0^1 \frac{\partial f}{\partial y}(x,y) \, dx does not exist if y \not= 0 even if we take improper integrals.
Path integrals
Note: 2–3 lectures
Piecewise smooth paths
A continuously differentiable function \gamma \colon [a,b] \to {\mathbb{R}}^n is called a smooth path or a continuously differentiable path if \gamma is continuously differentiable and \gamma^{\:\prime}(t) \not= 0 for all t \in [a,b].
The function \gamma is called a piecewise smooth path or a piecewise continuously differentiable path if there exist finitely many points t_0 = a < t_1 < t_2 < \cdots < t_k = b such that the restriction \gamma|_{[t_{j-1},t_j]} is a smooth path for each j = 1,2,\ldots,k.
We say \gamma is a simple path if \gamma|_{(a,b)} is a one-to-one function. We say \gamma is a closed path if \gamma(a) = \gamma(b), that is, if the path starts and ends at the same point.
Since \gamma is a function of one variable, we have seen before that treating \gamma^{\:\prime}(t) as a matrix is equivalent to treating it as a vector since it is an n \times 1 matrix, that is, a column vector. In fact, by an earlier exercise, even the operator norm of \gamma^{\:\prime}(t) is equal to the euclidean norm. Therefore, we will write \gamma^{\:\prime}(t) as a vector as is usual, and then \gamma^{\:\prime}(t) is just the vector of the derivatives of its components, so if \gamma(t) = \bigl( \gamma_1(t), \gamma_2(t), \ldots, \gamma_n(t) \bigr), then \gamma^{\:\prime}(t) = \bigl( \gamma_1^{\:\prime}(t), \gamma_2^{\:\prime}(t), \ldots, \gamma_n^{\:\prime}(t) \bigr).
One can often get by with only smooth paths, but for computations, the simplest paths to write down are often piecewise smooth. Note that a piecewise smooth function (or path) is automatically continuous (exercise).
Generally, it is the direct image \gamma\bigl([a,b]\bigr) that we are interested in, although how we parametrize it with \gamma is also important to some degree. We informally talk about a curve, and often we really mean the set \gamma\bigl([a,b]\bigr), just as before, depending on context.
[mv:example:unitsquarepath] Let \gamma \colon [0,4] \to {\mathbb{R}}^2 be defined by \gamma(t) := \begin{cases} (t,0) & \text{if $t \in [0,1]$,}\\ (1,t-1) & \text{if $t \in (1,2]$,}\\ (3-t,1) & \text{if $t \in (2,3]$,}\\ (0,4-t) & \text{if $t \in (3,4]$.} \end{cases} Then the reader can check that the path is the unit square traversed counterclockwise. We can check that for example \gamma|_{[1,2]}(t) = (1,t-1) and therefore (\gamma|_{[1,2]})'(t) = (0,1) \not= 0. It is good to notice at this point that (\gamma|_{[1,2]})'(1) = (0,1), (\gamma|_{[0,1]})'(1) = (1,0), and \gamma^{\:\prime}(1) does not exist. That is, at the corners \gamma is of course not differentiable, even though the restrictions are differentiable and the derivative depends on which restriction you take.
The condition that \gamma^{\:\prime}(t) \not= 0 means that the image of \gamma has no “corners” where \gamma is continuously differentiable. For example, take the function \gamma(t) := \begin{cases} (t^2,0) & \text{ if $t < 0$,}\\ (0,t^2) & \text{ if $t \geq 0$.} \end{cases} It is left for the reader to check that \gamma is continuously differentiable, yet the image \gamma({\mathbb{R}}) = \{ (x,y) \in {\mathbb{R}}^2 : (x,y) = (s,0) \text{ or } (x,y) = (0,s) \text{ for some } s \geq 0 \} has a “corner” at the origin. And that is because \gamma^{\:\prime}(0) = (0,0). More complicated examples with even infinitely many corners exist, see the exercises.
The condition that \gamma^{\:\prime}(t) \not= 0 even at the endpoints guarantees not only no corners, but also that the path ends nicely, that is, can extend a little bit past the endpoints. Again, see the exercises.
A graph of a continuously differentiable function f \colon [a,b] \to {\mathbb{R}} is a smooth path. That is, define \gamma \colon [a,b] \to {\mathbb{R}}^2 by \gamma(t) := \bigl(t,f(t)\bigr) . Then \gamma^{\:\prime}(t) = \bigl( 1 , f'(t) \bigr), which is never zero.
There are other ways of parametrizing the path. That is, having a different path with the same image. For example, the function that takes t to (1-t)a+tb, takes the interval [0,1] to [a,b]. So let \alpha \colon [0,1] \to {\mathbb{R}}^2 be defined by \alpha(t) := \bigl((1-t)a+tb,f((1-t)a+tb)\bigr) . Then \alpha'(t) = \bigl( b-a , (b-a)f'((1-t)a+tb) \bigr), which is never zero. Furthermore as sets \alpha\bigl([0,1]\bigr) = \gamma\bigl([a,b]\bigr) = \{ (x,y) \in {\mathbb{R}}^2 : x \in [a,b] \text{ and } f(x) = y \}, which is just the graph of f.
The last example leads us to a definition.
Let \gamma \colon [a,b] \to {\mathbb{R}}^n be a smooth path and h \colon [c,d] \to [a,b] a continuously differentiable bijective function such that h'(t) \not= 0 for all t \in [c,d]. Then the composition \gamma \circ h is called a smooth reparametrization of \gamma.
Let \gamma be a piecewise smooth path, and h be a piecewise smooth bijective function. Then the composition \gamma \circ h is called a piecewise smooth reparametrization of \gamma.
If h is strictly increasing, then h is said to preserve orientation. If h does not preserve orientation, then h is said to reverse orientation.
A reparametrization is another path for the same set. That is, (\gamma \circ h)\bigl([c,d]\bigr) = \gamma \bigl([a,b]\bigr).
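For example, take \gamma \colon [0,2\pi] \to {\mathbb{R}}^2 given by \gamma(t) := \bigl(\cos(t),\sin(t)\bigr) and h \colon [0,1] \to [0,2\pi] given by h(\tau) := 2\pi \tau. Then h is a continuously differentiable bijection with h'(\tau) = 2\pi \not= 0, and (\gamma \circ h)(\tau) = \bigl(\cos(2\pi \tau),\sin(2\pi \tau)\bigr) is an orientation preserving smooth reparametrization of \gamma. Taking instead h(\tau) := 2\pi(1-\tau) gives an orientation reversing reparametrization, which traverses the circle in the opposite direction.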
Let us remark that for h, piecewise smooth means that there is some partition t_0 = c < t_1 < t_2 < \cdots < t_k = d, such that h|_{[t_{j-1},t_j]} is continuously differentiable and (h|_{[t_{j-1},t_j]})'(t) \not= 0 for all t \in [t_{j-1},t_j]. Since h is bijective, it is either strictly increasing or strictly decreasing. Therefore either (h|_{[t_{j-1},t_j]})'(t) > 0 for all t or (h|_{[t_{j-1},t_j]})'(t) < 0 for all t.
[prop:reparamapiecewisesmooth] If \gamma \colon [a,b] \to {\mathbb{R}}^n is a piecewise smooth path, and \gamma \circ h \colon [c,d] \to {\mathbb{R}}^n is a piecewise smooth reparametrization, then \gamma \circ h is a piecewise smooth path.
Let us assume that h preserves orientation, that is, h is strictly increasing. If h \colon [c,d] \to [a,b] gives a piecewise smooth reparametrization, then for some partition r_0 = c < r_1 < r_2 < \cdots < r_\ell = d, we have that h|_{[r_{j-1},r_j]} is continuously differentiable with positive derivative.
Let t_0 = a < t_1 < t_2 < \cdots < t_k = b be the partition from the definition of piecewise smooth for \gamma together with the points \{ h(r_0), h(r_1), h(r_2), \ldots, h(r_\ell) \}. Let s_j := h^{-1}(t_j). Then s_0 = c < s_1 < s_2 < \cdots < s_k = d. For t \in [s_{j-1},s_j] notice that h(t) \in [t_{j-1},t_j], h|_{[s_{j-1},s_j]} is continuously differentiable, and \gamma|_{[t_{j-1},t_j]} is also continuously differentiable. Then (\gamma \circ h)|_{[s_{j-1},s_{j}]} (t) = \gamma|_{[t_{j-1},t_{j}]} \bigl( h|_{[s_{j-1},s_j]}(t) \bigr) . The function (\gamma \circ h)|_{[s_{j-1},s_{j}]} is therefore continuously differentiable and by the chain rule \bigl( (\gamma \circ h)|_{[s_{j-1},s_{j}]} \bigr) ' (t) = \bigl( \gamma|_{[t_{j-1},t_{j}]} \bigr)' \bigl( h(t) \bigr) (h|_{[s_{j-1},s_j]})'(t) \not= 0 . Therefore \gamma \circ h is a piecewise smooth path. The case for an orientation reversing h is left as an exercise.
If two paths are simple and their images are the same, then there exists a reparametrization taking one to the other; this is left as an exercise.
Path integral of a one-form
If (x_1,x_2,\ldots,x_n) \in {\mathbb{R}}^n are our coordinates, then given n real-valued continuous functions \omega_1,\omega_2,\ldots,\omega_n defined on some set S \subset {\mathbb{R}}^n, we define a so-called one-form: \omega = \omega_1 dx_1 + \omega_2 dx_2 + \cdots + \omega_n dx_n . We could represent \omega as a continuous function from S to {\mathbb{R}}^n, although it is better to think of it as a different object.
For example, \omega(x,y) = \frac{-y}{x^2+y^2} dx + \frac{x}{x^2+y^2} dy is a one-form defined on {\mathbb{R}}^2 \setminus \{ (0,0) \}.
Let \gamma \colon [a,b] \to {\mathbb{R}}^n be a smooth path and \omega = \omega_1 dx_1 + \omega_2 dx_2 + \cdots + \omega_n dx_n , a one-form defined on the direct image \gamma\bigl([a,b]\bigr). Let \gamma = (\gamma_1,\gamma_2,\ldots,\gamma_n) be the components of \gamma. Define: \begin{split} \int_{\gamma} \omega & := \int_a^b \Bigl( \omega_1\bigl(\gamma(t)\bigr) \gamma_1^{\:\prime}(t) + \omega_2\bigl(\gamma(t)\bigr) \gamma_2^{\:\prime}(t) + \cdots + \omega_n\bigl(\gamma(t)\bigr) \gamma_n^{\:\prime}(t) \Bigr) \, dt \\ &\phantom{:}= \int_a^b \left( \sum_{j=1}^n \omega_j\bigl(\gamma(t)\bigr) \gamma_j^{\:\prime}(t) \right) \, dt . \end{split} If \gamma is piecewise smooth, take the corresponding partition t_0 = a < t_1 < t_2 < \ldots < t_k = b, where we assume the partition is the minimal one, that is \gamma is not differentiable at t_1,t_2,\ldots,t_{k-1}. Each \gamma|_{[t_{j-1},t_j]} is a smooth path and we define \int_{\gamma} \omega := \int_{\gamma|_{[t_0,t_1]}} \omega \, + \, \int_{\gamma|_{[t_1,t_2]}} \omega \, + \, \cdots \, + \, \int_{\gamma|_{[t_{k-1},t_k]}} \omega .
The notation makes sense from the formula you remember from calculus, let us state it somewhat informally: if x_j(t) = \gamma_j(t), then dx_j = \gamma_j^{\:\prime}(t) dt.
Paths can be cut up or concatenated as follows. The proof is a direct application of the additivity of the Riemann integral, and is left as an exercise. The proposition also justifies why we defined the integral over a piecewise smooth path in the way we did, and it further justifies that we may as well have taken any partition not just the minimal one in the definition.
[mv:prop:pathconcat] Let \gamma \colon [a,c] \to {\mathbb{R}}^n be a piecewise smooth path. For some b \in (a,c), define the piecewise smooth paths \alpha = \gamma|_{[a,b]} and \beta = \gamma|_{[b,c]}. For a one-form \omega defined on the image of \gamma we have \int_{\gamma} \omega = \int_{\alpha} \omega + \int_{\beta} \omega .
[example:mv:irrotoneformint] Let the one-form \omega and the path \gamma \colon [0,2\pi] \to {\mathbb{R}}^2 be defined by \omega(x,y) := \frac{-y}{x^2+y^2} dx + \frac{x}{x^2+y^2} dy, \qquad \gamma(t) := \bigl(\cos(t),\sin(t)\bigr) . Then \begin{split} \int_{\gamma} \omega & = \int_0^{2\pi} \Biggl( \frac{-\sin(t)}{{\bigl(\cos(t)\bigr)}^2+{\bigl(\sin(t)\bigr)}^2} \bigl(-\sin(t)\bigr) + \frac{\cos(t)}{{\bigl(\cos(t)\bigr)}^2+{\bigl(\sin(t)\bigr)}^2} \cos(t) \Biggr) \, dt \\ & = \int_0^{2\pi} 1 \, dt = 2\pi . \end{split} If we parametrize the same circle differently, say via \alpha \colon [0,1] \to {\mathbb{R}}^2 defined by \alpha(t) := \bigl(\cos(2\pi t),\sin(2\pi t)\bigr), the same sort of computation gives \int_{\alpha} \omega = \int_0^1 2\pi \, dt = 2\pi again.
The previous example is not a fluke. The path integral does not depend on the parametrization of the curve, the only thing that matters is the direction in which the curve is traversed.
[mv:prop:pathintrepararam] Let \gamma \colon [a,b] \to {\mathbb{R}}^n be a piecewise smooth path and \gamma \circ h \colon [c,d] \to {\mathbb{R}}^n a piecewise smooth reparametrization. Suppose \omega is a one-form defined on the set \gamma\bigl([a,b]\bigr). Then \int_{\gamma \circ h} \omega = \begin{cases} \int_{\gamma} \omega & \text{ if $h$ preserves orientation,}\\ -\int_{\gamma} \omega & \text{ if $h$ reverses orientation.} \end{cases}
Assume first that \gamma and h are both smooth. Write the one-form as \omega = \omega_1 dx_1 + \omega_2 dx_2 + \cdots + \omega_n dx_n. Suppose first that h is orientation preserving. Using the definition of the path integral and the change of variables formula for the Riemann integral, \begin{split} \int_{\gamma} \omega & = \int_a^b \left( \sum_{j=1}^n \omega_j\bigl(\gamma(t)\bigr) \gamma_j^{\:\prime}(t) \right) \, dt \\ & = \int_c^d \left( \sum_{j=1}^n \omega_j\Bigl(\gamma\bigl(h(\tau)\bigr)\Bigr) \gamma_j^{\:\prime}\bigl(h(\tau)\bigr) \right) h'(\tau) \, d\tau \\ & = \int_c^d \left( \sum_{j=1}^n \omega_j\Bigl(\gamma\bigl(h(\tau)\bigr)\Bigr) (\gamma_j \circ h)'(\tau) \right) \, d\tau = \int_{\gamma \circ h} \omega . \end{split} If h is orientation reversing, it will swap the order of the limits on the integral, introducing a minus sign. The details, along with finishing the proof for piecewise smooth paths, are left to the reader as an exercise.
Due to this proposition (and the exercises), if we have a set \Gamma \subset {\mathbb{R}}^n that is the image of a simple piecewise smooth path \gamma\bigl([a,b]\bigr), then as long as we somehow indicate the orientation, that is, the direction in which we traverse the curve, in other words where we start and where we finish, we can write \int_{\Gamma} \omega , without mentioning the specific \gamma. Furthermore, for a simple closed path, it does not even matter where we start the parametrization. See the exercises.
Recall that simple means that \gamma restricted to (a,b) is one-to-one, that is, it is one-to-one except perhaps at the endpoints. We also often relax the simple path condition a little bit. For example, it is enough that \gamma \colon [a,b] \to {\mathbb{R}}^n is one-to-one except at finitely many points. That is, there are only finitely many points p \in {\mathbb{R}}^n such that \gamma^{-1}(p) contains more than one point. See the exercises. The issue with injectivity is illustrated by the following example.
Suppose \gamma \colon [0,2\pi] \to {\mathbb{R}}^2 is given by \gamma(t) := \bigl(\cos(t),\sin(t)\bigr) and \beta \colon [0,2\pi] \to {\mathbb{R}}^2 is given by \beta(t) := \bigl(\cos(2t),\sin(2t)\bigr). Notice that \gamma\bigl([0,2\pi]\bigr) = \beta\bigl([0,2\pi]\bigr), and we travel around the same curve, the unit circle. But \gamma goes around the unit circle once in the counterclockwise direction, and \beta goes around the unit circle twice (in the same direction). Then \begin{aligned} & \int_{\gamma} -y\, dx + x\,dy = \int_0^{2\pi} \Bigl( \bigl(-\sin(t) \bigr) \bigl(-\sin(t) \bigr) + \cos(t) \cos(t) \Bigr) dt = 2 \pi,\\ & \int_{\beta} -y\, dx + x\,dy = \int_0^{2\pi} \Bigl( \bigl(-\sin(2t) \bigr) \bigl(-2\sin(2t) \bigr) + \cos(2t) \bigl(2\cos(2t)\bigr) \Bigr) dt = 4 \pi.\end{aligned}
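As a quick numerical check of these two computations, here is a minimal sketch in Python (assuming NumPy is available) that approximates the defining one-variable integrals with a midpoint rule:

import numpy as np

def path_integral(gamma, dgamma, a, b, n=100000):
    # midpoint rule for the defining integral of the one-form -y dx + x dy,
    # that is, int_a^b ( -y(t) x'(t) + x(t) y'(t) ) dt
    dt = (b - a) / n
    t = a + (np.arange(n) + 0.5) * dt
    x, y = gamma(t)
    dx, dy = dgamma(t)
    return np.sum((-y) * dx + x * dy) * dt

print(path_integral(lambda t: (np.cos(t), np.sin(t)),
                    lambda t: (-np.sin(t), np.cos(t)), 0, 2 * np.pi))          # about 2 pi
print(path_integral(lambda t: (np.cos(2 * t), np.sin(2 * t)),
                    lambda t: (-2 * np.sin(2 * t), 2 * np.cos(2 * t)), 0, 2 * np.pi))  # about 4 pi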
It is sometimes convenient to define a path integral over \gamma \colon [a,b] \to {\mathbb{R}}^n that is not a path. We define \int_{\gamma} \omega := \int_a^b \left( \sum_{j=1}^n \omega_j\bigl(\gamma(t)\bigr) \gamma_j^{\:\prime}(t) \right) \, dt for any \gamma which is continuously differentiable. A case which comes up naturally is when \gamma is constant. In this case \gamma^{\:\prime}(t) = 0 for all t and \gamma\bigl([a,b]\bigr) is a single point, which we regard as a “curve” of length zero. Then, \int_{\gamma} \omega = 0.
Line integral of a function
Sometimes we wish to simply integrate a function against the so-called arc-length measure.
Suppose \gamma \colon [a,b] \to {\mathbb{R}}^n is a smooth path, and f is a continuous function defined on the image \gamma\bigl([a,b]\bigr). Then define \int_{\gamma} f \,ds := \int_a^b f\bigl( \gamma(t) \bigr) \lVert {\gamma^{\:\prime}(t)} \rVert \, dt .
The definition for a piecewise smooth path is similar to the one before and is left to the reader.
The geometric idea of this integral is to find the “area under the graph of a function” as we move around the path \gamma. The line integral of a function is also independent of the parametrization, and in this case, the orientation does not matter.
[mv:prop:lineintrepararam] Let \gamma \colon [a,b] \to {\mathbb{R}}^n be a piecewise smooth path and \gamma \circ h \colon [c,d] \to {\mathbb{R}}^n a piecewise smooth reparametrization. Suppose f is a continuous function defined on the set \gamma\bigl([a,b]\bigr). Then \int_{\gamma \circ h} f\, ds = \int_{\gamma} f\, ds .
Suppose first that h is orientation preserving and \gamma and h are both smooth. Then as before \begin{split} \int_{\gamma} f \, ds & = \int_a^b f\bigl(\gamma(t)\bigr) \lVert {\gamma^{\:\prime}(t)} \rVert \, dt \\ & = \int_c^d f\Bigl(\gamma\bigl(h(\tau)\bigr)\Bigr) \lVert {\gamma^{\:\prime}\bigl(h(\tau)\bigr)} \rVert h'(\tau) \, d\tau \\ & = \int_c^d f\Bigl(\gamma\bigl(h(\tau)\bigr)\Bigr) \lVert {\gamma^{\:\prime}\bigl(h(\tau)\bigr) h'(\tau)} \rVert \, d\tau \\ & = \int_c^d f\bigl((\gamma \circ h)(\tau)\bigr) \lVert {(\gamma \circ h)'(\tau)} \rVert \, d\tau \\ & = \int_{\gamma \circ h} f \, ds . \end{split} If h is orientation reversing, it will swap the order of the limits on the integral, but you also have to introduce a minus sign in order to take h' inside the norm. The details, along with finishing the proof for piecewise smooth paths, are left to the reader as an exercise.
Similarly as before, because of this proposition (and the exercises), if \gamma is simple, it does not matter which parametrization we use. Therefore, if \Gamma = \gamma\bigl( [a,b] \bigr) we can simply write \int_\Gamma f\, ds . In this case we also do not need to worry about orientation, either way we get the same thing.
Let f(x,y) = x. Let C \subset {\mathbb{R}}^2 be half of the unit circle for x \geq 0. We wish to compute \int_C f \, ds . Parametrize the curve C via \gamma \colon [\nicefrac{-\pi}{2},\nicefrac{\pi}{2}] \to {\mathbb{R}}^2 defined as \gamma(t) := \bigl(\cos(t),\sin(t)\bigr). Then \gamma^{\:\prime}(t) = \bigl(-\sin(t),\cos(t)\bigr), and \int_C f \, ds = \int_\gamma f \, ds = \int_{-\pi/2}^{\pi/2} \cos(t) \sqrt{ {\bigl(-\sin(t)\bigr)}^2 + {\bigl(\cos(t)\bigr)}^2 } \, dt = \int_{-\pi/2}^{\pi/2} \cos(t) \, dt = 2.
Suppose \Gamma \subset {\mathbb{R}}^n is parametrized by a simple piecewise smooth path \gamma \colon [a,b] \to {\mathbb{R}}^n, that is \gamma\bigl( [a,b] \bigr) = \Gamma. Then we define the length by \ell(\Gamma) := \int_{\Gamma} ds = \int_{\gamma} ds = \int_a^b \lVert {\gamma^{\:\prime}(t)} \rVert\, dt .
Let x,y \in {\mathbb{R}}^n be two points and write [x,y] as the straight line segment between the two points x and y. We parametrize [x,y] by \gamma(t) := (1-t)x + ty for t running between 0 and 1. We find \gamma^{\:\prime}(t) = y-x and therefore \ell\bigl([x,y]\bigr) = \int_{[x,y]} ds = \int_0^1 \lVert {y-x} \rVert \, dt = \lVert {y-x} \rVert . So the length of [x,y] is the distance between x and y in the euclidean metric.
A simple piecewise smooth path \gamma \colon [0,r] \to {\mathbb{R}}^n is said to be an arc-length parametrization if \ell\bigl( \gamma\bigl([0,t]\bigr) \bigr) = \int_0^t \lVert {\gamma^{\:\prime}(\tau)} \rVert \, d\tau = t . You can think of such a parametrization as moving around your curve at speed 1.
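For example, \gamma \colon [0,2\pi] \to {\mathbb{R}}^2 given by \gamma(t) := \bigl(\cos(t),\sin(t)\bigr) is an arc-length parametrization of the unit circle, since \lVert {\gamma^{\:\prime}(t)} \rVert = \lVert \bigl(-\sin(t),\cos(t)\bigr) \rVert = 1 for all t, and so \int_0^t \lVert {\gamma^{\:\prime}(\tau)} \rVert \, d\tau = t.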
Exercises
Show that if \varphi \colon [a,b] \to {\mathbb{R}}^n is piecewise smooth as we defined it, then \varphi is a continuous function.
Finish the proof of [prop:reparamapiecewisesmooth] for orientation reversing reparametrizations.
Prove [mv:prop:pathconcat].
[mv:exercise:pathpiece] Finish the proof of [mv:prop:pathintrepararam] for a) orientation reversing reparametrizations, and b) piecewise smooth paths and reparametrizations.
[mv:exercise:linepiece] Finish the proof of [mv:prop:lineintrepararam] for a) orientation reversing reparametrizations, and b) piecewise smooth paths and reparametrizations.
Suppose \gamma \colon [a,b] \to {\mathbb{R}}^n is a piecewise smooth path, and f is a continuous function defined on the image \gamma\bigl([a,b]\bigr). Provide a definition of \int_{\gamma} f \,ds.
Directly using the definitions compute:
a) the arc-length of the unit square from [mv:example:unitsquarepath] using the given parametrization.
b) the arc-length of the unit circle using the parametrization \gamma \colon [0,1] \to {\mathbb{R}}^2, \gamma(t) := \bigl(\cos(2\pi t),\sin(2\pi t)\bigr).
c) the arc-length of the unit circle using the parametrization \beta \colon [0,2\pi] \to {\mathbb{R}}^2, \beta(t) := \bigl(\cos(t),\sin(t)\bigr).
Suppose \gamma \colon [0,1] \to {\mathbb{R}}^n is a smooth path, and \omega is a one-form defined on the image \gamma\bigl([0,1]\bigr). For r \in [0,1], let \gamma_r \colon [0,r] \to {\mathbb{R}}^n be defined as simply the restriction of \gamma to [0,r]. Show that the function h(r) := \int_{\gamma_r} \omega is a continuously differentiable function on [0,1].
Suppose \gamma \colon [a,b] \to {\mathbb{R}}^n is a smooth path. Show that there exists an \epsilon > 0 and a smooth function \tilde{\gamma} \colon (a-\epsilon,b+\epsilon) \to {\mathbb{R}}^n with \tilde{\gamma}(t) = \gamma(t) for all t \in [a,b] and \tilde{\gamma}'(t) \not= 0 for all t \in (a-\epsilon,b+\epsilon). That is, prove that a smooth path extends some small distance past the end points.
Suppose \alpha \colon [a,b] \to {\mathbb{R}}^n and \beta \colon [c,d] \to {\mathbb{R}}^n are piecewise smooth paths such that \Gamma := \alpha\bigl([a,b]\bigr) = \beta\bigl([c,d]\bigr). Show that there exist finitely many points \{ p_1,p_2,\ldots,p_k\} \subset \Gamma, such that the sets \alpha^{-1}\bigl( \{ p_1,p_2,\ldots,p_k\} \bigr) and \beta^{-1}\bigl( \{ p_1,p_2,\ldots,p_k\} \bigr) are partitions of [a,b] and [c,d], such that on any subinterval the paths are smooth (that is, they are partitions as in the definition of piecewise smooth path).
a) Suppose \gamma \colon [a,b] \to {\mathbb{R}}^n and \alpha \colon [c,d] \to {\mathbb{R}}^n are two smooth paths which are one-to-one and \gamma\bigl([a,b]\bigr) = \alpha\bigl([c,d]\bigr). Then there exists a smooth reparametrization h \colon [a,b] \to [c,d] such that \gamma = \alpha \circ h. Hint: It should not be hard to find some h. The trick is to show it is continuously differentiable with a nonvanishing derivative. You will want to apply the implicit function theorem, though it may at first seem that the dimensions don’t work out.
b) Prove the same thing as part a, but now for simple closed paths with the further assumption that \gamma(a) = \gamma(b) = \alpha(c) = \alpha(d).
c) Prove parts a) and b) but for piecewise smooth paths, obtaining piecewise smooth reparametrizations. Hint: The trick is to find two partitions such that when restricted to a subinterval of the partition both paths have the same image and are smooth, see the above exercise.
Suppose \alpha \colon [a,b] \to {\mathbb{R}}^n and \beta \colon [b,c] \to {\mathbb{R}}^n are piecewise smooth paths with \alpha(b)=\beta(b). Let \gamma \colon [a,c] \to {\mathbb{R}}^n be defined by \gamma(t) := \begin{cases} \alpha(t) & \text{ if $t \in [a,b]$,} \\ \beta(t) & \text{ if $t \in (b,c]$.} \end{cases} Show that \gamma is a piecewise smooth path, and that if \omega is a one-form defined on the curve given by \gamma, then \int_{\gamma} \omega = \int_{\alpha} \omega + \int_{\beta} \omega .
[mv:exercise:closedcurveintegral] Suppose \gamma \colon [a,b] \to {\mathbb{R}}^n and \beta \colon [c,d] \to {\mathbb{R}}^n are two simple piecewise smooth closed paths. That is \gamma(a)=\gamma(b) and \beta(c) = \beta(d) and the restrictions \gamma|_{(a,b)} and \beta|_{(c,d)} are one-to-one. Suppose \Gamma = \gamma\bigl([a,b]\bigr) = \beta\bigl([c,d]\bigr) and \omega is a one-form defined on \Gamma \subset {\mathbb{R}}^n. Show that either \int_\gamma \omega = \int_\beta \omega, \qquad \text{or} \qquad \int_\gamma \omega = - \int_\beta \omega. In particular, the notation \int_{\Gamma} \omega makes sense if we indicate the direction in which the integral is evaluated. Hint: see previous three exercises.
[mv:exercise:curveintegral] Suppose \gamma \colon [a,b] \to {\mathbb{R}}^n and \beta \colon [c,d] \to {\mathbb{R}}^n are two piecewise smooth paths which are one-to-one except at finitely many points. That is, there are at most finitely many points p \in {\mathbb{R}}^n such that \gamma^{-1}(p) or \beta^{-1}(p) contains more than one point. Suppose \Gamma = \gamma\bigl([a,b]\bigr) = \beta\bigl([c,d]\bigr) and \omega is a one-form defined on \Gamma \subset {\mathbb{R}}^n. Show that either \int_\gamma \omega = \int_\beta \omega, \qquad \text{or} \qquad \int_\gamma \omega = - \int_\beta \omega. In particular, the notation \int_{\Gamma} \omega makes sense if we indicate the direction in which the integral is evaluated.
Hint: same hint as the last exercise.
Define \gamma \colon [0,1] \to {\mathbb{R}}^2 by \gamma(t) := \Bigl( t^3 \sin(\nicefrac{1}{t}),
t{\bigl(3t^2\sin(\nicefrac{1}{t})-t\cos(\nicefrac{1}{t})\bigr)}^2 \Bigr) for t \not= 0 and \gamma(0) = (0,0). Show that:
a) \gamma is continuously differentiable on [0,1].
b) Show that there exists an infinite sequence \{ t_n \} in [0,1] converging to 0, such that \gamma^{\:\prime}(t_n) = (0,0).
c) Show that the points \gamma(t_n) lie on the line y=0 and that the x-coordinate of \gamma(t_n) alternates between positive and negative (if they do not alternate, you have only found a subsequence and you need to find them all).
d) Show that there is no piecewise smooth \alpha whose image equals \gamma\bigl([0,1]\bigr). Hint: look at part c) and show that \alpha' must be zero where it reaches the origin.
e) (Computer) If you know plotting software that allows you to plot parametric curves, make a plot of the curve, but only for t in the range [0,0.1]; otherwise you will not see the behavior. In particular, you should notice that \gamma\bigl([0,1]\bigr) has infinitely many “corners” near the origin.
Path independence
Note: 2 lectures
Path independent integrals
Let U \subset {\mathbb{R}}^n be a set and \omega a one-form defined on U. The integral of \omega is said to be path independent if for any two points x,y \in U and any two piecewise smooth paths \gamma \colon [a,b] \to U and \beta \colon [c,d] \to U such that \gamma(a) = \beta(c) = x and \gamma(b) = \beta(d) = y we have \int_\gamma \omega = \int_\beta \omega . In this case we simply write \int_x^y \omega := \int_\gamma \omega = \int_\beta \omega . Not every one-form gives a path independent integral. In fact, most do not.
Let \gamma \colon [0,1] \to {\mathbb{R}}^2 be the path \gamma(t) = (t,0) going from (0,0) to (1,0). Let \beta \colon [0,1] \to {\mathbb{R}}^2 be the path \beta(t) = \bigl(t,(1-t)t\bigr) also going between the same points. Then \begin{aligned} & \int_\gamma y \, dx = \int_0^1 \gamma_2(t) \gamma_1^{\:\prime}(t) \, dt = \int_0^1 0 (1) \, dt = 0 ,\\ & \int_\beta y \, dx = \int_0^1 \beta_2(t) \beta_1'(t) \, dt = \int_0^1 (1-t)t(1) \, dt = \frac{1}{6} .\end{aligned} So the integral of y\,dx is not path independent. In particular, \int_{(0,0)}^{(1,0)} y\,dx does not make sense.
Let U \subset {\mathbb{R}}^n be an open set and f \colon U \to {\mathbb{R}} a continuously differentiable function. Then the one-form df := \frac{\partial f}{\partial x_1} \, dx_1 + \frac{\partial f}{\partial x_2} \, dx_2 + \cdots + \frac{\partial f}{\partial x_n} \, dx_n is called the total derivative of f.
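For example, if f(x,y) := x^2 + y^2 on {\mathbb{R}}^2, then df = 2x \, dx + 2y \, dy.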
An open set U \subset {\mathbb{R}}^n is said to be path connected if for every two points x and y in U, there exists a piecewise smooth path starting at x and ending at y.
We will leave as an exercise that every connected open set is path connected.
[mv:prop:pathinddf] Let U \subset {\mathbb{R}}^n be a path connected open set and \omega a one-form defined on U. Then \int_x^y \omega is path independent (for all x,y \in U) if and only if there exists a continuously differentiable f \colon U \to {\mathbb{R}} such that \omega = df.
In fact, if such an f exists, then for any two points x,y \in U \int_{x}^y \omega = f(y)-f(x) .
In other words if we fix p \in U, then f(x) = C + \int_{p}^x \omega.
First suppose that the integral is path independent. Pick p \in U and define f(x) := \int_{p}^x \omega . Write \omega = \omega_1 dx_1 + \omega_2 dx_2 + \cdots + \omega_n dx_n. We wish to show that for every j = 1,2,\ldots,n, the partial derivative \frac{\partial f}{\partial x_j} exists and is equal to \omega_j.
Let e_j be an arbitrary standard basis vector. Compute \frac{f(x+h e_j) - f(x)}{h} = \frac{1}{h} \left( \int_{p}^{x+he_j} \omega - \int_{p}^x \omega \right) = \frac{1}{h} \int_{x}^{x+he_j} \omega , which follows by [mv:prop:pathconcat] and path independence as \int_{p}^{x+he_j} \omega = \int_{p}^{x} \omega + \int_{x}^{x+he_j} \omega, because we could have picked a path from p to x+he_j that also happens to pass through x, and then cut this path in two.
Since U is open, suppose h is small enough that all points of distance \left\lvert {h} \right\rvert or less from x are in U. As the integral is path independent, pick the simplest path possible from x to x+he_j, that is \gamma(t) = x+t he_j for t \in [0,1]. The path is in U. Notice \gamma^{\:\prime}(t) = h e_j has only one nonzero component and that is the jth component, which is h. Therefore \frac{1}{h} \int_{x}^{x+he_j} \omega = \frac{1}{h} \int_{\gamma} \omega = \frac{1}{h} \int_0^1 \omega_j(x+the_j) h \, dt = \int_0^1 \omega_j(x+the_j) \, dt . We wish to take the limit as h \to 0. The function \omega_j is continuous. So given \epsilon > 0, h can be small enough so that \left\lvert {\omega_j(x)-\omega_j(y)} \right\rvert < \epsilon, whenever \lVert {x-y} \rVert \leq \left\lvert {h} \right\rvert. Therefore, \left\lvert {\omega_j(x+the_j)-\omega_j(x)} \right\rvert < \epsilon for all t \in [0,1], and we estimate \left\lvert {\int_0^1 \omega_j(x+the_j) \, dt - \omega_j(x)} \right\rvert = \left\lvert {\int_0^1 \bigl( \omega_j(x+the_j) - \omega_j(x) \bigr) \, dt} \right\rvert \leq \epsilon . That is, \lim_{h\to 0}\frac{f(x+h e_j) - f(x)}{h} = \omega_j(x) , which is what we wanted, that is, df = \omega. As \omega_j are continuous for all j, we find that f has continuous partial derivatives and therefore is continuously differentiable.
For the other direction suppose f exists such that df = \omega. Suppose we take a smooth path \gamma \colon [a,b] \to U such that \gamma(a) = x and \gamma(b) = y, then \begin{split} \int_\gamma df & = \int_a^b \biggl( \frac{\partial f}{\partial x_1}\bigl(\gamma(t)\bigr) \gamma_1^{\:\prime}(t)+ \frac{\partial f}{\partial x_2}\bigl(\gamma(t)\bigr) \gamma_2^{\:\prime}(t)+ \cdots + \frac{\partial f}{\partial x_n}\bigl(\gamma(t)\bigr) \gamma_n^{\:\prime}(t) \biggr) \, dt \\ & = \int_a^b \frac{d}{dt} \left[ f\bigl(\gamma(t)\bigr) \right]\, dt \\ & = f(y)-f(x) . \end{split} The value of the integral only depends on x and y, not the path taken. Therefore the integral is path independent. We leave checking this for a piecewise smooth path as an exercise to the reader.
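For instance, take f(x,y) := x^2 y on {\mathbb{R}}^2, so that df = 2xy \, dx + x^2 \, dy. Any piecewise smooth path from (0,0) to (1,1) gives \int 2xy \, dx + x^2 \, dy = f(1,1) - f(0,0) = 1. As a check, for \gamma(t) := (t,t) on [0,1] we get \int_0^1 \bigl( 2t \cdot t \cdot 1 + t^2 \cdot 1 \bigr) \, dt = \int_0^1 3t^2 \, dt = 1, and for \beta(t) := (t,t^2) on [0,1] we get \int_0^1 \bigl( 2t \cdot t^2 \cdot 1 + t^2 \cdot 2t \bigr) \, dt = \int_0^1 4t^3 \, dt = 1 as well.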
Let U \subset {\mathbb{R}}^n be a path connected open set and \omega a 1-form defined on U. Then \omega = df for some continuously differentiable f \colon U \to {\mathbb{R}} if and only if \int_{\gamma} \omega = 0 for every piecewise smooth closed path \gamma \colon [a,b] \to U.
Suppose first that \omega = df and let \gamma be a piecewise smooth closed path. Then from above we have that \int_{\gamma} \omega = f\bigl(\gamma(b)\bigr) - f\bigl(\gamma(a)\bigr) = 0 , because \gamma(a) = \gamma(b) for a closed path.
Now suppose that for every piecewise smooth closed path \gamma, \int_{\gamma} \omega = 0. Let x,y be two points in U and let \alpha \colon [0,1] \to U and \beta \colon [0,1] \to U be two piecewise smooth paths with \alpha(0) = \beta(0) = x and \alpha(1) = \beta(1) = y. Then let \gamma \colon [0,2] \to U be defined by \gamma(t) := \begin{cases} \alpha(t) & \text{if $t \in [0,1]$,} \\ \beta(2-t) & \text{if $t \in (1,2]$.} \end{cases} This is a piecewise smooth closed path and so 0 = \int_{\gamma} \omega = \int_{\alpha} \omega - \int_{\beta} \omega . This follows first by [mv:prop:pathconcat], and then by noticing that the second part is \beta travelled backwards, so that we get minus the \beta integral. Thus the integral of \omega on U is path independent.
There is a local criterion, a differential equation, that guarantees path independence. That is, under the right condition there exists an antiderivative f whose total derivative is the given one-form \omega. However, since the criterion is local, we only get the result locally. We can define the antiderivative in any so-called simply connected domain, which informally is a domain where any path between two points can be “continuously deformed” into any other path between those two points. To make matters simple, the usual way this result is proved is for so-called star-shaped domains.
Let U \subset {\mathbb{R}}^n be an open set and p \in U. We say U is a star shaped domain with respect to p if for any other point x \in U, the line segment between p and x is in U, that is, if (1-t)p + tx \in U for all t \in [0,1]. If we say simply star shaped, then U is star shaped with respect to some p \in U.
Notice the difference between star shaped and convex. A convex domain is star shaped, but a star shaped domain need not be convex.
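For example, the “plus sign” shaped set U := \{ (x,y) \in {\mathbb{R}}^2 : \left\lvert {x} \right\rvert < 1 \text{ or } \left\lvert {y} \right\rvert < 1 \} is star shaped with respect to the origin, but it is not convex: the points (2,0) and (0,2) are in U, while their midpoint (1,1) is not.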
Let U \subset {\mathbb{R}}^n be a star shaped domain and \omega a continuously differentiable one-form defined on U. That is, if \omega = \omega_1 dx_1 + \omega_2 dx_2 + \cdots + \omega_n dx_n , then \omega_1,\omega_2,\ldots,\omega_n are continuously differentiable functions. Suppose that for every j and k we have \frac{\partial \omega_j}{\partial x_k} = \frac{\partial \omega_k}{\partial x_j} . Then there exists a twice continuously differentiable function f \colon U \to {\mathbb{R}} such that df = \omega.
The condition on the derivatives of \omega is precisely the condition that the second partial derivatives commute. That is, if df = \omega, and f is twice continuously differentiable, then \frac{\partial \omega_j}{\partial x_k} = \frac{\partial^2 f}{\partial x_k \partial x_j} = \frac{\partial^2 f}{\partial x_j \partial x_k} = \frac{\partial \omega_k}{\partial x_j} . The condition is therefore clearly necessary. The lemma says that it is sufficient for a star shaped U.
Suppose U is star shaped with respect to y=(y_1,y_2,\ldots,y_n) \in U.
Given x = (x_1,x_2,\ldots,x_n) \in U, define the path \gamma \colon [0,1] \to U as \gamma(t) := (1-t)y + tx, so \gamma^{\:\prime}(t) = x-y. Then let f(x) := \int_{\gamma} \omega = \int_0^1 \left( \sum_{k=1}^n \omega_k \bigl((1-t)y + tx \bigr) (x_k-y_k) \right) \, dt . We differentiate in x_j under the integral. We can do that since everything, including the partials themselves are continuous. \begin{split} \frac{\partial f}{\partial x_j}(x) & = \int_0^1 \left( \left( \sum_{k=1}^n \frac{\partial \omega_k}{\partial x_j} \bigl((1-t)y + tx \bigr) t (x_k-y_k) \right) + \omega_j \bigl((1-t)y + tx \bigr) \right) \, dt \\ & = \int_0^1 \left( \left( \sum_{k=1}^n \frac{\partial \omega_j}{\partial x_k} \bigl((1-t)y + tx \bigr) t (x_k-y_k) \right) + \omega_j \bigl((1-t)y + tx \bigr) \right) \, dt \\ & = \int_0^1 \frac{d}{dt} \left[ t \omega_j\bigl((1-t)y + tx \bigr) \right] \, dt \\ &= \omega_j(x) . \end{split} And this is precisely what we wanted.
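As an illustration of the construction, take \omega := 2xy \, dx + x^2 \, dy on U = {\mathbb{R}}^2, which is star shaped with respect to the origin, and note that \frac{\partial}{\partial y} \left[ 2xy \right] = 2x = \frac{\partial}{\partial x} \left[ x^2 \right]. Taking the origin as the center, the formula gives f(x,y) = \int_0^1 \bigl( 2(tx)(ty) \, x + {(tx)}^2 \, y \bigr) \, dt = \int_0^1 3t^2 x^2 y \, dt = x^2 y, and indeed df = 2xy \, dx + x^2 \, dy = \omega.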
Without some hypothesis on U the theorem is not true. Let \omega(x,y) := \frac{-y}{x^2+y^2} dx + \frac{x}{x^2+y^2} dy be defined on {\mathbb{R}}^2 \setminus \{ 0 \}. It is easy to see that \frac{\partial}{\partial y} \left[ \frac{-y}{x^2+y^2} \right] = \frac{\partial}{\partial x} \left[ \frac{x}{x^2+y^2} \right] . However, there is no f \colon {\mathbb{R}}^2 \setminus \{ 0 \} \to {\mathbb{R}} such that df = \omega. We saw in [example:mv:irrotoneformint] that if we integrate from (1,0) to (1,0) along the unit circle, that is, \gamma(t) = \bigl(\cos(t),\sin(t)\bigr) for t \in [0,2\pi], we get 2\pi and not 0, as it would have to be if the integral were path independent, in other words, if there existed an f such that df = \omega.
Vector fields
A common object to integrate is a so-called vector field. That is an assignment of a vector at each point of a domain.
Let U \subset {\mathbb{R}}^n be a set. A continuous function v \colon U \to {\mathbb{R}}^n is called a vector field. Write v = (v_1,v_2,\ldots,v_n).
Given a smooth path \gamma \colon [a,b] \to {\mathbb{R}}^n with \gamma\bigl([a,b]\bigr) \subset U we define the path integral of the vector field v as \int_{\gamma} v \cdot d\gamma := \int_a^b v\bigl(\gamma(t)\bigr) \cdot \gamma^{\:\prime}(t) \, dt , where the dot in the definition is the standard dot product. Again, the definition for a piecewise smooth path is made by integrating over each smooth piece and adding the results.
If we unravel the definition we find that \int_{\gamma} v \cdot d\gamma = \int_{\gamma} v_1 dx_1 + v_2 dx_2 + \cdots + v_n dx_n . Therefore what we know about integration of one-forms carries over to the integration of vector fields. For example, the integral \int_x^y v \cdot d\gamma is path independent (the same for every \gamma) if and only if v = \nabla f, that is, v is the gradient of a function. The function f is then called the potential for v.
A vector field v whose path integrals are path independent is called a conservative vector field. The naming comes from the fact that such vector fields arise in physical systems where a certain quantity, the energy, is conserved.
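For example, v(x,y) := (y,x) is a conservative vector field on {\mathbb{R}}^2 with potential f(x,y) = xy, since \nabla f = (y,x). On the other hand, v(x,y) := (-y,x) is not conservative: we computed above that its path integral around the unit circle traversed counterclockwise is 2\pi, not 0.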
Exercises
Find an f \colon {\mathbb{R}}^2 \to {\mathbb{R}} such that df = xe^{x^2+y^2} dx + ye^{x^2+y^2} dy.
Find an \omega_2 \colon {\mathbb{R}}^2 \to {\mathbb{R}} such that there exists a continuously differentiable f \colon {\mathbb{R}}^2 \to {\mathbb{R}} for which df = e^{xy} dx + \omega_2 dy.
Finish the proof of [mv:prop:pathinddf], that is, we only proved the second direction for a smooth path, not a piecewise smooth path.
Show that a star shaped domain U \subset {\mathbb{R}}^n is path connected.
Show that U := {\mathbb{R}}^2 \setminus \{ (x,y) \in {\mathbb{R}}^2 : x \leq 0, y=0 \} is star shaped and find all points (x_0,y_0) \in U such that U is star shaped with respect to (x_0,y_0).
Suppose U_1 and U_2 are two open sets in {\mathbb{R}}^n with U_1 \cap U_2 nonempty and connected. Suppose there exists an f_1 \colon U_1 \to {\mathbb{R}} and f_2 \colon U_2 \to {\mathbb{R}}, both twice continuously differentiable such that d f_1 = d f_2 on U_1 \cap U_2. Then there exists a twice differentiable function F \colon U_1 \cup U_2 \to {\mathbb{R}} such that dF = df_1 on U_1 and dF = df_2 on U_2.
Let \gamma \colon [a,b] \to {\mathbb{R}}^n be a simple nonclosed piecewise smooth path (so \gamma is one-to-one). Suppose \omega is a continuously differentiable one-form defined on some open set V with \gamma\bigl([a,b]\bigr) \subset V and \frac{\partial \omega_j}{\partial x_k} = \frac{\partial \omega_k}{\partial
x_j} for all j and k. Prove that there exists an open set U with \gamma\bigl([a,b]\bigr) \subset U \subset V and a twice continuously differentiable function f \colon U \to {\mathbb{R}} such that df = \omega.
Hint 1: \gamma\bigl([a,b]\bigr) is compact.
Hint 2: Show that you can cover the curve by finitely many balls in sequence so that the kth ball only intersects the (k-1)th ball.
Hint 3: See previous exercise.
a) Show that a connected open set is path connected. Hint: Start with two points x and y in a connected set U, and let U_x \subset U be the set of points that are reachable by a path from x, and similarly for U_y. Show that both sets are open; since they are nonempty (x \in U_x and y \in U_y), it must be that U_x = U_y = U.
b) Prove the converse, that is, a path connected set U \subset {\mathbb{R}}^n is connected. Hint: for contradiction, assume U is a union of two disjoint nonempty open sets, and then assume there is a piecewise smooth (and therefore continuous) path from a point in one to a point in the other.
Usually path connectedness is defined using just continuous paths rather than piecewise smooth paths. Prove that the definitions are equivalent, in other words prove the following statement:
Suppose U \subset {\mathbb{R}}^n is such that for any x,y \in U, there exists a continuous function \gamma \colon [a,b] \to U such that \gamma(a) = x and \gamma(b) = y. Then U is path connected (in other words, then there exists a piecewise smooth path).
Take \omega(x,y) = \frac{-y}{x^2+y^2} dx + \frac{x}{x^2+y^2} dy defined on {\mathbb{R}}^2 \setminus \{ (0,0) \}. Let \gamma \colon [a,b] \to {\mathbb{R}}^2
\setminus \{ (0,0) \} be a closed piecewise smooth path. Let R:=\{ (x,y) \in {\mathbb{R}}^2 : x \leq 0 \text{ and } y=0 \}. Suppose R \cap \gamma\bigl([a,b]\bigr) is a finite set of k points. Then \int_{\gamma} \omega = 2 \pi \ell for some integer \ell with \left\lvert {\ell} \right\rvert \leq k.
Hint 1: First prove that for a path \beta that starts and ends on R but does not intersect it otherwise, you find that \int_{\beta} \omega is -2\pi, 0, or 2\pi. Hint 2: You proved above that {\mathbb{R}}^2 \setminus R is star shaped.
Note: The number \ell is called the winding number; it measures how many times \gamma winds around the origin in the counterclockwise direction.
Multivariable integral
Riemann integral over rectangles
Note: 2–3 lectures
As in one variable, we define the Riemann integral using the Darboux upper and lower integrals. The ideas in this section are very similar to integration in one dimension. The complication is mostly notational. The differences between one and several dimensions will grow more pronounced in the sections following.
Rectangles and partitions
Let (a_1,a_2,\ldots,a_n) and (b_1,b_2,\ldots,b_n) be such that a_k \leq b_k for all k. A set of the form [a_1,b_1] \times [a_2,b_2] \times \cdots \times [a_n,b_n] is called a closed rectangle. In this setting it is sometimes useful to allow a_k = b_k, in which case we think of [a_k,b_k] = \{ a_k \} as usual. If a_k < b_k for all k, then a set of the form (a_1,b_1) \times (a_2,b_2) \times \cdots \times (a_n,b_n) is called an open rectangle.
For an open or closed rectangle R := [a_1,b_1] \times [a_2,b_2] \times \cdots \times [a_n,b_n] \subset {\mathbb{R}}^n or R := (a_1,b_1) \times (a_2,b_2) \times \cdots \times (a_n,b_n) \subset {\mathbb{R}}^n, we define the n-dimensional volume by V(R) := (b_1-a_1) (b_2-a_2) \cdots (b_n-a_n) .
A partition P of the closed rectangle R = [a_1,b_1] \times [a_2,b_2] \times \cdots \times [a_n,b_n] is a finite set of partitions P_1,P_2,\ldots,P_n of the intervals [a_1,b_1], [a_2,b_2],\ldots, [a_n,b_n]. We write P=(P_1,P_2,\ldots,P_n). That is, for every k there is an integer \ell_k and the finite set of numbers P_k = \{ x_{k,0},x_{k,1},x_{k,2},\ldots,x_{k,\ell_k} \} such that a_k = x_{k,0} < x_{k,1} < x_{k,2} < \cdots < x_{k,{\ell_k}-1} < x_{k,\ell_k} = b_k . Picking a set of n integers j_1,j_2,\ldots,j_n where j_k \in \{ 1,2,\ldots,\ell_k \} we get the subrectangle [x_{1,j_1-1}~,~ x_{1,j_1}] \times [x_{2,j_2-1}~,~ x_{2,j_2}] \times \cdots \times [x_{n,j_n-1}~,~ x_{n,j_n}] . For simplicity, we order the subrectangles somehow and we say \{R_1,R_2,\ldots,R_N\} are the subrectangles corresponding to the partition P of R. More simply, we say they are the subrectangles of P. In other words, we subdivided the original rectangle into many smaller subrectangles. See . It is not difficult to see that these subrectangles cover our original R, and their volume sums to that of R. That is, R= \bigcup_{j=1}^N R_j , \qquad \text{and} \qquad V(R) = \sum_{j=1}^N V(R_j).
When R_k = [x_{1,j_1-1}~,~ x_{1,j_1}] \times [x_{2,j_2-1}~,~ x_{2,j_2}] \times \cdots \times [x_{n,j_n-1}~,~ x_{n,j_n}] , then V(R_k) = \Delta x_{1,j_1} \Delta x_{2,j_2} \cdots \Delta x_{n,j_n} = (x_{1,j_1}-x_{1,j_1-1}) (x_{2,j_2}-x_{2,j_2-1}) \cdots (x_{n,j_n}-x_{n,j_n-1}) .
Let R \subset {\mathbb{R}}^n be a closed rectangle and let f \colon R \to {\mathbb{R}} be a bounded function. Let P be a partition of R and suppose that there are N subrectangles R_1,R_2,\ldots,R_N. Define \begin{aligned} & m_i := \inf \{ f(x) : x \in R_i \} , \\ & M_i := \sup \{ f(x) : x \in R_i \} , \\ & L(P,f) := \sum_{i=1}^N m_i V(R_i) , \\ & U(P,f) := \sum_{i=1}^N M_i V(R_i) .\end{aligned} We call L(P,f) the lower Darboux sum and U(P,f) the upper Darboux sum.
The indexing in the definition may be complicated, but fortunately we generally do not need to go back directly to the definition often. We start proving facts about the Darboux sums analogous to the one-variable results.
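To get a feel for the definition, here is a minimal numerical sketch in Python (assuming NumPy is available) computing L(P,f) and U(P,f) for f(x,y) = xy on R = [0,1] \times [0,1] with the uniform partition splitting each side into n pieces; since this f is increasing in each variable, the infimum and supremum on each subrectangle are attained at its lower-left and upper-right corners.

import numpy as np

def darboux_sums(n):
    pts = np.linspace(0.0, 1.0, n + 1)   # uniform partition of [0,1]
    L = U = 0.0
    for i in range(n):
        for j in range(n):
            vol = (pts[i + 1] - pts[i]) * (pts[j + 1] - pts[j])   # V(R_i)
            L += pts[i] * pts[j] * vol           # m_i V(R_i): inf of xy at the lower-left corner
            U += pts[i + 1] * pts[j + 1] * vol   # M_i V(R_i): sup of xy at the upper-right corner
    return L, U

print(darboux_sums(10))    # roughly (0.2025, 0.3025); both tend to 1/4 as n grows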
[mv:sumulbound:prop] Suppose R \subset {\mathbb{R}}^n is a closed rectangle and f \colon R \to {\mathbb{R}} is a bounded function. Let m, M \in {\mathbb{R}} be such that for all x \in R we have m \leq f(x) \leq M. For any partition P of R we have m V(R) \leq L(P,f) \leq U(P,f) \leq M\, V(R) .
Let P be a partition. Then for all i we have m \leq m_i and M_i \leq M. Also m_i \leq M_i for all i. Finally \sum_{i=1}^N V(R_i) = V(R). Therefore, \begin{gathered} m V(R) = m \left( \sum_{i=1}^N V(R_i) \right) = \sum_{i=1}^N m V(R_i) \leq \sum_{i=1}^N m_i V(R_i) \leq \\ \leq \sum_{i=1}^N M_i V(R_i) \leq \sum_{i=1}^N M \,V(R_i) = M \left( \sum_{i=1}^N V(R_i) \right) = M \,V(R) . \qedhere\end{gathered}
Upper and lower integrals
By [mv:sumulbound:prop], the sets of upper and lower Darboux sums are bounded, and we can take their infima and suprema. As before, we now make the following definition.
Let f \colon R \to {\mathbb{R}} be a bounded function on a closed rectangle R \subset {\mathbb{R}}^n. Define \underline{\int_R} f := \sup \{ L(P,f) : P \text{ a partition of $R$} \} , \qquad \overline{\int_R} f := \inf \{ U(P,f) : P \text{ a partition of $R$} \} . We call \underline{\int} the lower Darboux integral and \overline{\int} the upper Darboux integral.
As in one dimension we have refinements of partitions.
Let R \subset {\mathbb{R}}^n be a closed rectangle. Let P = ( P_1, P_2, \ldots, P_n ) and \widetilde{P} = ( \widetilde{P}_1, \widetilde{P}_2, \ldots, \widetilde{P}_n ) be partitions of R. We say \widetilde{P} is a refinement of P if, as sets, P_k \subset \widetilde{P}_k for all k = 1,2,\ldots,n.
It is not difficult to see that if \widetilde{P} is a refinement of P, then subrectangles of P are unions of subrectangles of \widetilde{P}. Simply put, in a refinement we take the subrectangles of P, and we cut them into smaller subrectangles. See .
[mv:prop:refinement] Suppose R \subset {\mathbb{R}}^n is a closed rectangle, P is a partition of R and \widetilde{P} is a refinement of P. If f \colon R \to {\mathbb{R}} is a bounded function, then L(P,f) \leq L(\widetilde{P},f) \qquad \text{and} \qquad U(\widetilde{P},f) \leq U(P,f) .
We prove the first inequality; the second follows similarly. Let R_1,R_2,\ldots,R_N be the subrectangles of P and \widetilde{R}_1,\widetilde{R}_2,\ldots,\widetilde{R}_{\widetilde{N}} be the subrectangles of \widetilde{P}. Let I_k be the set of all indices j such that \widetilde{R}_j \subset R_k. For example, using the examples in figures [mv:figrect] and [mv:figrectpart], I_4 = \{ 6, 7, 8, 9 \} and R_4 = \widetilde{R}_6 \cup \widetilde{R}_7 \cup \widetilde{R}_8 \cup \widetilde{R}_9. We notice in general that R_k = \bigcup_{j \in I_k} \widetilde{R}_j, \qquad V(R_k) = \sum_{j \in I_k} V(\widetilde{R}_j).
Let m_j := \inf \{ f(x) : x \in R_j \}, and \widetilde{m}_j := \inf \{ f(x) : x \in \widetilde{R}_j \} as usual. Notice also that if j \in I_k, then m_k \leq \widetilde{m}_j. Then L(P,f) = \sum_{k=1}^N m_k V(R_k) = \sum_{k=1}^N \sum_{j\in I_k} m_k V(\widetilde{R}_j) \leq \sum_{k=1}^N \sum_{j\in I_k} \widetilde{m}_j V(\widetilde{R}_j) = \sum_{j=1}^{\widetilde{N}} \widetilde{m}_j V(\widetilde{R}_j) = L(\widetilde{P},f) . \qedhere
The key point of this next proposition is that the lower Darboux integral is less than or equal to the upper Darboux integral.
[mv:intulbound:prop] Let R \subset {\mathbb{R}}^n be a closed rectangle and f \colon R \to {\mathbb{R}} a bounded function. Let m, M \in {\mathbb{R}} be such that for all x \in R we have m \leq f(x) \leq M. Then \label{mv:intulbound:eq} m V(R) \leq \underline{\int_R} f \leq \overline{\int_R} f \leq M \, V(R).
For any partition P, via [mv:sumulbound:prop], m V(R) \leq L(P,f) \leq U(P,f) \leq M\,V(R). Taking the supremum of L(P,f) and the infimum of U(P,f) over all P, we obtain the first and the last inequality.
The key inequality in [mv:intulbound:eq] is the middle one. Let P=(P_1,P_2,\ldots,P_n) and Q=(Q_1,Q_2,\ldots,Q_n) be partitions of R. Define \widetilde{P} = ( \widetilde{P}_1,\widetilde{P}_2,\ldots,\widetilde{P}_n ) by letting \widetilde{P}_k = P_k \cup Q_k. Then \widetilde{P} is a partition of R as can easily be checked, and \widetilde{P} is a refinement of P and a refinement of Q. By [mv:prop:refinement], L(P,f) \leq L(\widetilde{P},f) and U(\widetilde{P},f) \leq U(Q,f). Therefore, L(P,f) \leq L(\widetilde{P},f) \leq U(\widetilde{P},f) \leq U(Q,f) . In other words, for two arbitrary partitions P and Q we have L(P,f) \leq U(Q,f). Via Proposition 1.2.7 from volume I, we obtain \sup \{ L(P,f) : \text{$P$ a partition of $R$} \} \leq \inf \{ U(P,f) : \text{$P$ a partition of $R$} \} . In other words \underline{\int_R} f \leq \overline{\int_R} f.
The Riemann integral
We have all we need to define the Riemann integral in n-dimensions over rectangles. Again, the Riemann integral is only defined on a certain class of functions, called the Riemann integrable functions.
Let R \subset {\mathbb{R}}^n be a closed rectangle. Let f \colon R \to {\mathbb{R}} be a bounded function such that \underline{\int_R} f(x)~dx = \overline{\int_R} f(x)~dx . Then f is said to be Riemann integrable, and we sometimes say simply integrable. The set of Riemann integrable functions on R is denoted by {\mathcal{R}}(R). When f \in {\mathcal{R}}(R) we define the Riemann integral \int_R f := \underline{\int_R} f = \overline{\int_R} f .
When the variable x \in {\mathbb{R}}^n needs to be emphasized we write \int_R f(x)~dx, \qquad \int_R f(x_1,\ldots,x_n)~dx_1 \cdots dx_n, \qquad \text{or} \qquad \int_R f(x)~dV . If R \subset {\mathbb{R}}^2, then often instead of volume we say area, and hence write \int_R f(x)~dA .
[mv:intulbound:prop] implies immediately the following proposition.
[mv:intbound:prop] Let f \colon R \to {\mathbb{R}} be a Riemann integrable function on a closed rectangle R \subset {\mathbb{R}}^n. Let m, M \in {\mathbb{R}} be such that m \leq f(x) \leq M for all x \in R. Then m V(R) \leq \int_{R} f \leq M \, V(R) .
A constant function is Riemann integrable. Suppose f(x) = c for all x on R. Then c V(R) \leq \underline{\int_R} f \leq \overline{\int_R} f \leq cV(R) . So f is integrable, and furthermore \int_R f = cV(R).
The proofs of linearity and monotonicity are almost completely identical as the proofs from one variable. We therefore leave it as an exercise to prove the next two propositions.
[mv:intlinearity:prop] Let R \subset {\mathbb{R}}^n be a closed rectangle and let f and g be in {\mathcal{R}}(R) and \alpha \in {\mathbb{R}}.
- \alpha f is in {\mathcal{R}}(R) and \int_R \alpha f = \alpha \int_R f .
f+g is in {\mathcal{R}}(R) and \int_R (f+g) = \int_R f + \int_R g .
Let R \subset {\mathbb{R}}^n be a closed rectangle, let f and g be in {\mathcal{R}}(R), and suppose f(x) \leq g(x) for all x \in R. Then \int_R f \leq \int_R g .
Checking for integrability using the definition often involves the following technique, as in the single variable case.
[mv:prop:upperlowerepsilon] Let R \subset {\mathbb{R}}^n be a closed rectangle and f \colon R \to {\mathbb{R}} a bounded function. Then f \in {\mathcal{R}}(R) if and only if for every \epsilon > 0, there exists a partition P of R such that U(P,f) - L(P,f) < \epsilon .
First, if f is integrable, then clearly the supremum of L(P,f) and infimum of U(P,f) must be equal and hence the infimum of U(P,f)-L(P,f) is zero. Therefore for every \epsilon > 0 there must be some partition P such that U(P,f) - L(P,f) < \epsilon.
For the other direction, given an \epsilon > 0, find P such that U(P,f) - L(P,f) < \epsilon. Then \overline{\int_R} f - \underline{\int_R} f \leq U(P,f) - L(P,f) < \epsilon . As \overline{\int_R} f \geq \underline{\int_R} f and the above holds for every \epsilon > 0, we conclude \overline{\int_R} f = \underline{\int_R} f and f \in {\mathcal{R}}(R).
For simplicity if f \colon S \to {\mathbb{R}} is a function and R \subset S is a closed rectangle, then if the restriction f|_R is integrable we say f is integrable on R, or f \in {\mathcal{R}}(R) and we write \int_R f := \int_R f|_R .
[mv:prop:integralsmallerset] For a closed rectangle S \subset {\mathbb{R}}^n, if f \colon S \to {\mathbb{R}} is integrable and R \subset S is a closed rectangle, then f is integrable over R.
Given \epsilon > 0, we find a partition P of S such that U(P,f)-L(P,f) < \epsilon. By making a refinement of P if necessary, we assume that the endpoints of R are in P. In other words, R is a union of subrectangles of P. The subrectangles of P divide into two collections, ones that are subsets of R and ones whose intersection with the interior of R is empty. Suppose R_1,R_2\ldots,R_K are the subrectangles that are subsets of R and let R_{K+1},\ldots, R_N be the rest. Let \widetilde{P} be the partition of R composed of those subrectangles of P contained in R. Using the same notation as before, \begin{split} \epsilon & > U(P,f)-L(P,f) = \sum_{k=1}^K (M_k-m_k) V(R_k) + \sum_{k=K+1}^N (M_k-m_k) V(R_k) \\ & \geq \sum_{k=1}^K (M_k-m_k) V(R_k) = U(\widetilde{P},f|_R)-L(\widetilde{P},f|_R) . \end{split} Therefore, f|_R is integrable.
Integrals of continuous functions
Although later we will prove a much more general result, it is useful to start with integrability of continuous functions. First we wish to measure the fineness of partitions. In one variable we measured the length of a subinterval; in several variables, we similarly measure the sides of a subrectangle. We say a rectangle R = [a_1,b_1] \times [a_2,b_2] \times \cdots \times [a_n,b_n] has longest side at most \alpha if b_k-a_k \leq \alpha for all k=1,2,\ldots,n.
[prop:diameterrectangle] If a rectangle R \subset {\mathbb{R}}^n has longest side at most \alpha, then for any x,y \in R, \lVert {x-y} \rVert \leq \sqrt{n} \, \alpha .
\begin{split} \lVert {x-y} \rVert & = \sqrt{ {(x_1-y_1)}^2 + {(x_2-y_2)}^2 + \cdots + {(x_n-y_n)}^2 } \\ & \leq \sqrt{ {(b_1-a_1)}^2 + {(b_2-a_2)}^2 + \cdots + {(b_n-a_n)}^2 } \\ & \leq \sqrt{ {\alpha}^2 + {\alpha}^2 + \cdots + {\alpha}^2 } = \sqrt{n} \, \alpha . \qedhere \end{split}
[mv:thm:contintrect] Let R \subset {\mathbb{R}}^n be a closed rectangle and f \colon R \to {\mathbb{R}} a continuous function, then f \in {\mathcal{R}}(R).
The proof is analogous to the one variable proof with some complications. The set R is a closed and bounded subset of {\mathbb{R}}^n, and hence compact. So f is not just continuous, but in fact uniformly continuous by Theorem 7.5 from volume I. Let \epsilon > 0 be given. Find a \delta > 0 such that \lVert {x-y} \rVert < \delta implies \left\lvert {f(x)-f(y)} \right\rvert < \frac{\epsilon}{V(R)}.
Let P be a partition of R, such that the longest side of any subrectangle is strictly less than \frac{\delta}{\sqrt{n}}. If x, y \in R_k for some subrectangle R_k of P we have, by the proposition above, \lVert {x-y} \rVert < \sqrt{n} \frac{\delta}{\sqrt{n}} = \delta. Therefore f(x)-f(y) \leq \left\lvert {f(x)-f(y)} \right\rvert < \frac{\epsilon}{V(R)} . As f is continuous on R_k, it attains a maximum and a minimum on this subrectangle. Let x be a point where f attains the maximum and y be a point where f attains the minimum. Then f(x) = M_k and f(y) = m_k in the notation from the definition of the integral. Therefore, M_k-m_k = f(x)-f(y) < \frac{\epsilon}{V(R)} . And so \begin{split} U(P,f) - L(P,f) & = \left( \sum_{k=1}^N M_k V(R_k) \right) - \left( \sum_{k=1}^N m_k V(R_k) \right) \\ & = \sum_{k=1}^N (M_k-m_k) V(R_k) \\ & < \frac{\epsilon}{V(R)} \sum_{k=1}^N V(R_k) = \epsilon. \end{split} Via an application of [mv:prop:upperlowerepsilon] we find that f \in {\mathcal{R}}(R).
Integration of functions with compact support
Let U \subset {\mathbb{R}}^n be an open set and f \colon U \to {\mathbb{R}} be a function. We say the support of f is the set \operatorname{supp} (f) := \overline{ \{ x \in U : f(x) \not= 0 \} } , where the closure is with respect to the subspace topology on U. Recall that taking the closure with respect to the subspace topology is the same as \overline{ \{ x \in U : f(x) \not= 0 \} } \cap U, now taking the closure with respect to the ambient euclidean space {\mathbb{R}}^n. In particular, \operatorname{supp} (f) \subset U. That is, the support is the closure (in U) of the set of points where the function is nonzero. Its complement in U is open. If x \in U and x is not in the support of f, then f is constantly zero in a whole neighborhood of x.
A function f is said to have compact support if \operatorname{supp}(f) is a compact set.
Suppose B(0,1) \subset {\mathbb{R}}^2 is the unit disc. The function f \colon B(0,1) \to {\mathbb{R}} defined by f(x,y) := \begin{cases} 0 & \text{if $\sqrt{x^2+y^2} > \nicefrac{1}{2}$}, \\ \nicefrac{1}{2} - \sqrt{x^2+y^2} & \text{if $\sqrt{x^2+y^2} \leq \nicefrac{1}{2}$}, \end{cases} is continuous on B(0,1) and its support is the smaller closed ball C(0,\nicefrac{1}{2}). As that is a compact set, f has compact support.
Similarly g \colon B(0,1) \to {\mathbb{R}} defined by g(x,y) := \begin{cases} 0 & \text{if $x \leq 0$}, \\ x & \text{if $x > 0$}, \end{cases} is continuous on B(0,1), but its support is the set \{ (x,y) \in B(0,1) : x \geq 0 \}. In particular, g is not compactly supported.
We will mostly consider the case when U={\mathbb{R}}^n. In light of the following exercise, this is not an oversimplification.
Suppose U \subset {\mathbb{R}}^n is open and f \colon U \to {\mathbb{R}} is continuous and of compact support. Show that the function \widetilde{f} \colon {\mathbb{R}}^n \to {\mathbb{R}} defined by \widetilde{f}(x) := \begin{cases} f(x) & \text{ if $x \in U$,} \\ 0 & \text{ otherwise,} \end{cases} is continuous.
On the other hand, for the unit disc B(0,1) \subset {\mathbb{R}}^2, the continuous function f \colon B(0,1) \to {\mathbb{R}} defined by f(x,y) := \sin\bigl(\frac{1}{1-x^2-y^2}\bigr) does not have compact support; as f is not constantly zero on a neighborhood of any point in B(0,1), we know that the support is the entire disc B(0,1). The function clearly does not extend as above to a continuous function. In fact it is not difficult to show that it cannot be extended in any way whatsoever to be continuous on all of {\mathbb{R}}^2 (the boundary of the disc is the problem).
[mv:prop:rectanglessupp] Suppose f \colon {\mathbb{R}}^n \to {\mathbb{R}} is a continuous function with compact support. If R and S are closed rectangles such that \operatorname{supp}(f) \subset R and \operatorname{supp}(f) \subset S, then \int_S f = \int_R f .
As f is continuous, it is automatically integrable on the rectangles R, S, and R \cap S. Then says \int_S f = \int_{S \cap R} f = \int_R f.
Because of this proposition, when f \colon {\mathbb{R}}^n \to {\mathbb{R}} has compact support and is integrable over a rectangle R containing the support we write \int f := \int_R f \qquad \text{or} \qquad \int_{{\mathbb{R}}^n} f := \int_R f . For example, if f is continuous and of compact support, then \int_{{\mathbb{R}}^n} f exists.
Exercises
Prove .
Suppose R is a rectangle with the length of one of the sides equal to 0. For any bounded function f, show that f \in {\mathcal{R}}(R) and \int_R f = 0.
[mv:zerosiderectangle] Suppose R is a rectangle with the length of one of the sides equal to 0, and suppose S is a rectangle with R \subset S. If f is a bounded function such that f(x) = 0 for x \in S \setminus R, show that f \in {\mathcal{R}}(S) and \int_S f = 0.
Suppose f\colon {\mathbb{R}}^n \to {\mathbb{R}} is such that f(x) := 0 if x\not= 0 and f(0) := 1. Show that f is integrable on R := [-1,1] \times [-1,1] \times \cdots \times [-1,1] directly using the definition, and find \int_R f.
[mv:zeroinside] Suppose R is a closed rectangle and h \colon R \to {\mathbb{R}} is a bounded function such that h(x) = 0 if x \notin \partial R (the boundary of R). Let S be any closed rectangle. Show that h \in {\mathcal{R}}(S) and \int_{S} h = 0 . Hint: Write h as a sum of functions as in .
[mv:zerooutside] Suppose R and R' are two closed rectangles with R' \subset R. Suppose f \colon R \to {\mathbb{R}} is in {\mathcal{R}}(R') and f(x) = 0 for x \in R \setminus R'. Show that f \in {\mathcal{R}}(R) and \int_{R'} f = \int_R f . Do this in the following steps.
a) First do the proof assuming that furthermore f(x) = 0 whenever x
\in \overline{R \setminus R'}.
b) Write f(x) = g(x) + h(x) where g(x) = 0 whenever x
\in \overline{R \setminus R'}, and h(x) is zero except perhaps on \partial R'. Then show \int_R h = \int_{R'} h = 0 (see ).
c) Show \int_{R'} f = \int_R f.
Suppose R' \subset {\mathbb{R}}^n and R'' \subset {\mathbb{R}}^n are two rectangles such that R = R' \cup R'' is a rectangle, and R' \cap R'' is a rectangle with one of the sides having length 0 (that is V(R' \cap R'') = 0). Let f \colon R \to {\mathbb{R}} be a function such that f \in {\mathcal{R}}(R') and f \in {\mathcal{R}}(R''). Show that f \in {\mathcal{R}}(R) and \int_{R} f = \int_{R'} f + \int_{R''} f . Hint: see the previous exercise.
Prove a stronger version of . Suppose f \colon {\mathbb{R}}^n \to {\mathbb{R}} is a function with compact support but not necessarily continuous. Prove that if R is a closed rectangle such that \operatorname{supp}(f) \subset R and f is integrable over R, then for any other closed rectangle S with \operatorname{supp}(f) \subset S, the function f is integrable over S and \int_S f = \int_R f. Hint: See .
Suppose R and S are closed rectangles of {\mathbb{R}}^n. Define f \colon {\mathbb{R}}^n \to {\mathbb{R}} as f(x) := 1 if x \in R, and f(x) := 0 otherwise. Prove f is integrable over S and compute \int_S f. Hint: Consider S \cap R.
Let R = [0,1] \times [0,1] \subset {\mathbb{R}}^2.
a) Suppose f \colon R \to {\mathbb{R}} is defined by f(x,y) :=
\begin{cases}
1 & \text{ if $x = y$,} \\
0 & \text{ else.}
\end{cases} Show that f \in {\mathcal{R}}(R) and compute \int_R f.
b) Suppose f \colon R \to {\mathbb{R}} is defined by f(x,y) :=
\begin{cases}
1 & \text{ if $x \in {\mathbb{Q}}$ or $y \in {\mathbb{Q}}$,} \\
0 & \text{ else.}
\end{cases} Show that f \notin {\mathcal{R}}(R).
Suppose R is a closed rectangle, and suppose S_j are closed rectangles such that S_j \subset R and S_j \subset S_{j+1} for all j. Suppose f \colon R \to {\mathbb{R}} is bounded and f \in {\mathcal{R}}(S_j) for all j. Show that f \in {\mathcal{R}}(R) and \lim_{j\to\infty} \int_{S_j} f = \int_R f .
Suppose f\colon [-1,1] \times [-1,1] \to {\mathbb{R}} is a Riemann integrable function such that f(x) = -f(-x). Using the definition prove \int_{[-1,1] \times [-1,1]} f = 0 .
Iterated integrals and Fubini theorem
Note: 1–2 lectures
The Riemann integral in several variables is hard to compute from the definition. For the one-dimensional Riemann integral we have the fundamental theorem of calculus, and we can compute many integrals without having to appeal to the definition of the integral. We will rewrite a Riemann integral in several variables into several one-dimensional Riemann integrals by iterating. However, if f \colon [0,1]^2 \to {\mathbb{R}} is a Riemann integrable function, it is not immediately clear if the three expressions \int_{[0,1]^2} f , \qquad \int_0^1 \int_0^1 f(x,y) \, dx \, dy , \qquad \text{and} \qquad \int_0^1 \int_0^1 f(x,y) \, dy \, dx are equal, or if the last two are even well-defined.
Define f(x,y) := \begin{cases} 1 & \text{ if $x=\nicefrac{1}{2}$ and $y \in {\mathbb{Q}}$,} \\ 0 & \text{ otherwise.} \end{cases} Then f is Riemann integrable on R := [0,1]^2 and \int_R f = 0. Furthermore, \int_0^1 \int_0^1 f(x,y) \, dx \, dy = 0. However \int_0^1 f(\nicefrac{1}{2},y) \, dy does not exist, so we cannot even write \int_0^1 \int_0^1 f(x,y) \, dy \, dx.
Proof: Let us start with integrability of f. We simply take the partition of [0,1]^2 where the partition in the x direction is \{ 0, \nicefrac{1}{2}-\epsilon, \nicefrac{1}{2}+\epsilon,1\} and in the y direction \{ 0, 1 \}. The subrectangles of the partition are R_1 := [0, \nicefrac{1}{2}-\epsilon] \times [0,1], \qquad R_2 := [\nicefrac{1}{2}-\epsilon, \nicefrac{1}{2}+\epsilon] \times [0,1], \qquad R_3 := [\nicefrac{1}{2}+\epsilon,1] \times [0,1] . We have m_1 = M_1 = 0, m_2 =0, M_2 = 1, and m_3 = M_3 = 0. Therefore, L(P,f) = m_1 V(R_1) + m_2 V(R_2) + m_3 V(R_3) = 0 (\nicefrac{1}{2}-\epsilon) + 0 (2\epsilon) + 0 (\nicefrac{1}{2}-\epsilon) = 0 , and U(P,f) = M_1 V(R_1) + M_2 V(R_2) + M_3 V(R_3) = 0 (\nicefrac{1}{2}-\epsilon) + 1 (2\epsilon) + 0 (\nicefrac{1}{2}-\epsilon) = 2 \epsilon . The upper and lower sum are arbitrarily close and the lower sum is always zero, so the function is integrable and \int_R f = 0.
For any y, the function that takes x to f(x,y) is zero except perhaps at a single point x=\nicefrac{1}{2}. We know that such a function is integrable and \int_0^1 f(x,y) \, dx = 0. Therefore, \int_0^1 \int_0^1 f(x,y) \, dx \, dy = 0.
However if x=\nicefrac{1}{2}, the function that takes y to f(\nicefrac{1}{2},y) is the nonintegrable function that is 1 on the rationals and 0 on the irrationals. See Example 5.1.4 from volume I.
We will solve this problem of undefined inside integrals by using the upper and lower integrals, which are always defined.
We split the coordinates of {\mathbb{R}}^{n+m} into two parts. That is, we write the coordinates on {\mathbb{R}}^{n+m} = {\mathbb{R}}^n \times {\mathbb{R}}^m as (x,y) where x \in {\mathbb{R}}^n and y \in {\mathbb{R}}^m. For a function f(x,y) we write f_x(y) := f(x,y) when x is fixed and we wish to speak of the function in terms of y. We write f^y(x) := f(x,y) when y is fixed and we wish to speak of the function in terms of x.
[mv:fubinivA] Let R \times S \subset {\mathbb{R}}^n \times {\mathbb{R}}^m be a closed rectangle and f \colon R \times S \to {\mathbb{R}} be integrable. The functions g \colon R \to {\mathbb{R}} and h \colon R \to {\mathbb{R}} defined by g(x) := \underline{\int_S} f_x \qquad \text{and} \qquad h(x) := \overline{\int_S} f_x are integrable over R and \int_R g = \int_R h = \int_{R \times S} f .
In other words \int_{R \times S} f = \int_R \left( \underline{\int_S} f(x,y) \, dy \right) \, dx = \int_R \left( \overline{\int_S} f(x,y) \, dy \right) \, dx . If it turns out that f_x is integrable for all x, for example when f is continuous, then we obtain the more familiar \int_{R \times S} f = \int_R \int_S f(x,y) \, dy \, dx .
Any partition of R \times S is a concatenation of a partition of R and a partition of S. That is, write a partition of R \times S as (P,P') = (P_1,P_2,\ldots,P_n,P'_1,P'_2,\ldots,P'_m), where P = (P_1,P_2,\ldots,P_n) and P' = (P'_1,P'_2,\ldots,P'_m) are partitions of R and S respectively. Let R_1,R_2,\ldots,R_N be the subrectangles of P and R'_1,R'_2,\ldots,R'_K be the subrectangles of P'. Then the subrectangles of (P,P') are R_j \times R'_k where 1 \leq j \leq N and 1 \leq k \leq K.
Let m_{j,k} := \inf_{(x,y) \in R_j \times R'_k} f(x,y) . We notice that V(R_j \times R'_k) = V(R_j)V(R'_k) and hence L\bigl((P,P'),f\bigr) = \sum_{j=1}^N \sum_{k=1}^K m_{j,k} \, V(R_j \times R'_k) = \sum_{j=1}^N \left( \sum_{k=1}^K m_{j,k} \, V(R'_k) \right) V(R_j) . If we let m_k(x) := \inf_{y \in R'_k} f(x,y) = \inf_{y \in R'_k} f_x(y) , then of course if x \in R_j, then m_{j,k} \leq m_k(x). Therefore \sum_{k=1}^K m_{j,k} \, V(R'_k) \leq \sum_{k=1}^K m_k(x) \, V(R'_k) = L(P',f_x) \leq \underline{\int_S} f_x = g(x) . As we have the inequality for all x \in R_j we have \sum_{k=1}^K m_{j,k} \, V(R'_k) \leq \inf_{x \in R_j} g(x) . We thus obtain L\bigl((P,P'),f\bigr) \leq \sum_{j=1}^N \left( \inf_{x \in R_j} g(x) \right) V(R_j) = L(P,g) .
Similarly U\bigl((P,P'),f\bigr) \geq U(P,h), and the proof of this inequality is left as an exercise.
Putting this together we have L\bigl((P,P'),f\bigr) \leq L(P,g) \leq U(P,g) \leq U(P,h) \leq U\bigl((P,P'),f\bigr) . And since f is integrable, it must be that g is integrable as U(P,g) - L(P,g) \leq U\bigl((P,P'),f\bigr) - L\bigl((P,P'),f\bigr) , and we can make the right hand side arbitrarily small. As for any partition we have L\bigl((P,P'),f\bigr) \leq L(P,g) \leq U\bigl((P,P'),f\bigr) we must have that \int_R g = \int_{R \times S} f.
Similarly we have L\bigl((P,P'),f\bigr) \leq L(P,g) \leq L(P,h) \leq U(P,h) \leq U\bigl((P,P'),f\bigr) , and hence U(P,h) - L(P,h) \leq U\bigl((P,P'),f\bigr) - L\bigl((P,P'),f\bigr) . So if f is integrable so is h, and as L\bigl((P,P'),f\bigr) \leq L(P,h) \leq U\bigl((P,P'),f\bigr) we must have that \int_R h = \int_{R \times S} f.
We can also do the iterated integration in opposite order. The proof of this version is almost identical to version A, and we leave it as an exercise to the reader.
[mv:fubinivB] Let R \times S \subset {\mathbb{R}}^n \times {\mathbb{R}}^m be a closed rectangle and f \colon R \times S \to {\mathbb{R}} be integrable. The functions g \colon S \to {\mathbb{R}} and h \colon S \to {\mathbb{R}} defined by g(y) := \underline{\int_R} f^y \qquad \text{and} \qquad h(y) := \overline{\int_R} f^y are integrable over S and \int_S g = \int_S h = \int_{R \times S} f .
That is we also have \int_{R \times S} f = \int_S \left( \underline{\int_R} f(x,y) \, dx \right) \, dy = \int_S \left( \overline{\int_R} f(x,y) \, dx \right) \, dy .
Next, suppose for simplicity that f_x and f^y are integrable for all x and y, for example when f is continuous. Then by putting the two versions together we obtain the familiar \int_{R \times S} f = \int_R \int_S f(x,y) \, dy \, dx = \int_S \int_R f(x,y) \, dx \, dy .
Often the Fubini theorem is stated in two dimensions for a continuous function f \colon R \to {\mathbb{R}} on a rectangle R = [a,b] \times [c,d]. Then the Fubini theorem states that \int_R f = \int_a^b \int_c^d f(x,y) \,dy\,dx = \int_c^d \int_a^b f(x,y) \,dx\,dy . And the Fubini theorem is commonly thought of as the theorem that allows us to swap the order of iterated integrals.
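As a sanity check, the two iterated integrals can also be compared numerically. The following Python sketch (an illustration only; the sample integrand and the midpoint-rule helper are our own choices) approximates both orders of integration for a continuous function on [0,1] \times [0,1] and compares them with the exact value.

```python
import math

# Illustration only: both iterated integrals of the continuous function
# f(x,y) = x*exp(x*y) on [0,1] x [0,1] agree, as the Fubini theorem asserts.
# The exact value of the integral is e - 2.

def f(x, y):
    return x * math.exp(x * y)

def midpoint(g, a, b, n=500):
    """Midpoint-rule approximation of the integral of g over [a,b]."""
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))

dy_dx = midpoint(lambda x: midpoint(lambda y: f(x, y), 0.0, 1.0), 0.0, 1.0)
dx_dy = midpoint(lambda y: midpoint(lambda x: f(x, y), 0.0, 1.0), 0.0, 1.0)

print(dy_dx, dx_dy, math.e - 2)   # all three agree to several decimal places
```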
Repeatedly applying the Fubini theorem yields the following corollary: Let R := [a_1,b_1] \times [a_2,b_2] \times \cdots \times [a_n,b_n] \subset {\mathbb{R}}^n be a closed rectangle and let f \colon R \to {\mathbb{R}} be continuous. Then \int_R f = \int_{a_1}^{b_1} \int_{a_2}^{b_2} \cdots \int_{a_n}^{b_n} f(x_1,x_2,\ldots,x_n) \, dx_n \, dx_{n-1} \cdots dx_1 .
Clearly we can also switch the order of integration to any order we please. We can also relax the continuity requirement by making sure that all the intermediate functions are integrable, or by using upper or lower integrals.
Exercises
Compute \int_{0}^1 \int_{-1}^1 xe^{xy} \, dx \, dy in a simple way.
Prove the assertion U\bigl((P,P'),f\bigr) \geq U(P,h) from the proof of .
Prove .
Let R=[a,b] \times [c,d] and let f(x,y) be an integrable function on R such that for any fixed y, the function that takes x to f(x,y) is zero except at finitely many points. Show \int_R f = 0 .
Let R=[a,b] \times [c,d] and f(x,y) := g(x)h(y) for two continuous functions g \colon [a,b] \to {\mathbb{R}} and h \colon [c,d] \to {\mathbb{R}}. Prove \int_R f = \left(\int_a^b g\right)\left(\int_c^d h\right) .
Compute \int_0^1 \int_0^1 \frac{x^2-y^2}{{(x^2+y^2)}^2} \, dx \, dy \qquad \text{and} \qquad \int_0^1 \int_0^1 \frac{x^2-y^2}{{(x^2+y^2)}^2} \, dy \, dx . You will need to interpret the integrals as improper, that is, the limit of \int_\epsilon^1 as \epsilon \to 0.
Suppose f(x,y) := g(x) where g \colon [a,b] \to {\mathbb{R}} is Riemann integrable. Show that f is Riemann integrable for any R = [a,b] \times [c,d] and \int_R f = (d-c) \int_a^b g .
Define f \colon [-1,1] \times [0,1] \to {\mathbb{R}} by f(x,y) :=
\begin{cases}
x & \text{if $y \in {\mathbb{Q}}$,} \\
0 & \text{else.}
\end{cases} Show
a) \int_0^1 \int_{-1}^1 f(x,y) \, dx \, dy exists, but \int_{-1}^1 \int_0^1 f(x,y) \, dy \, dx does not.
b) Compute \int_{-1}^1 \overline{\int_0^1} f(x,y) \, dy \, dx and \int_{-1}^1 \underline{\int_0^1} f(x,y) \, dy \, dx.
c) Show f is not Riemann integrable on [-1,1] \times [0,1] (use Fubini).
Define f \colon [0,1] \times [0,1] \to {\mathbb{R}} by f(x,y) :=
\begin{cases}
\nicefrac{1}{q} & \text{if $x \in {\mathbb{Q}}$, $y \in {\mathbb{Q}}$, and $y=\nicefrac{p}{q}$ in lowest terms,} \\
0 & \text{else.}
\end{cases} Show:
a) f is Riemann integrable on [0,1] \times [0,1].
b) Find \overline{\int_0^1} f(x,y) \, dx and \underline{\int_0^1} f(x,y) \, dx for all y \in [0,1], and show they are unequal for all y \in {\mathbb{Q}}.
c) \int_0^1 \int_0^1 f(x,y) \, dy \, dx exists, but \int_0^1 \int_0^1 f(x,y) \, dx \, dy does not.
Note: By Fubini, \int_0^1 \overline{\int_0^1} f(x,y) \, dx \, dy and \int_0^1 \underline{\int_0^1} f(x,y) \, dx \, dy do exist and equal the integral of f on R.
Outer measure and null sets
Note: 2 lectures
Outer measure and null sets
Before we characterize all Riemann integrable functions, we need to make a slight detour. We introduce a way of measuring the size of sets in {\mathbb{R}}^n.
Let S \subset {\mathbb{R}}^n be a subset. Define the outer measure of S as m^*(S) := \inf\, \sum_{j=1}^\infty V(R_j) , where the infimum is taken over all sequences \{ R_j \} of open rectangles such that S \subset \bigcup_{j=1}^\infty R_j. In particular, S is of measure zero or a null set if m^*(S) = 0.
The theory of measures on {\mathbb{R}}^n is a very complicated subject. We will only require measure-zero sets and so we focus on these. The set S is of measure zero if for every \epsilon > 0 there exists a sequence of open rectangles \{ R_j \} such that \label{mv:eq:nullR} S \subset \bigcup_{j=1}^\infty R_j \qquad \text{and} \qquad \sum_{j=1}^\infty V(R_j) < \epsilon. Furthermore, if S is of measure zero and S' \subset S, then S' is of measure zero. In fact, we can use the exact same rectangles.
It is sometimes more convenient to use balls instead of rectangles. In fact we can choose balls no bigger than a fixed radius.
[mv:prop:ballsnull] Let \delta > 0 be given. A set S \subset {\mathbb{R}}^n is measure zero if and only if for every \epsilon > 0, there exists a sequence of open balls \{ B_j \}, where the radius of B_j is r_j < \delta such that S \subset \bigcup_{j=1}^\infty B_j \qquad \text{and} \qquad \sum_{j=1}^\infty r_j^n < \epsilon.
Note that the “volume” of B_j is proportional to r_j^n.
If R is a (closed or open) cube (rectangle with all sides equal) of side s, then R is contained in a closed ball of radius \sqrt{n}\, s by , and therefore in an open ball of radius 2 \sqrt{n}\, s.
Let s be a number that is less than the smallest side of R and also so that 2\sqrt{n} \, s < \delta. We claim R is contained in a union of closed cubes C_1, C_2, \ldots, C_k of sides s such that \sum_{j=1}^k V(C_j) \leq 2^n V(R) . It is clearly true (without the 2^n) if R has sides that are integer multiples of s. So if a side is of length (\ell+\alpha) s, for \ell \in {\mathbb{N}} and 0 \leq \alpha < 1, then (\ell+\alpha)s \leq 2\ell s. Increasing the side to 2\ell s we obtain a new larger rectangle of volume at most 2^n times larger, but whose sides are multiples of s.
So suppose that there exists a sequence \{ R_j \} as in the definition such that [mv:eq:nullR] is true. As we have seen above, we can choose closed cubes \{ C_k \} with C_k of side s_k as above that cover all the rectangles \{ R_j \} and so that \sum_{k=1}^\infty s_k^n = \sum_{k=1}^\infty V(C_k) \leq 2^n \sum_{j=1}^\infty V(R_j) < 2^n \epsilon. Covering C_k with balls B_k of radius r_k = 2\sqrt{n} \, s_k we obtain \sum_{k=1}^\infty r_k^n < 2^{2n} n^{n/2} \epsilon . And as S \subset\bigcup_{j} R_j \subset \bigcup_{k} C_k \subset \bigcup_{k} B_k, we are finished.
For the other direction, suppose that for every \epsilon > 0 we have the ball condition above. Each B_j is contained in a cube R_j of side 2r_j. So V(R_j) = {(2 r_j)}^n = 2^n r_j^n. Therefore S \subset \bigcup_{j=1}^\infty R_j \qquad \text{and} \qquad \sum_{j=1}^\infty V(R_j) = 2^n \sum_{j=1}^\infty r_j^n < 2^n \epsilon. \qedhere
The definition of outer measure could have been made with open balls as well, not just for null sets. We leave this generalization to the reader.
Examples and basic properties
The set {\mathbb{Q}}^n \subset {\mathbb{R}}^n of points with rational coordinates is a set of measure zero.
Proof: The set {\mathbb{Q}}^n is countable, so let us write it as a sequence q_1,q_2,\ldots. Given \epsilon > 0, for each q_j find an open rectangle R_j with q_j \in R_j and V(R_j) < \epsilon 2^{-j}. Then {\mathbb{Q}}^n \subset \bigcup_{j=1}^\infty R_j \qquad \text{and} \qquad \sum_{j=1}^\infty V(R_j) < \sum_{j=1}^\infty \epsilon 2^{-j} = \epsilon .
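The covering argument is easy to see numerically. The Python sketch below (an illustration only; the particular enumeration of the rationals is our own choice) surrounds the j-th rational in [0,1] by an interval of length \epsilon 2^{-j} and confirms that the total length of the cover stays below \epsilon no matter how many rationals are covered.

```python
from fractions import Fraction

# Illustration only: cover the rationals in [0,1] by open intervals whose
# lengths sum to less than eps.

def rationals_in_unit_interval():
    """Enumerate Q intersected with [0,1], without repetition, by denominator."""
    seen = set()
    q = 1
    while True:
        for p in range(q + 1):
            r = Fraction(p, q)
            if r not in seen:
                seen.add(r)
                yield r
        q += 1

eps = 0.01
total = 0.0
gen = rationals_in_unit_interval()
for j in range(1, 10001):            # cover the first 10000 rationals
    q_j = next(gen)
    length = eps * 2.0 ** (-j)       # interval of this length around q_j
    total += length
    if j <= 3:
        print(q_j, (float(q_j) - length / 2, float(q_j) + length / 2))

print("total length of the cover:", total, "< eps =", eps)
```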
The example points to a more general result.
A countable union of measure zero sets is of measure zero.
Suppose S = \bigcup_{j=1}^\infty S_j , where S_j are all measure zero sets. Let \epsilon > 0 be given. For each j there exists a sequence of open rectangles \{ R_{j,k} \}_{k=1}^\infty such that S_j \subset \bigcup_{k=1}^\infty R_{j,k} and \sum_{k=1}^\infty V(R_{j,k}) < 2^{-j} \epsilon . Then S \subset \bigcup_{j=1}^\infty \bigcup_{k=1}^\infty R_{j,k} . As V(R_{j,k}) is always positive, the sum over all j and k can be done in any order. In particular, it can be done as \sum_{j=1}^\infty \sum_{k=1}^\infty V(R_{j,k}) < \sum_{j=1}^\infty 2^{-j} \epsilon = \epsilon . \qedhere
The next example is not just interesting, it will be useful later.
[mv:example:planenull] Let P := \{ x \in {\mathbb{R}}^n : x_k = c \} for a fixed k=1,2,\ldots,n and a fixed constant c \in {\mathbb{R}}. Then P is of measure zero.
Proof: First fix s and let us prove that P_s := \{ x \in {\mathbb{R}}^n : x_k = c, \left\lvert {x_j} \right\rvert \leq s \text{ for all $j\not=k$} \} is of measure zero. Given any \epsilon > 0 define the open rectangle R := \{ x \in {\mathbb{R}}^n : c-\epsilon < x_k < c+\epsilon, \left\lvert {x_j} \right\rvert < s+1 \text{ for all $j\not=k$} \} . It is clear that P_s \subset R. Furthermore V(R) = 2\epsilon {\bigl(2(s+1)\bigr)}^{n-1} . As s is fixed, we can make V(R) arbitrarily small by picking \epsilon small enough.
Next we note that P = \bigcup_{j=1}^\infty P_j and a countable union of measure zero sets is measure zero.
If a < b, then m^*([a,b]) = b-a.
Proof: In the case of {\mathbb{R}}, open rectangles are open intervals. Since [a,b] \subset (a-\epsilon,b+\epsilon) for every \epsilon > 0, we have m^*([a,b]) \leq b-a+2\epsilon for every \epsilon > 0. Hence, m^*([a,b]) \leq b-a.
Let us prove the other inequality. Suppose \{ (a_j,b_j) \} are open intervals such that [a,b] \subset \bigcup_{j=1}^\infty (a_j,b_j) . We wish to bound \sum (b_j-a_j) from below. Since [a,b] is compact, finitely many of the open intervals already cover [a,b]. As throwing out some of the intervals only makes the sum smaller, we need only consider that finite subcover. If (a_i,b_i) \subset (a_j,b_j), then we can throw out (a_i,b_i) as well. Therefore we have [a,b] \subset \bigcup_{j=1}^k (a_j,b_j) for some k, and we assume that the intervals are sorted such that a_1 < a_2 < \cdots < a_k. Note that since (a_2,b_2) is not contained in (a_1,b_1) we have that a_1 < a_2 < b_1 < b_2. Similarly a_j < a_{j+1} < b_j < b_{j+1}. Furthermore, a_1 < a and b_k > b. Thus, \sum_{j=1}^\infty (b_j-a_j) \geq \sum_{j=1}^k (b_j-a_j) \geq \sum_{j=1}^{k-1} (a_{j+1}-a_j) + (b_k-a_k) = b_k-a_1 > b-a . As this holds for every covering of [a,b] by open intervals, we obtain m^*([a,b]) \geq b-a.
[mv:prop:compactnull] Suppose E \subset {\mathbb{R}}^n is a compact set of measure zero. Then for every \epsilon > 0, there exist finitely many open rectangles R_1,R_2,\ldots,R_k such that E \subset R_1 \cup R_2 \cup \cdots \cup R_k \qquad \text{and} \qquad \sum_{j=1}^k V(R_j) < \epsilon. Also for any \delta > 0, there exist finitely many open balls B_1,B_2,\ldots,B_k of radii r_1,r_2,\ldots,r_k < \delta such that E \subset B_1 \cup B_2 \cup \cdots \cup B_k \qquad \text{and} \qquad \sum_{j=1}^k r_j^n < \epsilon.
Find a sequence of open rectangles \{ R_j \} such that E \subset \bigcup_{j=1}^\infty R_j \qquad \text{and} \qquad \sum_{j=1}^\infty V(R_j) < \epsilon. By compactness, there are finitely many of these rectangles that still contain E. That is, there is some k such that E \subset R_1 \cup R_2 \cup \cdots \cup R_k. Hence \sum_{j=1}^k V(R_j) \leq \sum_{j=1}^\infty V(R_j) < \epsilon.
The proof that we can choose balls instead of rectangles is left as an exercise.
[example:cantor] So that the reader is not under the impression that there are only very few measure zero sets and that these are simple, let us give an uncountable, compact, measure zero subset of [0,1]. For any x \in [0,1] write the representation in ternary notation x = \sum_{n=1}^\infty d_n 3^{-n} . See §1.5 in volume I, in particular Exercise 1.5.4. Define the Cantor set C as C := \Bigl\{ x \in [0,1] : x = \sum_{n=1}^\infty d_n 3^{-n}, \text{ where $d_n = 0$ or $d_n = 2$ for all $n$} \Bigr\} . That is, x is in C if it has a ternary expansion in only 0’s and 2’s. If x has two expansions, as long as one of them does not have any 1’s, then x is in C. Define C_0 := [0,1] and C_k := \Bigl\{ x \in [0,1] : x = \sum_{n=1}^\infty d_n 3^{-n}, \text{ where $d_n = 0$ or $d_n = 2$ for all $n=1,2,\ldots,k$} \Bigr\} . Clearly, C = \bigcap_{k=1}^\infty C_k . We leave it as an exercise to prove that:
- Each C_k is a finite union of closed intervals. It is obtained by taking C_{k-1}, and from each closed interval removing the “middle third”.
- Therefore, each C_k is closed.
- Furthermore, m^*(C_k) = 1 - \sum_{n=1}^k \frac{2^{n-1}}{3^{n}}.
- Hence, m^*(C) = 0.
- The set C is in one to one correspondence with [0,1], in other words, uncountable.
See .
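A quick numerical illustration of the construction (not a substitute for the exercise below; the membership test uses floating-point ternary digits and is only a heuristic, and all names are our own): the following Python sketch evaluates the formula for m^*(C_k), compares it with the closed form {(\nicefrac{2}{3})}^k, and checks whether a given point avoids the digit 1.

```python
# Illustration only: the measure of C_k and a heuristic membership test for C.

def measure_Ck(k):
    """m*(C_k): at step n one removes 2^(n-1) middle thirds of length 3^(-n)."""
    return 1.0 - sum(2 ** (n - 1) / 3 ** n for n in range(1, k + 1))

for k in range(1, 6):
    print(k, measure_Ck(k), (2 / 3) ** k)   # the two columns agree

def avoids_digit_one(x, digits=40):
    """Do the first `digits` ternary digits of x avoid the digit 1? (heuristic)"""
    for _ in range(digits):
        x *= 3
        d = int(x)
        if d == 1:
            return False
        x -= d
    return True

print(avoids_digit_one(0.25))   # 1/4 = 0.020202..._3, so 1/4 is in C
print(avoids_digit_one(0.5))    # 1/2 = 0.111..._3, so 1/2 is not in C
```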
Images of null sets
Before we look at images of measure zero sets, let us see what a continuously differentiable function does to a ball.
[lemma:ballmapder] Suppose U \subset {\mathbb{R}}^n is an open set, B \subset U is an open or closed ball of radius at most r, f \colon B \to {\mathbb{R}}^n is continuously differentiable and suppose \lVert {f'(x)} \rVert \leq M for all x \in B. Then f(B) \subset B', where B' is a ball of radius at most Mr.
Without loss of generality assume B is a closed ball. The ball B is convex, and hence, via , \lVert {f(x)-f(y)} \rVert \leq M \lVert {x-y} \rVert for all x,y in B. In particular, if B = C(y,r), then f(B) \subset C\bigl(f(y),M r \bigr).
The image of a measure zero set using a continuous map is not necessarily a measure zero set. However if we assume the mapping is continuously differentiable, then the mapping cannot “stretch” the set too much.
[prop:imagenull] Suppose U \subset {\mathbb{R}}^n is an open set and f \colon U \to {\mathbb{R}}^n is a continuously differentiable mapping. If E \subset U is a measure zero set, then f(E) is measure zero.
We leave the proof for a general measure zero set as an exercise, and we now prove the proposition for a compact measure zero set. Therefore let us suppose E is compact.
First let us replace U by a smaller open set to make \lVert {f'(x)} \rVert bounded. At each point x \in E pick an open ball B(x,r_x) such that the closed ball C(x,r_x) \subset U. By compactness we only need to take finitely many points x_1,x_2,\ldots,x_q to still cover E. Define U' := \bigcup_{j=1}^q B(x_j,r_{x_j}), \qquad K := \bigcup_{j=1}^q C(x_j,r_{x_j}). We have E \subset U' \subset K \subset U. The set K is compact. The function that takes x to \lVert {f'(x)} \rVert is continuous, and therefore there exists an M > 0 such that \lVert {f'(x)} \rVert \leq M for all x \in K. So without loss of generality we may replace U by U' and from now on suppose that \lVert {f'(x)} \rVert \leq M for all x \in U.
At each point x \in E pick a ball B(x,\delta_x) of maximum radius so that B(x,\delta_x) \subset U. Let \delta = \inf_{x\in E} \delta_x. Take a sequence \{ x_j \} \subset E so that \delta_{x_j} \to \delta. As E is compact, we can pick the sequence to be convergent to some y \in E. Once \lVert {x_j-y} \rVert < \frac{\delta_y}{2}, then \delta_{x_j} > \frac{\delta_y}{2} by the triangle inequality. Therefore \delta > 0.
Given \epsilon > 0, there exist balls B_1,B_2,\ldots,B_k of radii r_1,r_2,\ldots,r_k < \delta such that E \subset B_1 \cup B_2 \cup \cdots \cup B_k \qquad \text{and} \qquad \sum_{j=1}^k r_j^n < \epsilon. Suppose B_1', B_2', \ldots, B_k' are the balls of radius Mr_1, Mr_2, \ldots, Mr_k from , such that f(B_j) \subset B_j' for all j. Then f(E) \subset f(B_1) \cup f(B_2) \cup \cdots \cup f(B_k) \subset B_1' \cup B_2' \cup \cdots \cup B_k' \qquad \text{and} \qquad \sum_{j=1}^k {(Mr_j)}^n = M^n \sum_{j=1}^k r_j^n < M^n \epsilon. \qedhere
Exercises
Finish the proof of , that is, show that you can use balls instead of rectangles.
If A \subset B, then m^*(A) \leq m^*(B).
Suppose X \subset {\mathbb{R}}^n is a set such that for every \epsilon > 0 there exists a set Y such that X \subset Y and m^*(Y) \leq \epsilon. Prove that X is a measure zero set.
Show that if R \subset {\mathbb{R}}^n is a closed rectangle, then m^*(R) = V(R).
The closure of a measure zero set can be quite large. Find an example set S \subset {\mathbb{R}}^n that is of measure zero, but whose closure \overline{S} = {\mathbb{R}}^n.
Prove the general case of without using compactness:
a) Mimic the proof to first prove that the proposition holds if E is relatively compact; a set E \subset U is relatively compact if the closure of E in the subspace topology on U is compact, or in other words if there exists a compact set K with K \subset U and E \subset K.
Hint: The bound on the size of the derivative still holds, but you need to use countably many balls in the second part of the proof. Be careful as the closure of E need no longer be measure zero.
b) Now prove it for any null set E.
Hint: First show that \{ x \in U : d(x,y) \geq \nicefrac{1}{m} \text{ for all } y \notin U \text{ and } d(0,x) \leq m \} is a compact set for any m > 0.
Let U \subset {\mathbb{R}}^n be an open set and let f \colon U \to {\mathbb{R}} be a continuously differentiable function. Let G := \{ (x,y) \in U \times {\mathbb{R}}: y = f(x) \} be the graph of f. Show that G is of measure zero.
Given a closed rectangle R \subset {\mathbb{R}}^n, show that for any \epsilon > 0 there exists a number s > 0 and finitely many open cubes C_1,C_2,\ldots,C_k of side s such that R \subset C_1 \cup C_2 \cup \cdots \cup C_k and \sum_{j=1}^k V(C_j) \leq V(R) + \epsilon .
Show that there exists a number k = k(n,r,\delta) depending only on n, r and \delta such the following holds. Given B(x,r) \subset {\mathbb{R}}^n and \delta > 0, there exist k open balls B_1,B_2,\ldots,B_k of radius at most \delta such that B(x,r) \subset B_1 \cup B_2 \cup \cdots \cup B_k. Note that you can find k that really only depends on n and the ratio \nicefrac{\delta}{r}.
Prove the statements of . That is, prove:
a) Each C_k is a finite union of closed intervals, and so C is closed.
b) m^*(C_k) = 1 - \sum_{n=1}^k \frac{2^{n-1}}{3^{n}}.
c) m^*(C) = 0.
d) The set C is in one to one correspondence with [0,1].
The set of Riemann integrable functions
Note: 1 lecture
Oscillation and continuity
Let S \subset {\mathbb{R}}^n be a set and f \colon S \to {\mathbb{R}} a function. Instead of just saying that f is or is not continuous at a point x \in S, we need to be able to quantify how discontinuous f is at x. For any \delta > 0 define the oscillation of f on the \delta-ball (in the subspace topology) B_S(x,\delta) := B_{{\mathbb{R}}^n}(x,\delta) \cap S as o(f,x,\delta) := {\sup_{y \in B_S(x,\delta)} f(y)} - {\inf_{y \in B_S(x,\delta)} f(y)} = \sup_{y_1,y_2 \in B_S(x,\delta)} \bigl(f(y_1)-f(y_2)\bigr) . That is, o(f,x,\delta) is the length of the smallest interval that contains the image f\bigl(B_S(x,\delta)\bigr). Clearly o(f,x,\delta) \geq 0 and notice o(f,x,\delta) \leq o(f,x,\delta') whenever \delta < \delta'. Therefore, the limit as \delta \to 0 from the right exists and we define the oscillation of a function f at x as o(f,x) := \lim_{\delta \to 0^+} o(f,x,\delta) = \inf_{\delta > 0} o(f,x,\delta) .
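The following Python sketch (an illustration only; it approximates suprema and infima by sampling, and the step function is our own choice) computes o(f,x,\delta) for a jump function on [0,1]: away from the jump the oscillation drops to 0 as \delta shrinks, while at the jump it stays equal to the size of the jump.

```python
# Illustration only: sampled oscillation of the step function
# f(x) = 0 for x < 1/2 and f(x) = 1 for x >= 1/2 on S = [0,1].

def f(x):
    return 0.0 if x < 0.5 else 1.0

def oscillation(x, delta, samples=1000):
    """Approximate o(f,x,delta) by sampling (x-delta, x+delta) intersected with [0,1]."""
    lo, hi = max(0.0, x - delta), min(1.0, x + delta)
    values = [f(lo + (hi - lo) * k / samples) for k in range(samples + 1)]
    return max(values) - min(values)

for x in (0.2, 0.5):
    for delta in (0.1, 0.01, 0.001):
        print(x, delta, oscillation(x, delta))
# o(f, 0.2) = 0 (f is continuous there), while o(f, 0.5) = 1 (the jump).
```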
f \colon S \to {\mathbb{R}} is continuous at x \in S if and only if o(f,x) = 0.
First suppose that f is continuous at x \in S. Then given any \epsilon > 0, there exists a \delta > 0 such that for y \in B_S(x,\delta) we have \left\lvert {f(x)-f(y)} \right\rvert < \epsilon. Therefore if y_1,y_2 \in B_S(x,\delta), then f(y_1)-f(y_2) = f(y_1)-f(x)-\bigl(f(y_2)-f(x)\bigr) < \epsilon + \epsilon = 2 \epsilon . Taking the supremum over y_1 and y_2 we obtain o(f,x,\delta) = \sup_{y_1,y_2 \in B_S(x,\delta)} \bigl(f(y_1)-f(y_2)\bigr) \leq 2 \epsilon . Hence, o(f,x) = 0.
On the other hand suppose that o(f,x) = 0. Given any \epsilon > 0, find a \delta > 0 such that o(f,x,\delta) < \epsilon. If y \in B_S(x,\delta), then \left\lvert {f(x)-f(y)} \right\rvert \leq \sup_{y_1,y_2 \in B_S(x,\delta)} \bigl(f(y_1)-f(y_2)\bigr) = o(f,x,\delta) < \epsilon. \qedhere
[prop:seclosed] Let S \subset {\mathbb{R}}^n be closed, f \colon S \to {\mathbb{R}}, and \epsilon > 0. The set \{ x \in S : o(f,x) \geq \epsilon \} is closed.
Equivalently we want to show that G = \{ x \in S : o(f,x) < \epsilon \} is open in the subspace topology. Take x \in G. As \inf_{\delta > 0} o(f,x,\delta) < \epsilon, find a \delta > 0 such that o(f,x,\delta) < \epsilon. Take any \xi \in B_S(x,\nicefrac{\delta}{2}). Notice that B_S(\xi,\nicefrac{\delta}{2}) \subset B_S(x,\delta). Therefore, o(f,\xi,\nicefrac{\delta}{2}) = \sup_{y_1,y_2 \in B_S(\xi,\nicefrac{\delta}{2})} \bigl(f(y_1)-f(y_2)\bigr) \leq \sup_{y_1,y_2 \in B_S(x,\delta)} \bigl(f(y_1)-f(y_2)\bigr) = o(f,x,\delta) < \epsilon . So o(f,\xi) < \epsilon as well. As this is true for all \xi \in B_S(x,\nicefrac{\delta}{2}), we get that G is open in the subspace topology and S \setminus G is closed as claimed.
The set of Riemann integrable functions
We have seen that continuous functions are Riemann integrable, but we also know that certain kinds of discontinuities are allowed. It turns out that as long as the discontinuities happen on a set of measure zero, the function is integrable and vice versa.
Let R \subset {\mathbb{R}}^n be a closed rectangle and f \colon R \to {\mathbb{R}} a bounded function. Then f is Riemann integrable if and only if the set of discontinuities of f is of measure zero (a null set).
Let S \subset R be the set of discontinuities of f. That is S = \{ x \in R : o(f,x) > 0 \}. The trick to this proof is to isolate the bad set into a small set of subrectangles of a partition. There are only finitely many subrectangles of a partition, so we will wish to use compactness. If S is closed, then it would be compact and we could cover it by small rectangles as it is of measure zero. Unfortunately, in general S is not closed so we need to work a little harder.
For every \epsilon > 0, define S_\epsilon := \{ x \in R : o(f,x) \geq \epsilon \} . By S_\epsilon is closed and as it is a subset of R, which is bounded, S_\epsilon is compact. Furthermore, S_\epsilon \subset S and S is of measure zero. Via there are finitely many open rectangles O_1,O_2,\ldots,O_k that cover S_\epsilon and \sum V(O_j) < \epsilon.
The set T = R \setminus ( O_1 \cup \cdots \cup O_k ) is closed, bounded, and therefore compact. Furthermore for x \in T, we have o(f,x) < \epsilon. Hence for each x \in T, there exists a small closed rectangle T_x with x in the interior of T_x, such that \sup_{y\in T_x} f(y) - \inf_{y\in T_x} f(y) < 2\epsilon. The interiors of the rectangles T_x cover T. As T is compact there exist finitely many such rectangles T_1, T_2, \ldots, T_m that cover T.
Take the rectangles T_1,T_2,\ldots,T_m and O_1,O_2,\ldots,O_k and construct a partition out of their endpoints. That is construct a partition P of R with subrectangles R_1,R_2,\ldots,R_p such that every R_j is contained in T_\ell for some \ell or the closure of O_\ell for some \ell. Order the rectangles so that R_1,R_2,\ldots,R_q are those that are contained in some T_\ell, and R_{q+1},R_{q+2},\ldots,R_{p} are the rest. In particular, \sum_{j=1}^q V(R_j) \leq V(R) \qquad \text{and} \qquad \sum_{j=q+1}^p V(R_j) \leq \epsilon . Let m_j and M_j be the inf and sup of f over R_j as before. If R_j \subset T_\ell for some \ell, then (M_j-m_j) < 2 \epsilon. Let B \in {\mathbb{R}} be such that \left\lvert {f(x)} \right\rvert \leq B for all x \in R, so (M_j-m_j) < 2B over all rectangles. Then \begin{split} U(P,f)-L(P,f) & = \sum_{j=1}^p (M_j-m_j) V(R_j) \\ & = \left( \sum_{j=1}^q (M_j-m_j) V(R_j) \right) + \left( \sum_{j=q+1}^p (M_j-m_j) V(R_j) \right) \\ & \leq \left( \sum_{j=1}^q 2\epsilon V(R_j) \right) + \left( \sum_{j=q+1}^p 2 B V(R_j) \right) \\ & \leq 2 \epsilon V(R) + 2B \epsilon = \epsilon \bigl(2V(R)+2B\bigr) . \end{split} Clearly, we can make the right hand side as small as we want and hence f is integrable.
For the other direction, suppose f is Riemann integrable over R. Let S be the set of discontinuities again and now let S_k := \{ x \in R : o(f,x) \geq \nicefrac{1}{k} \}. Fix a k \in {\mathbb{N}}. Given an \epsilon > 0, find a partition P with subrectangles R_1,R_2,\ldots,R_p such that U(P,f)-L(P,f) = \sum_{j=1}^p (M_j-m_j) V(R_j) < \epsilon . Suppose R_1,R_2,\ldots,R_p are ordered so that the interiors of R_1,R_2,\ldots,R_{q} intersect S_k, while the interiors of R_{q+1},R_{q+2},\ldots,R_p are disjoint from S_k. If x \in R_j \cap S_k and x is in the interior of R_j, so that sufficiently small balls around x are completely inside R_j, then by the definition of S_k we have M_j-m_j \geq \nicefrac{1}{k}. Then \epsilon > \sum_{j=1}^p (M_j-m_j) V(R_j) \geq \sum_{j=1}^q (M_j-m_j) V(R_j) \geq \frac{1}{k} \sum_{j=1}^q V(R_j) . In other words, \sum_{j=1}^q V(R_j) < k \epsilon. Let G be the set of all boundaries of all the subrectangles of P. The set G is of measure zero (see ). Let R_j^\circ denote the interior of R_j, then S_k \subset R_1^\circ \cup R_2^\circ \cup \cdots \cup R_q^\circ \cup G . As G can be covered by open rectangles of arbitrarily small total volume, S_k must be of measure zero. As S = \bigcup_{k=1}^\infty S_k and a countable union of measure zero sets is of measure zero, S is of measure zero.
Exercises
Suppose f \colon (a,b) \times (c,d) \to {\mathbb{R}} is a bounded continuous function. Show that the integral of f over R = [a,b] \times [c,d] makes sense and is uniquely defined. That is, define f arbitrarily on the boundary of R and show that the value of the integral does not depend on the values chosen there.
Suppose R \subset {\mathbb{R}}^n is a closed rectangle. Show that {\mathcal{R}}(R), the set of Riemann integrable functions, is an algebra. That is, show that if f,g \in {\mathcal{R}}(R) and a \in {\mathbb{R}}, then af \in {\mathcal{R}}(R), f+g \in {\mathcal{R}}(R) and fg \in {\mathcal{R}}(R).
Suppose R \subset {\mathbb{R}}^n is a closed rectangle and f \colon R \to {\mathbb{R}} is a bounded function which is zero except on a closed set E \subset R of measure zero. Show that \int_R f exists and compute it.
Suppose R \subset {\mathbb{R}}^n is a closed rectangle and f \colon R \to {\mathbb{R}} and g \colon R \to {\mathbb{R}} are two Riemann integrable functions. Suppose f = g except for a closed set E \subset R of measure zero. Show that \int_R f = \int_R g.
Suppose R \subset {\mathbb{R}}^n is a closed rectangle and f \colon R \to {\mathbb{R}} is a bounded function.
a) Suppose there exists a closed set E \subset R of measure zero such that f|_{R\setminus E} is continuous. Then f \in {\mathcal{R}}(R).
b) Find an example where E \subset R is a set of measure zero (but not closed) such that f|_{R\setminus E} is continuous and f \not\in {\mathcal{R}}(R).
Jordan measurable sets
Note: 1 lecture
Volume and Jordan measurable sets
Given a bounded set S \subset {\mathbb{R}}^n its characteristic function or indicator function is \chi_S(x) := \begin{cases} 1 & \text{ if $x \in S$}, \\ 0 & \text{ if $x \notin S$}. \end{cases} A bounded set S is Jordan measurable if for some closed rectangle R such that S \subset R, the function \chi_S is in {\mathcal{R}}(R). Take two closed rectangles R and R' with S \subset R and S \subset R', then R \cap R' is a closed rectangle also containing S. By and , \chi_S \in {\mathcal{R}}(R \cap R') and so \chi_S \in {\mathcal{R}}(R'). Thus \int_R \chi_S = \int_{R'} \chi_S = \int_{R \cap R'} \chi_S. We define the n-dimensional volume of the bounded Jordan measurable set S as V(S) := \int_R \chi_S , where R is any closed rectangle containing S.
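For a concrete picture, the Python sketch below (an illustration only; the grid test for the unit disc is our own choice) computes the lower and upper Darboux sums of \chi_S for S = B(0,1) over uniform partitions of R = [-1,1]^2; both squeeze toward V(S) = \pi as the partition is refined.

```python
import math

# Illustration only: lower and upper Darboux sums of the characteristic
# function of the unit disc S = B(0,1) inside R = [-1,1]^2.  A subrectangle
# counts toward the lower sum iff it lies entirely in S (farthest corner at
# distance <= 1) and toward the upper sum iff it meets S (closest point at
# distance <= 1).

def darboux_sums(n):
    h = 2.0 / n
    lower = upper = 0.0
    for i in range(n):
        for j in range(n):
            x0, x1 = -1 + i * h, -1 + (i + 1) * h
            y0, y1 = -1 + j * h, -1 + (j + 1) * h
            far = math.hypot(max(abs(x0), abs(x1)), max(abs(y0), abs(y1)))
            near = math.hypot(min(max(0.0, x0), x1), min(max(0.0, y0), y1))
            if far <= 1.0:
                lower += h * h
            if near <= 1.0:
                upper += h * h
    return lower, upper

for n in (10, 50, 250):
    print(n, darboux_sums(n))   # both entries approach pi = 3.14159...
```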
A bounded set S \subset {\mathbb{R}}^n is Jordan measurable if and only if the boundary \partial S is a measure zero set.
Suppose R is a closed rectangle such that S is contained in the interior of R. If x \in \partial S, then for every \delta > 0, the sets S \cap B(x,\delta) (where \chi_S is 1) and the sets (R \setminus S) \cap B(x,\delta) (where \chi_S is 0) are both nonempty. So \chi_S is not continuous at x. If x is either in the interior of S or in the complement of the closure \overline{S}, then \chi_S is either identically 1 or identically 0 in a whole neighborhood of x and hence \chi_S is continuous at x. Therefore, the set of discontinuities of \chi_S is precisely the boundary \partial S. The proposition then follows.
[prop:jordanmeas] Suppose S and T are bounded Jordan measurable sets. Then
- The closure \overline{S} is Jordan measurable.
- The interior S^\circ is Jordan measurable.
- S \cup T is Jordan measurable.
- S \cap T is Jordan measurable.
- S \setminus T is Jordan measurable.
The proof of the proposition is left as an exercise. Next, we find that the volume that we defined above coincides with the outer measure we defined above.
If S \subset {\mathbb{R}}^n is Jordan measurable, then V(S) = m^*(S).
Given \epsilon > 0, let R be a closed rectangle that contains S. Let P be a partition of R such that U(P,\chi_S) \leq \int_R \chi_S + \epsilon = V(S) + \epsilon \qquad \text{and} \qquad L(P,\chi_S) \geq \int_R \chi_S - \epsilon = V(S)-\epsilon. Let R_1,\ldots,R_k be all the subrectangles of P such that \chi_S is not identically zero on each R_j. That is, there is some point x \in R_j such that x \in S. Let O_j be an open rectangle such that R_j \subset O_j and V(O_j) < V(R_j) + \nicefrac{\epsilon}{k}. Notice that S \subset \bigcup_j O_j. Then U(P,\chi_S) = \sum_{j=1}^k V(R_j) > \left(\sum_{j=1}^k V(O_j)\right) - \epsilon \geq m^*(S) - \epsilon . As U(P,\chi_S) \leq V(S) + \epsilon, then m^*(S) - \epsilon \leq V(S) + \epsilon, or in other words m^*(S) \leq V(S).
Let R'_1,\ldots,R'_\ell be all the subrectangles of P such that \chi_S is identically one on each R'_j. In other words, these are the subrectangles contained in S. The interiors of the subrectangles R'^\circ_j are disjoint and V(R'^\circ_j) = V(R'_j). It is easy to see from the definition that m^*\Bigl(\bigcup_{j=1}^\ell R'^\circ_j\Bigr) = \sum_{j=1}^\ell V(R'^\circ_j) . Hence m^*(S) \geq m^*\Bigl(\bigcup_{j=1}^\ell R'_j\Bigr) \geq m^*\Bigl(\bigcup_{j=1}^\ell R'^\circ_j\Bigr) = \sum_{j=1}^\ell V(R'^\circ_j) = \sum_{j=1}^\ell V(R'_j) = L(P,\chi_S) \geq V(S) - \epsilon . Therefore m^*(S) \geq V(S) as well.
Integration over Jordan measurable sets
In one variable there is really only one type of reasonable set to integrate over: an interval. In several variables we have many common types of sets we might want to integrate over and these are not described so easily.
Let S \subset {\mathbb{R}}^n be a bounded Jordan measurable set. A bounded function f \colon S \to {\mathbb{R}} is said to be Riemann integrable on S, or f \in {\mathcal{R}}(S), if for a closed rectangle R such that S \subset R, the function \widetilde{f} \colon R \to {\mathbb{R}} defined by \widetilde{f}(x) = \begin{cases} f(x) & \text{ if $x \in S$}, \\ 0 & \text{ otherwise}, \end{cases} is in {\mathcal{R}}(R). In this case we write \int_S f := \int_R \widetilde{f}.
When f is defined on a larger set and we wish to integrate over S, then we apply the definition to the restriction f|_S. In particular, if f \colon R \to {\mathbb{R}} for a closed rectangle R, and S \subset R is a Jordan measurable subset, then \int_S f = \int_R f \chi_S .
If S \subset {\mathbb{R}}^n is a Jordan measurable set and f \colon S \to {\mathbb{R}} is a bounded continuous function, then f is integrable on S.
Define the function \widetilde{f} as above for some closed rectangle R with S \subset R. If x \in R \setminus \overline{S}, then \widetilde{f} is identically zero in a neighborhood of x. Similarly if x is in the interior of S, then \widetilde{f} = f on a neighborhood of x and f is continuous at x. Therefore, \widetilde{f} is only ever possibly discontinuous at \partial S, which is a set of measure zero, and we are finished.
Images of Jordan measurable subsets
Finally, images of Jordan measurable sets are Jordan measurable under nice enough mappings. For simplicity, let us assume that the Jacobian never vanishes.
Suppose S \subset {\mathbb{R}}^n is a closed bounded Jordan measurable set, and S \subset U for an open set U \subset {\mathbb{R}}^n. Suppose g \colon U \to {\mathbb{R}}^n is a one-to-one continuously differentiable mapping such that J_g is never zero on S. Then g(S) is Jordan measurable.
Let T = g(S). We claim that the boundary \partial T is contained in the set g(\partial S). Suppose the claim is proved. As S is Jordan measurable, then \partial S is measure zero. Then g(\partial S) is measure zero by . As \partial T \subset g(\partial S), then T is Jordan measurable.
It is therefore left to prove the claim. First, S is closed and bounded and hence compact. By Lemma 7.5.4 from volume I, T = g(S) is also compact and therefore closed. In particular, \partial T \subset T. Suppose y \in \partial T, then there must exist an x \in S such that g(x) = y, and by hypothesis J_g(x) \not= 0.
We now use the inverse function theorem . We find a neighborhood V \subset U of x and an open set W such that the restriction g|_V is a one-to-one and onto function from V to W with a continuously differentiable inverse. In particular, g(x) = y \in W. As y \in \partial T, there exists a sequence \{ y_k \} in W with \lim y_k = y and y_k \notin T. As g|_V is invertible and in particular has a continuous inverse, there exists a sequence \{ x_k \} in V such that g(x_k) = y_k and \lim x_k = x. Since y_k \notin T = g(S), clearly x_k \notin S. Since x \in S, we conclude that x \in \partial S. The claim is proved, \partial T \subset g(\partial S).
Exercises
Prove .
Prove that a bounded convex set is Jordan measurable. Hint: induction on dimension.
[exercise:intovertypeIset] Let f \colon [a,b] \to {\mathbb{R}} and g \colon [a,b] \to {\mathbb{R}} be continuous functions and such that for all x \in (a,b), f(x) < g(x). Let U := \{ (x,y) \in {\mathbb{R}}^2 : a < x < b \text{ and } f(x) < y < g(x) \} . a) Show that U is Jordan measurable.
b) If h \colon U \to {\mathbb{R}} is Riemann integrable on U, then \int_U h = \int_a^b \int_{f(x)}^{g(x)} h(x,y) \, dy \, dx .
Let us construct an example of a non-Jordan measurable open set. For simplicity we work first in one dimension. Let \{ r_j \} be an enumeration of all rational numbers in (0,1). Let (a_j,b_j) be open intervals such that (a_j,b_j) \subset (0,1) for all j, r_j \in (a_j,b_j), and \sum_{j=1}^\infty (b_j-a_j) < \nicefrac{1}{2}. Now let U :=
\bigcup_{j=1}^\infty (a_j,b_j). Show that
a) The open intervals (a_j,b_j) as above actually exist.
b) \partial U = [0,1] \setminus U.
c) \partial U is not of measure zero, and therefore U is not Jordan measurable.
d) Show that W := \bigl( (0,1) \times (0,2) \bigr) \setminus \bigl( U
\times [0,1] \bigr) \subset {\mathbb{R}}^2 is a connected bounded open set in {\mathbb{R}}^2 that is not Jordan measurable.
Green’s theorem
Note: 1 lecture
One of the most important theorems of analysis in several variables is the so-called generalized Stokes’ theorem, a generalization of the fundamental theorem of calculus. Perhaps the most often used version is the version in two dimensions, called Green’s theorem, which we prove here.
Let U \subset {\mathbb{R}}^2 be a bounded connected open set. Suppose the boundary \partial U is a finite union of (the images of) simple piecewise smooth paths such that for each point p \in \partial U, every neighborhood V of p contains points of {\mathbb{R}}^2 \setminus \overline{U}. Then U is called a bounded domain with piecewise smooth boundary in {\mathbb{R}}^2.
The condition about points outside the closure means that locally \partial U separates {\mathbb{R}}^2 into “inside” and “outside”. The condition prevents \partial U from being just a “cut” inside U. Therefore as we travel along the path in a certain orientation, there is a well defined left and a right, and either it is U on the left and the complement of U on the right, or vice-versa. Thus by orientation on U we mean the direction along which we travel along the paths. It is easy to switch orientation if needed by reparametrizing the path.
If U \subset {\mathbb{R}}^2 is a bounded domain with piecewise smooth boundary, let \partial U be oriented and \gamma \colon [a,b] \to {\mathbb{R}}^2 is a parametrization of \partial U giving the orientation. Write \gamma(t) = \big(x(t),y(t)\bigr). If the vector n(t) := \bigl(-y'(t),x'(t)\bigr) points into the domain, that is, \epsilon n(t) + \gamma(t) is in U for all small enough \epsilon > 0, then \partial U is positively oriented. Otherwise it is negatively oriented.
The vector n(t) turns \gamma^{\:\prime}(t) counterclockwise by 90^\circ, that is to the left. A boundary is positively oriented, if when we travel along the boundary in the direction of its orientation, the domain is “on our left”. For example, if U is a bounded domain with “no holes”, that is \partial U is connected, then the positive orientation means we are travelling counterclockwise around \partial U. If we do have “holes”, then we travel around them clockwise.
Let U \subset {\mathbb{R}}^2 be a bounded domain with piecewise smooth boundary, then U is Jordan measurable.
We need that \partial U is of measure zero. As \partial U is a finite union of simple piecewise smooth paths, which themselves are finite unions of smooth paths we need only show that a smooth path is of measure zero in {\mathbb{R}}^2.
Let \gamma \colon [a,b] \to {\mathbb{R}}^2 be a smooth path. It is enough to show that \gamma\bigl((a,b)\bigr) is of measure zero, as adding two points, that is the points \gamma(a) and \gamma(b), to a measure zero set still results in a measure zero set. Define f \colon (a,b) \times (-1,1) \to {\mathbb{R}}^2, \qquad \text{as} \qquad f(x,y) := \gamma(x) . The set (a,b) \times \{ 0 \} is of measure zero in {\mathbb{R}}^2 and \gamma\bigl((a,b)\bigr) = f\bigl( (a,b) \times \{ 0 \} \bigr). Hence by , \gamma\bigl((a,b)\bigr) is measure zero in {\mathbb{R}}^2 and so \gamma\bigl([a,b]\bigr) is also measure zero, and so finally \partial U is also measure zero.
Suppose U \subset {\mathbb{R}}^2 is a bounded domain with piecewise smooth boundary with the boundary positively oriented. Suppose P and Q are continuously differentiable functions defined on some open set that contains the closure \overline{U}. Then \int_{\partial U} P \, dx + Q\, dy = \int_{U} \left(\frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y} \right) . %dx dy .
We stated Green’s theorem in general, although we will only prove a special version of it. That is, we will only prove it for a special kind of domain. The general version follows from the special case by application of further geometry, and cutting up the general domain into smaller domains on which to apply the special case. We will not prove the general case.
Let U \subset {\mathbb{R}}^2 be a domain with piecewise smooth boundary. We say U is of type I if there exist numbers a < b, and continuous functions f \colon [a,b] \to {\mathbb{R}} and g \colon [a,b] \to {\mathbb{R}}, such that U := \{ (x,y) \in {\mathbb{R}}^2 : a < x < b \text{ and } f(x) < y < g(x) \} . Similarly, U is of type II if there exist numbers c < d, and continuous functions h \colon [c,d] \to {\mathbb{R}} and k \colon [c,d] \to {\mathbb{R}}, such that U := \{ (x,y) \in {\mathbb{R}}^2 : c < y < d \text{ and } h(y) < x < k(y) \} . Finally, U \subset {\mathbb{R}}^2 is of type III if it is both of type I and type II.
We will only prove Green’s theorem for type III domains.
Let f,g,h,k be the functions defined above. By , U is Jordan measurable and as U is of type I, then \begin{split} \int_U \left(- \frac{\partial P}{\partial y} \right) & = \int_a^b \int_{f(x)}^{g(x)} \left(- \frac{\partial P}{\partial y} (x,y) \right) \, dy \, dx \\ & = \int_a^b \Bigl( - P\bigl(x,g(x)\bigr) + P\bigl(x,f(x)\bigr) \Bigr) \, dx \\ & = \int_a^b P\bigl(x,f(x)\bigr) \, dx - \int_a^b P\bigl(x,g(x)\bigr) \, dx . \end{split} Now we wish to integrate P\,dx along the boundary. The one-form P\,dx integrates to zero along the straight vertical lines in the boundary. Therefore it is only integrated along the bottom y = f(x) and along the top y = g(x). As a parameter, x runs from left to right. If we use the parametrizations that take x to \bigl(x,f(x)\bigr) and to \bigl(x,g(x)\bigr), we recognize the path integrals above. However, the second path integral is in the wrong direction; the top should be traversed from right to left, and so we must switch its orientation. \int_{\partial U} P \, dx = \int_a^b P\bigl(x,f(x)\bigr) \, dx + \int_b^a P\bigl(x,g(x)\bigr) \, dx = \int_U \left(- \frac{\partial P}{\partial y} \right) .
Similarly, U is also of type II. The form Q\,dy integrates to zero along horizontal lines. So \int_U \frac{\partial Q}{\partial x} = \int_c^d \int_{h(y)}^{k(y)} \frac{\partial Q}{\partial x}(x,y) \, dx \, dy = \int_c^d \Bigl( Q\bigl(k(y),y\bigr) - Q\bigl(h(y),y\bigr) \Bigr) \, dy = \int_{\partial U} Q \, dy . Putting the two together we obtain \int_{\partial U} P\, dx + Q \, dy = \int_{\partial U} P\, dx + \int_{\partial U} Q \, dy = \int_U \Bigl(-\frac{\partial P}{\partial y}\Bigr) + \int_U \frac{\partial Q}{\partial x} = \int_U \Bigl( \frac{\partial Q}{\partial x} -\frac{\partial P}{\partial y} \Bigr) . \qedhere
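A numerical check of the theorem can be reassuring. The Python sketch below (an illustration only; the particular P and Q are arbitrary choices of ours) compares the line integral of P\,dx + Q\,dy over the positively oriented unit circle with the integral of \frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y} over the unit disc, the latter computed in polar coordinates.

```python
import math

# Illustration only: Green's theorem on the unit disc for the (arbitrary)
# choice P(x,y) = x^2 - y^3 and Q(x,y) = x*y + exp(x).

def P(x, y):
    return x ** 2 - y ** 3

def Q(x, y):
    return x * y + math.exp(x)

def curl(x, y):
    return (y + math.exp(x)) - (-3 * y ** 2)   # dQ/dx - dP/dy

# Line integral over the positively oriented unit circle:
# gamma(t) = (cos t, sin t), dx = -sin t dt, dy = cos t dt.
n = 20000
line = 0.0
for k in range(n):
    t = 2 * math.pi * (k + 0.5) / n
    c, s = math.cos(t), math.sin(t)
    line += (P(c, s) * (-s) + Q(c, s) * c) * (2 * math.pi / n)

# Double integral of dQ/dx - dP/dy over the disc, in polar coordinates.
nr, nt = 400, 400
double = 0.0
for i in range(nr):
    r = (i + 0.5) / nr
    for j in range(nt):
        t = 2 * math.pi * (j + 0.5) / nt
        double += curl(r * math.cos(t), r * math.sin(t)) * r / nr * (2 * math.pi / nt)

print(line, double)   # the two values agree to several decimal places
```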
Let us illustrate the usefulness of Green’s theorem on a fundamental result about harmonic functions.
Suppose U \subset {\mathbb{R}}^2 is an open set and f \colon U \to {\mathbb{R}} is harmonic, that is, f is twice continuously differentiable and \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} = 0. We will prove one of the most fundamental properties of harmonic functions.
Let D_r := B(p,r) be a disc of radius r such that its closure C(p,r) \subset U. Write p = (x_0,y_0). We orient \partial D_r positively. See . Then \begin{split} 0 & = \frac{1}{2\pi r} \int_{D_r} \left( \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} \right) \\ & = \frac{1}{2\pi r} \int_{\partial D_r} - \frac{\partial f}{\partial y} \, dx + \frac{\partial f}{\partial x} \, dy \\ & = \frac{1}{2\pi r} \int_0^{2\pi} \biggl( - \frac{\partial f}{\partial y} \bigl(x_0+r\cos(t),y_0+r\sin(t)\bigr) \bigl(-r\sin(t)\bigr) \\ & \hspace{1.2in} + \frac{\partial f}{\partial x} \bigl(x_0+r\cos(t),y_0+r\sin(t)\bigr) r\cos(t) \biggr) \, dt \\ & = \frac{d}{dr} \left[ \frac{1}{2\pi} \int_0^{2\pi} f\bigl(x_0+r\cos(t),y_0+r\sin(t)\bigr) \, dt \right] . \end{split} Let g(r) := \frac{1}{2\pi} \int_0^{2\pi} f\bigl(x_0+r\cos(t),y_0+r\sin(t)\bigr) \, dt. Then g'(r) = 0 for all r > 0. The function g is constant for r > 0 and continuous at r=0 (exercise). Therefore g(0) = g(r) for all r > 0, and so g(r) = g(0) = \frac{1}{2\pi} \int_0^{2\pi} f\bigl(x_0+0\cos(t),y_0+0\sin(t)\bigr) \, dt = f(x_0,y_0). We proved the mean value property of harmonic functions: f(x_0,y_0) = \frac{1}{2\pi} \int_0^{2\pi} f\bigl(x_0+r\cos(t),y_0+r\sin(t)\bigr) \, dt = \frac{1}{2\pi r} \int_{\partial D_r} f \, ds . That is, the value at p = (x_0,y_0) is the average over a circle of any radius r centered at (x_0,y_0).
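The mean value property is also easy to test numerically. The Python sketch below (an illustration only; the harmonic function e^x \cos y and the sample point are our own choices) compares f(x_0,y_0) with the average of f over circles of several radii centered at (x_0,y_0).

```python
import math

# Illustration only: the mean value property for the harmonic function
# f(x,y) = exp(x)*cos(y).

def f(x, y):
    return math.exp(x) * math.cos(y)

def circle_average(x0, y0, r, n=100000):
    """Average of f over the circle of radius r centered at (x0, y0)."""
    total = 0.0
    for k in range(n):
        t = 2 * math.pi * (k + 0.5) / n
        total += f(x0 + r * math.cos(t), y0 + r * math.sin(t))
    return total / n

x0, y0 = 0.3, -0.7
print(f(x0, y0))
for r in (0.1, 0.5, 2.0):
    print(r, circle_average(x0, y0, r))   # each average equals f(x0, y0)
```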
Exercises
[green:balltype3orient] Prove that a disc B(p,r) \subset {\mathbb{R}}^2 is a type III domain, and prove that the orientation given by the parametrization \gamma(t) = \bigl(x_0+r\cos(t),y_0+r\sin(t)\bigr) where p = (x_0,y_0) is the positive orientation of the boundary \partial B(p,r).
Prove that any bounded domain with piecewise smooth boundary that is convex is a type III domain.
Suppose V \subset {\mathbb{R}}^2 is a domain with piecewise smooth boundary that is a type III domain and suppose that U \subset {\mathbb{R}}^2 is a domain such that \overline{V} \subset U. Suppose f \colon U \to {\mathbb{R}} is a twice continuously differentiable function. Prove that \int_{\partial V} \frac{\partial f}{\partial x} dx + \frac{\partial f}{\partial y} dy = 0.
For a disc B(p,r) \subset {\mathbb{R}}^2, orient the boundary \partial B(p,r) positively:
a) Compute \displaystyle \int_{\partial B(p,r)} -y \, dx.
b) Compute \displaystyle \int_{\partial B(p,r)} x \, dy.
c) Compute \displaystyle \int_{\partial B(p,r)} \frac{-y}{2} \, dx + \frac{x}{2} \, dy.
Using Green’s theorem show that the area of a triangle with vertices (x_1,y_1), (x_2,y_2), (x_3,y_3) is \frac{1}{2}\left\lvert {x_1y_2 + x_2 y_3 + x_3 y_1 - y_1x_2 - y_2x_3 - y_3x_1} \right\rvert. Hint: see previous exercise.
Using the mean value property prove the maximum principle for harmonic functions: Suppose U \subset {\mathbb{R}}^2 is a connected open set and f \colon U \to {\mathbb{R}} is harmonic. Prove that if f attains a maximum at p \in U, then f is constant.
Let f(x,y) := \ln \sqrt{x^2+y^2}.
a) Show f is harmonic where defined.
b) Show \lim_{(x,y) \to 0} f(x,y) = -\infty.
c) Using the circle C_r of radius r around the origin, compute \frac{1}{2\pi r} \int_{C_r} f \, ds. What happens as r \to 0?
d) Why can’t you use Green’s theorem?
- Subscripts are used for many purposes, so sometimes we may have several vectors that may also be identified by subscript, such as a finite or infinite sequence of vectors y_1,y_2,\ldots.
- If you want a very funky vector space over a different field, {\mathbb{R}} itself is a vector space over the rational numbers.
- The matrix representing f'(x) is sometimes called the Jacobian matrix.
- The word “smooth” is used sometimes for continuously differentiable and sometimes for infinitely differentiable functions in the literature.
- Normally only a continuous path is used in this definition, but for open sets the two definitions are equivalent. See the exercises.