4.11: Orthogonality


Outcomes
  1. Determine if a given set is orthogonal or orthonormal.
  2. Determine if a given matrix is orthogonal.
  3. Given a linearly independent set, use the Gram-Schmidt Process to find corresponding orthogonal and orthonormal sets.
  4. Find the orthogonal projection of a vector onto a subspace.
  5. Find the least squares approximation for a collection of points.

In this section, we examine what it means for vectors (and sets of vectors) to be orthogonal and orthonormal. First, it is necessary to review some important concepts. You may recall the definitions for the span of a set of vectors and a linearly independent set of vectors. We include the definitions and examples here for convenience.

Definition : Span of a Set of Vectors and Subspace

The collection of all linear combinations of a set of vectors \(\{\vec{u}_1, \ldots, \vec{u}_k\}\) in \(\mathbb{R}^n\) is known as the span of these vectors and is written as \(\mathrm{span}\{\vec{u}_1, \ldots, \vec{u}_k\}\).
We call a collection of the form \(\mathrm{span}\{\vec{u}_1, \ldots, \vec{u}_k\}\) a subspace of \(\mathbb{R}^n\).

Consider the following example.

Example : Spanning Vectors

Describe the span of the vectors and .

Solution

You can see that any linear combination of the vectors and yields a vector in the -plane.

Moreover every vector in the -plane is in fact such a linear combination of the vectors and . That’s because

Thus span is precisely the -plane.

The span of a set of vectors in \(\mathbb{R}^n\) is what we call a subspace of \(\mathbb{R}^n\). A subspace \(V\) is characterized by the feature that any linear combination of vectors of \(V\) is again a vector contained in \(V\).

Another important property of sets of vectors is called linear independence.

Definition : Linear Independence

A set of non-zero vectors \(\{\vec{u}_1, \ldots, \vec{u}_k\}\) in \(\mathbb{R}^n\) is said to be linearly independent if no vector in that set is in the span of the other vectors of that set.

Here is an example.

Example : Linearly Independent Vectors

Consider vectors , , and . Verify whether the set is linearly independent.

Solution

We already verified in Example that is the -plane. Since is clearly also in the -plane, then the set is not linearly independent.

In terms of spanning, a set of vectors is linearly independent if it does not contain unnecessary vectors. In the previous example you can see that the vector does not help to span any new vector not already in the span of the other two vectors. However you can verify that the set is linearly independent, since you will not get the -plane as the span of a single vector.

We can also determine if a set of vectors is linearly independent by examining linear combinations. A set of vectors is linearly independent if and only if whenever a linear combination of these vectors equals zero, it follows that all the coefficients equal zero. It is a good exercise to verify this equivalence, and this latter condition is often used as the (equivalent) definition of linear independence.
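For readers who wish to check linear independence numerically, the following Python (NumPy) sketch applies this coefficient criterion through a rank computation; the vectors used are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical vectors in R^3, chosen only to illustrate the rank test.
u = np.array([1.0, 1.0, 0.0])
v = np.array([3.0, 2.0, 0.0])
w = np.array([4.0, 3.0, 0.0])   # w = u + v, so {u, v, w} is dependent

def is_linearly_independent(*vectors):
    """A set is linearly independent exactly when the only solution of
    a1*v1 + ... + ak*vk = 0 is a1 = ... = ak = 0, i.e. when the matrix
    with the vectors as columns has full column rank."""
    A = np.column_stack(vectors)
    return np.linalg.matrix_rank(A) == len(vectors)

print(is_linearly_independent(u, v))     # True: neither is a multiple of the other
print(is_linearly_independent(u, v, w))  # False: w lies in span{u, v}
```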

If a subspace is spanned by a linearly independent set of vectors, then we say that it is a basis for the subspace.

Definition : Basis

Let \(V\) be a subspace of \(\mathbb{R}^n\). Then a set of vectors \(\{\vec{u}_1, \ldots, \vec{u}_k\}\) is a basis for \(V\) if the following two conditions hold.

  1. \(\{\vec{u}_1, \ldots, \vec{u}_k\}\) is linearly independent
  2. \(\mathrm{span}\{\vec{u}_1, \ldots, \vec{u}_k\} = V\)

Thus the set of vectors from Example is a basis for the -plane, since it is both linearly independent and spans the -plane.

Recall from the properties of the dot product of vectors that two vectors \(\vec{u}\) and \(\vec{v}\) are orthogonal if \(\vec{u} \cdot \vec{v} = 0\). Suppose a vector is orthogonal to a spanning set of a subspace \(V\). What can be said about such a vector? This is the discussion in the following example.

Example : Orthogonal Vector to a Spanning Set

Let \(\vec{u}_1, \ldots, \vec{u}_k \in \mathbb{R}^n\) and suppose \(V = \mathrm{span}\{\vec{u}_1, \ldots, \vec{u}_k\}\). Furthermore, suppose that there exists a vector \(\vec{v} \in V\) for which \(\vec{v} \cdot \vec{u}_j = 0\) for all \(j\), \(1 \leq j \leq k\). What type of vector is \(\vec{v}\)?

Solution

Write \(\vec{v} = a_1 \vec{u}_1 + a_2 \vec{u}_2 + \cdots + a_k \vec{u}_k\) for some \(a_1, a_2, \ldots, a_k \in \mathbb{R}\) (this is possible because \(\vec{u}_1, \ldots, \vec{u}_k\) span \(V\)).

Then
\[\|\vec{v}\|^2 = \vec{v} \cdot \vec{v} = \vec{v} \cdot (a_1 \vec{u}_1 + \cdots + a_k \vec{u}_k) = a_1 (\vec{v} \cdot \vec{u}_1) + \cdots + a_k (\vec{v} \cdot \vec{u}_k) = 0\]

Since \(\|\vec{v}\|^2 = 0\), \(\|\vec{v}\| = 0\). We know that \(\|\vec{v}\| = 0\) if and only if \(\vec{v} = \vec{0}\). Therefore, \(\vec{v} = \vec{0}\). In conclusion, the only vector orthogonal to every vector of a spanning set of \(V\) is the zero vector.

We can now discuss what is meant by an orthogonal set of vectors.

Definition : Orthogonal Set of Vectors

Let \(\{\vec{u}_1, \vec{u}_2, \ldots, \vec{u}_m\}\) be a set of vectors in \(\mathbb{R}^n\). Then this set is called an orthogonal set if the following conditions hold:

  1. \(\vec{u}_i \cdot \vec{u}_j = 0\) for all \(i \neq j\)
  2. \(\vec{u}_i \neq \vec{0}\) for all \(i\)

If we have an orthogonal set of vectors and normalize each vector so they have length 1, the resulting set is called an orthonormal set of vectors. They can be described as follows.

Definition : Orthonormal Set of Vectors

A set of vectors \(\{\vec{w}_1, \vec{w}_2, \ldots, \vec{w}_m\}\) is said to be an orthonormal set if
\[\vec{w}_i \cdot \vec{w}_j = \delta_{ij} = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \neq j \end{cases}\]

Note that all orthonormal sets are orthogonal, but the reverse is not necessarily true since the vectors may not be normalized. In order to normalize the vectors, we simply need to divide each one by its length.


Definition : Normalizing an Orthogonal Set

Normalizing an orthogonal set is the process of turning an orthogonal (but not orthonormal) set into an orthonormal set. If \(\{\vec{u}_1, \vec{u}_2, \ldots, \vec{u}_m\}\) is an orthogonal subset of \(\mathbb{R}^n\), then
\[\left\{ \frac{1}{\|\vec{u}_1\|}\vec{u}_1, \frac{1}{\|\vec{u}_2\|}\vec{u}_2, \ldots, \frac{1}{\|\vec{u}_m\|}\vec{u}_m \right\}\]
is an orthonormal set.

We illustrate this concept in the following example.

Example : Orthonormal Set

Consider the set of vectors given by Show that it is an orthogonal set of vectors but not an orthonormal one. Find the corresponding orthonormal set.

Solution

One easily verifies that and is an orthogonal set of vectors. On the other hand one can compute that and thus it is not an orthonormal set.

Thus to find a corresponding orthonormal set, we simply need to normalize each vector. We will write for the corresponding orthonormal set. Then,

Similarly,

Therefore the corresponding orthonormal set is

You can verify that this set is orthogonal.
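For a quick numerical check of normalization, here is a short Python (NumPy) sketch with a hypothetical orthogonal pair of vectors: it confirms orthogonality, shows the lengths are not 1, and then divides by the lengths to obtain an orthonormal set.

```python
import numpy as np

# Hypothetical orthogonal (but not orthonormal) vectors in R^2.
u1 = np.array([1.0, -1.0])
u2 = np.array([1.0, 1.0])

print(np.isclose(u1 @ u2, 0.0))                # True: the set is orthogonal
print(np.linalg.norm(u1), np.linalg.norm(u2))  # both sqrt(2), so not orthonormal

# Normalize each vector to obtain the corresponding orthonormal set.
w1 = u1 / np.linalg.norm(u1)
w2 = u2 / np.linalg.norm(u2)
print(np.isclose(w1 @ w1, 1.0), np.isclose(w2 @ w2, 1.0), np.isclose(w1 @ w2, 0.0))
```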

Consider an orthogonal set of vectors in \(\mathbb{R}^n\), written \(\{\vec{w}_1, \ldots, \vec{w}_k\}\) with \(k \leq n\). The span of these vectors is a subspace \(W\) of \(\mathbb{R}^n\). If we could show that this orthogonal set is also linearly independent, we would have a basis of \(W\). We will show this in the next theorem.

Theorem : Orthogonal Basis of a Subspace

Let \(\{\vec{w}_1, \vec{w}_2, \ldots, \vec{w}_k\}\) be an orthonormal set of vectors in \(\mathbb{R}^n\). Then this set is linearly independent and forms a basis for the subspace \(W = \mathrm{span}\{\vec{w}_1, \vec{w}_2, \ldots, \vec{w}_k\}\).

Proof

To show it is a linearly independent set, suppose a linear combination of these vectors equals \(\vec{0}\), such as:
\[a_1 \vec{w}_1 + a_2 \vec{w}_2 + \cdots + a_k \vec{w}_k = \vec{0}, \quad a_i \in \mathbb{R}\]
We need to show that all \(a_i = 0\). To do so, take the dot product of each side of the above equation with the vector \(\vec{w}_i\) and obtain the following.
\[\vec{w}_i \cdot (a_1 \vec{w}_1 + a_2 \vec{w}_2 + \cdots + a_k \vec{w}_k) = \vec{w}_i \cdot \vec{0}\]
\[a_1 (\vec{w}_i \cdot \vec{w}_1) + a_2 (\vec{w}_i \cdot \vec{w}_2) + \cdots + a_k (\vec{w}_i \cdot \vec{w}_k) = 0\]

Now since the set is orthogonal, \(\vec{w}_i \cdot \vec{w}_m = 0\) for all \(m \neq i\), so we have:
\[a_i (\vec{w}_i \cdot \vec{w}_i) = 0\]

Since the set is orthonormal, \(\vec{w}_i \cdot \vec{w}_i = \|\vec{w}_i\|^2 = 1 \neq 0\). It follows that \(a_i = 0\). Since the \(a_i\) was chosen arbitrarily, the set is linearly independent.

Finally since \(W = \mathrm{span}\{\vec{w}_1, \vec{w}_2, \ldots, \vec{w}_k\}\), the set of vectors also spans \(W\) and therefore forms a basis of \(W\).

If an orthogonal set is a basis for a subspace, we call this an orthogonal basis. Similarly, if an orthonormal set is a basis, we call this an orthonormal basis.

We conclude this section with a discussion of Fourier expansions. Given any orthogonal basis \(B\) of a subspace and an arbitrary vector \(\vec{x}\) in that subspace, how do we express \(\vec{x}\) as a linear combination of vectors in \(B\)? The answer is given by the Fourier expansion.

Theorem : Fourier Expansion

Let \(V\) be a subspace of \(\mathbb{R}^n\) and suppose \(\{\vec{u}_1, \vec{u}_2, \ldots, \vec{u}_m\}\) is an orthogonal basis of \(V\). Then for any \(\vec{x} \in V\),
\[\vec{x} = \left(\frac{\vec{x} \cdot \vec{u}_1}{\|\vec{u}_1\|^2}\right)\vec{u}_1 + \left(\frac{\vec{x} \cdot \vec{u}_2}{\|\vec{u}_2\|^2}\right)\vec{u}_2 + \cdots + \left(\frac{\vec{x} \cdot \vec{u}_m}{\|\vec{u}_m\|^2}\right)\vec{u}_m\]

This expression is called the Fourier expansion of \(\vec{x}\), and the scalars
\[\frac{\vec{x} \cdot \vec{u}_j}{\|\vec{u}_j\|^2}, \quad j = 1, 2, \ldots, m\]
are the Fourier coefficients.
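Since the Fourier coefficients are just dot-product ratios, they are easy to compute numerically. The following Python sketch (with a hypothetical orthogonal basis and vector) evaluates the coefficients and confirms that the expansion reproduces the original vector.

```python
import numpy as np

def fourier_expansion(x, basis):
    """Return the Fourier coefficients of x with respect to an orthogonal
    basis, together with the reconstruction from those coefficients."""
    coeffs = [(x @ u) / (u @ u) for u in basis]
    recon = sum(c * u for c, u in zip(coeffs, basis))
    return coeffs, recon

# Hypothetical orthogonal basis of R^3 and an arbitrary vector x.
basis = [np.array([1.0, 0.0, 1.0]),
         np.array([1.0, 0.0, -1.0]),
         np.array([0.0, 1.0, 0.0])]
x = np.array([2.0, 3.0, -1.0])

coeffs, recon = fourier_expansion(x, basis)
print(coeffs)                 # the Fourier coefficients x.u_j / ||u_j||^2
print(np.allclose(recon, x))  # True: the expansion reproduces x
```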

Consider the following example.

Example : Fourier Expansion

Let , and , and let .

Then is an orthogonal basis of .

Compute the Fourier expansion of , thus writing as a linear combination of the vectors of .

Solution

Since is a basis (verify!) there is a unique way to express as a linear combination of the vectors of . Moreover since is an orthogonal basis (verify!), then this can be done by computing the Fourier expansion of .

That is:

We readily compute:

Therefore,

Orthogonal Matrices

Recall that the process to find the inverse of a matrix was often cumbersome. In contrast, it was very easy to take the transpose of a matrix. Luckily, for some special matrices the transpose equals the inverse. When an \(n \times n\) matrix has all real entries and its transpose equals its inverse, the matrix is called an orthogonal matrix.

The precise definition is as follows.

Definition : Orthogonal Matrices

A real \(n \times n\) matrix \(U\) is called an orthogonal matrix if
\[UU^T = U^TU = I\]

Note that since \(U\) is assumed to be a square matrix, it suffices to verify that one of the equalities \(UU^T = I\) or \(U^TU = I\) holds to guarantee that \(U^T\) is the inverse of \(U\).

Consider the following example.

Example : Orthogonal Matrix

Show that the matrix is orthogonal.

Solution

All we need to do is verify (one of the equations from) the requirements of Definition .

Since , this matrix is orthogonal.

Here is another example.

Example : Orthogonal Matrix

Let \(U\) be the given matrix. Is \(U\) orthogonal?

Solution

Again the answer is yes, and this can be verified simply by showing that \(U^TU = I\):

When we say that \(U\) is orthogonal, we are saying that \(UU^T = I\), meaning that
\[\sum_j u_{ij} u^T_{jk} = \sum_j u_{ij} u_{kj} = \delta_{ik}\]
where \(\delta_{ik}\) is the Kronecker symbol defined by
\[\delta_{ik} = \begin{cases} 1 & \text{if } i = k \\ 0 & \text{if } i \neq k \end{cases}\]

In words, the product of the \(i^{th}\) row of \(U\) with the \(k^{th}\) row gives \(1\) if \(i = k\) and \(0\) if \(i \neq k\). The same is true of the columns because \(U^TU = I\) also. Therefore,
\[\sum_j u^T_{ij} u_{jk} = \sum_j u_{ji} u_{jk} = \delta_{ik}\]
which says that the product of one column with another column gives \(1\) if the two columns are the same and \(0\) if the two columns are different.

More succinctly, this states that if \(\vec{u}_1, \ldots, \vec{u}_n\) are the columns of \(U\), an orthogonal matrix, then
\[\vec{u}_i \cdot \vec{u}_j = \delta_{ij} = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \neq j \end{cases}\]

We will say that the columns form an orthonormal set of vectors, and similarly for the rows. Thus a matrix is orthogonal if its rows (or columns) form an orthonormal set of vectors. Notice that the convention is to call such a matrix orthogonal rather than orthonormal (although this may make more sense!).
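These row and column conditions are easy to test numerically. The Python sketch below uses a hypothetical rotation matrix to check that \(UU^T = U^TU = I\) and that the rows form an orthonormal set.

```python
import numpy as np

theta = 0.3   # hypothetical angle; any 2x2 rotation matrix is orthogonal
U = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

I = np.eye(2)
print(np.allclose(U @ U.T, I), np.allclose(U.T @ U, I))   # True True

# Equivalently, the rows form an orthonormal set: dot products of rows
# reproduce the Kronecker delta.
for i in range(2):
    for j in range(2):
        print(i, j, np.isclose(U[i] @ U[j], 1.0 if i == j else 0.0))
```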

Proposition : Orthonormal Basis

The rows of an \(n \times n\) orthogonal matrix form an orthonormal basis of \(\mathbb{R}^n\). Further, any orthonormal basis of \(\mathbb{R}^n\) can be used to construct an \(n \times n\) orthogonal matrix.

Proof

Recall from Theorem that an orthonormal set is linearly independent and forms a basis for its span. Since the rows of an \(n \times n\) orthogonal matrix form an orthonormal set, they must be linearly independent. Now we have \(n\) linearly independent vectors, and it follows that their span equals \(\mathbb{R}^n\). Therefore these vectors form an orthonormal basis for \(\mathbb{R}^n\).

Suppose now that we have an orthonormal basis for \(\mathbb{R}^n\). Since the basis will contain \(n\) vectors, these can be used to construct an \(n \times n\) matrix, with each vector becoming a row. Therefore the matrix is composed of orthonormal rows, which by our above discussion means that the matrix is orthogonal. Note we could also have constructed a matrix with each vector becoming a column instead, and this would again be an orthogonal matrix. In fact this is simply the transpose of the previous matrix.

Consider the following proposition.

Proposition : Determinant of Orthogonal Matrices

Suppose \(U\) is an orthogonal matrix. Then
\[\det(U) = \pm 1\]

Proof

This result follows from the properties of determinants. Recall that for any matrix \(A\), \(\det(A^T) = \det(A)\). Now if \(U\) is orthogonal, then:
\[(\det(U))^2 = \det(U^T)\det(U) = \det(U^TU) = \det(I) = 1\]

Therefore \((\det(U))^2 = 1\) and it follows that \(\det(U) = \pm 1\).

Orthogonal matrices are divided into two classes, proper and improper. The proper orthogonal matrices are those whose determinant equals 1, and the improper ones are those whose determinant equals \(-1\). The reason for the distinction is that the improper orthogonal matrices are sometimes considered to have no physical significance. These matrices cause a change in orientation which would correspond to material passing through itself in a non-physical manner. Thus, in considering which coordinate systems must be considered in certain applications, you only need to consider those which are related by a proper orthogonal transformation. Geometrically, the linear transformations determined by the proper orthogonal matrices correspond to the composition of rotations.

We conclude this section with two useful properties of orthogonal matrices.

Example : Product and Inverse of Orthogonal Matrices

Suppose \(U\) and \(V\) are orthogonal matrices. Then \(UV\) and \(U^{-1}\) both exist and are orthogonal.

Solution

First we examine the product \(UV\).
\[(UV)(UV)^T = (UV)(V^TU^T) = U(VV^T)U^T = UU^T = I\]
Since \(UV\) is square, \((UV)^T\) is the inverse of \(UV\), so \(UV\) is invertible, and \((UV)^{-1} = (UV)^T\). Therefore, \(UV\) is orthogonal.

Next we show that \(U^{-1} = U^T\) is also orthogonal.
\[(U^{-1})^{-1} = U = (U^T)^T = (U^{-1})^T\]
Therefore \(U^{-1}\) is also orthogonal.

Gram-Schmidt Process

The Gram-Schmidt process is an algorithm to transform a set of vectors into an orthonormal set spanning the same subspace, that is, generating the same collection of linear combinations (see Definition 9.2.2).

The goal of the Gram-Schmidt process is to take a linearly independent set of vectors and transform it into an orthonormal set with the same span. The first objective is to construct an orthogonal set of vectors with the same span, since from there an orthonormal set can be obtained by simply dividing each vector by its length.

Algorithm : Gram-Schmidt Process

Let \(\{\vec{u}_1, \ldots, \vec{u}_n\}\) be a set of linearly independent vectors in \(\mathbb{R}^m\).

I: Construct a new set of vectors \(\{\vec{v}_1, \ldots, \vec{v}_n\}\) as follows:
\[\begin{array}{l}
\vec{v}_1 = \vec{u}_1 \\[4pt]
\vec{v}_2 = \vec{u}_2 - \left(\dfrac{\vec{u}_2 \cdot \vec{v}_1}{\|\vec{v}_1\|^2}\right)\vec{v}_1 \\[4pt]
\vec{v}_3 = \vec{u}_3 - \left(\dfrac{\vec{u}_3 \cdot \vec{v}_1}{\|\vec{v}_1\|^2}\right)\vec{v}_1 - \left(\dfrac{\vec{u}_3 \cdot \vec{v}_2}{\|\vec{v}_2\|^2}\right)\vec{v}_2 \\[4pt]
\vdots \\[4pt]
\vec{v}_n = \vec{u}_n - \left(\dfrac{\vec{u}_n \cdot \vec{v}_1}{\|\vec{v}_1\|^2}\right)\vec{v}_1 - \cdots - \left(\dfrac{\vec{u}_n \cdot \vec{v}_{n-1}}{\|\vec{v}_{n-1}\|^2}\right)\vec{v}_{n-1}
\end{array}\]

II: Now let \(\vec{w}_k = \dfrac{\vec{v}_k}{\|\vec{v}_k\|}\) for \(k = 1, \ldots, n\).

Then

  1. \(\{\vec{v}_1, \ldots, \vec{v}_n\}\) is an orthogonal set.
  2. \(\{\vec{w}_1, \ldots, \vec{w}_n\}\) is an orthonormal set.
  3. \(\mathrm{span}\{\vec{u}_1, \ldots, \vec{u}_k\} = \mathrm{span}\{\vec{v}_1, \ldots, \vec{v}_k\} = \mathrm{span}\{\vec{w}_1, \ldots, \vec{w}_k\}\) for each \(k = 1, \ldots, n\).
Proof

The full proof of this algorithm is beyond the scope of this material; however, here is an indication of the arguments.

To show that \(\{\vec{v}_1, \vec{v}_2\}\) is an orthogonal set, compute:
\[\vec{v}_1 \cdot \vec{v}_2 = \vec{v}_1 \cdot \left(\vec{u}_2 - \left(\frac{\vec{u}_2 \cdot \vec{v}_1}{\|\vec{v}_1\|^2}\right)\vec{v}_1\right) = \vec{v}_1 \cdot \vec{u}_2 - \left(\frac{\vec{u}_2 \cdot \vec{v}_1}{\|\vec{v}_1\|^2}\right)(\vec{v}_1 \cdot \vec{v}_1) = \vec{v}_1 \cdot \vec{u}_2 - \vec{u}_2 \cdot \vec{v}_1 = 0\]
Now that you have shown that \(\{\vec{v}_1, \vec{v}_2\}\) is orthogonal, use the same method as above to show that \(\{\vec{v}_1, \vec{v}_2, \vec{v}_3\}\) is also orthogonal, and so on.

Then in a similar fashion you show that \(\mathrm{span}\{\vec{u}_1, \ldots, \vec{u}_k\} = \mathrm{span}\{\vec{v}_1, \ldots, \vec{v}_k\}\) for each \(k\).

Finally, defining \(\vec{w}_k = \frac{\vec{v}_k}{\|\vec{v}_k\|}\) for \(k = 1, \ldots, n\) does not affect orthogonality and yields vectors of length 1, hence an orthonormal set. You can also observe that it does not affect the span either, and the proof would be complete.
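As a computational companion to the algorithm, the following Python sketch carries out steps I and II for any list of linearly independent vectors; the test vectors at the end are hypothetical.

```python
import numpy as np

def gram_schmidt(vectors):
    """Apply the Gram-Schmidt process (steps I and II above) to a list of
    linearly independent vectors.  Returns (orthogonal, orthonormal) lists
    spanning the same subspace as the input."""
    orthogonal = []
    for u in vectors:
        v = np.array(u, dtype=float)
        # Step I: subtract the projection of u onto each previous v_j.
        for v_prev in orthogonal:
            v -= (u @ v_prev) / (v_prev @ v_prev) * v_prev
        orthogonal.append(v)
    # Step II: divide each orthogonal vector by its length.
    orthonormal = [v / np.linalg.norm(v) for v in orthogonal]
    return orthogonal, orthonormal

# Hypothetical linearly independent vectors in R^3.
u1, u2, u3 = np.array([1, 1, 0]), np.array([1, 0, 1]), np.array([0, 1, 1])
_, onb = gram_schmidt([u1, u2, u3])
Q = np.column_stack(onb)
print(np.allclose(Q.T @ Q, np.eye(3)))   # True: the columns are orthonormal
```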


Consider the following example.

Example : Find Orthonormal Set with Same Span

Consider the set of vectors given as in Example . That is

Use the Gram-Schmidt algorithm to find an orthonormal set of vectors having the same span.

Solution

We already remarked that the set of vectors in is linearly independent, so we can proceed with the Gram-Schmidt algorithm:

Now to normalize simply let

You can verify that is an orthonormal set of vectors having the same span as , namely the -plane.

In this example, we began with a linearly independent set and found an orthonormal set of vectors which had the same span. It turns out that if we start with a basis of a subspace and apply the Gram-Schmidt algorithm, the result will be an orthogonal basis of the same subspace. We examine this in the following example.

Example : Find a Corresponding Orthogonal Basis

Let and let . Use the Gram-Schmidt Process to construct an orthogonal basis of .

Solution

First .

Next,

Finally,

Therefore, is an orthogonal basis of . However, it is sometimes more convenient to deal with vectors having integer entries, in which case we take


Orthogonal Projections

An important use of the Gram-Schmidt Process is in orthogonal projections, the focus of this section.

You may recall that a subspace of \(\mathbb{R}^n\) is a set of vectors which contains the zero vector and is closed under addition and scalar multiplication. Let’s call such a subspace \(W\). In particular, a plane in \(\mathbb{R}^n\) which contains the origin \(\vec{0}\) is a subspace of \(\mathbb{R}^n\).

Suppose a point \(Y\) in \(\mathbb{R}^n\) is not contained in \(W\); then what point \(Z\) in \(W\) is closest to \(Y\)? Using the Gram-Schmidt Process, we can find such a point. Let \(\vec{y}\) and \(\vec{z}\) represent the position vectors of the points \(Y\) and \(Z\) respectively, with \(\vec{y} - \vec{z}\) representing the vector connecting the two points \(Y\) and \(Z\). It will follow that if \(Z\) is the point on \(W\) closest to \(Y\), then \(\vec{y} - \vec{z}\) will be perpendicular to \(W\) (can you see why?); in other words, \(\vec{y} - \vec{z}\) is orthogonal to \(\vec{z}\) (and to every vector contained in \(W\)) as in the following diagram.

Figure: The plane \(W\) contains the origin \(0\) and the point \(Z\), with position vector \(\vec{z}\). The point \(Y\) does not lie on \(W\); its position vector is \(\vec{y}\), and the vector \(\vec{y} - \vec{z}\) is perpendicular to \(W\).

The vector \(\vec{z}\) is called the orthogonal projection of \(\vec{y}\) on \(W\). The definition is given as follows.

Definition : Orthogonal Projection

Let \(W\) be a subspace of \(\mathbb{R}^n\), and \(Y\) be any point in \(\mathbb{R}^n\). Then the orthogonal projection of \(Y\) onto \(W\) is given by
\[\vec{z} = \mathrm{proj}_W(\vec{y}) = \left(\frac{\vec{y} \cdot \vec{w}_1}{\|\vec{w}_1\|^2}\right)\vec{w}_1 + \left(\frac{\vec{y} \cdot \vec{w}_2}{\|\vec{w}_2\|^2}\right)\vec{w}_2 + \cdots + \left(\frac{\vec{y} \cdot \vec{w}_m}{\|\vec{w}_m\|^2}\right)\vec{w}_m\]
where \(\{\vec{w}_1, \vec{w}_2, \ldots, \vec{w}_m\}\) is any orthogonal basis of \(W\).

Therefore, in order to find the orthogonal projection, we must first find an orthogonal basis for the subspace. Note that one could use an orthonormal basis, but it is not necessary in this case since as you can see above the normalization of each vector is included in the formula for the projection.
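Because the projection formula only requires an orthogonal basis and a few dot products, it translates directly into code. The Python sketch below (with a hypothetical orthogonal basis and point) computes \(\mathrm{proj}_W(\vec{y})\) and checks that the residual \(\vec{y} - \vec{z}\) is orthogonal to the basis vectors.

```python
import numpy as np

def proj_onto_subspace(y, orthogonal_basis):
    """Orthogonal projection of y onto W = span(orthogonal_basis).
    The basis vectors must be mutually orthogonal (unit length not required)."""
    z = np.zeros_like(y, dtype=float)
    for w in orthogonal_basis:
        z += (y @ w) / (w @ w) * w
    return z

# Hypothetical orthogonal basis of a plane W in R^3 and a point y off W.
w1 = np.array([1.0, 0.0, 0.0])
w2 = np.array([0.0, 1.0, 0.0])
y = np.array([2.0, -3.0, 5.0])

z = proj_onto_subspace(y, [w1, w2])
print(z)                                                         # [ 2. -3.  0.]
print(np.isclose((y - z) @ w1, 0), np.isclose((y - z) @ w2, 0))  # True True
```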


Before we explore this further through an example, we show that the orthogonal projection does indeed yield a point \(Z\) (the point whose position vector is the vector \(\vec{z}\) above) which is the point of \(W\) closest to \(Y\).

Theorem : Approximation Theorem

Let \(W\) be a subspace of \(\mathbb{R}^n\) and \(Y\) any point in \(\mathbb{R}^n\). Let \(Z\) be the point whose position vector is the orthogonal projection of \(Y\) onto \(W\).

Then, \(Z\) is the point in \(W\) closest to \(Y\).

Proof

First \(Z\) is certainly a point in \(W\) since it is in the span of a basis of \(W\).

To show that \(Z\) is the point in \(W\) closest to \(Y\), we wish to show that \(\|\vec{y} - \vec{z}_1\| > \|\vec{y} - \vec{z}\|\) for all \(\vec{z}_1 \neq \vec{z} \in W\). We begin by writing \(\vec{y} - \vec{z}_1 = (\vec{y} - \vec{z}) + (\vec{z} - \vec{z}_1)\). Now, the vector \(\vec{y} - \vec{z}\) is orthogonal to \(W\), and \(\vec{z} - \vec{z}_1\) is contained in \(W\). Therefore these vectors are orthogonal to each other. By the Pythagorean Theorem, we have that
\[\|\vec{y} - \vec{z}_1\|^2 = \|\vec{y} - \vec{z}\|^2 + \|\vec{z} - \vec{z}_1\|^2 > \|\vec{y} - \vec{z}\|^2\]
This follows because \(\vec{z} - \vec{z}_1 \neq \vec{0}\), so
\[\|\vec{z} - \vec{z}_1\|^2 > 0\]

Hence, \(\|\vec{y} - \vec{z}_1\|^2 > \|\vec{y} - \vec{z}\|^2\). Taking the square root of each side, we obtain the desired result.

Consider the following example.

Example : Orthogonal Projection

Let be the plane through the origin given by the equation .
Find the point in closest to the point .

Solution

We must first find an orthogonal basis for . Notice that is characterized by all points where . In other words,

We can thus write as

Notice that this span is a basis of as it is linearly independent. We will use the Gram-Schmidt Process to convert this to an orthogonal basis, . In this case, as we remarked it is only necessary to find an orthogonal basis, and it is not required that it be orthonormal.

Therefore an orthogonal basis of is

We can now use this basis to find the orthogonal projection of the point on the subspace . We will write the position vector of as . Using Definition , we compute the projection as follows:

Therefore the point on closest to the point is .

Recall that the vector \(\vec{y} - \vec{z}\) is perpendicular (orthogonal) to all the vectors contained in the plane \(W\). Using a basis for \(W\), we can in fact find all such vectors which are perpendicular to \(W\). We call this set of vectors the orthogonal complement of \(W\) and denote it \(W^{\perp}\).

Definition : Orthogonal Complement

Let \(W\) be a subspace of \(\mathbb{R}^n\). Then the orthogonal complement of \(W\), written \(W^{\perp}\), is the set of all vectors \(\vec{x}\) such that \(\vec{x} \cdot \vec{z} = 0\) for all vectors \(\vec{z}\) in \(W\).

The orthogonal complement is defined as the set of all vectors which are orthogonal to all vectors in the original subspace. It turns out that it is sufficient that the vectors in the orthogonal complement be orthogonal to a spanning set of the original space.

Proposition : Orthogonal to Spanning Set

Let \(W\) be a subspace of \(\mathbb{R}^n\) such that \(W = \mathrm{span}\{\vec{w}_1, \vec{w}_2, \ldots, \vec{w}_m\}\). Then \(W^{\perp}\) is the set of all vectors which are orthogonal to each \(\vec{w}_i\) in the spanning set.

The following proposition demonstrates that the orthogonal complement of a subspace is itself a subspace.

Proposition : The Orthogonal Complement

Let \(W\) be a subspace of \(\mathbb{R}^n\). Then the orthogonal complement \(W^{\perp}\) is also a subspace of \(\mathbb{R}^n\).

Consider the following proposition.

Proposition : Orthogonal Complement of \(\mathbb{R}^n\)

The orthogonal complement of \(\mathbb{R}^n\) is the set containing only the zero vector:
\[(\mathbb{R}^n)^{\perp} = \{\vec{0}\}\]
Similarly,
\[\{\vec{0}\}^{\perp} = \mathbb{R}^n\]

Proof

Here, \(\vec{0}\) is the zero vector of \(\mathbb{R}^n\). Since \(\vec{x} \cdot \vec{0} = 0\) for all \(\vec{x} \in \mathbb{R}^n\), \(\mathbb{R}^n \subseteq \{\vec{0}\}^{\perp}\). Since \(\{\vec{0}\}^{\perp} \subseteq \mathbb{R}^n\), the equality follows, i.e., \(\{\vec{0}\}^{\perp} = \mathbb{R}^n\).

Again, since \(\vec{x} \cdot \vec{0} = 0\) for all \(\vec{x} \in \mathbb{R}^n\), \(\vec{0} \in (\mathbb{R}^n)^{\perp}\), so \(\{\vec{0}\} \subseteq (\mathbb{R}^n)^{\perp}\). Suppose \(\vec{x} \in (\mathbb{R}^n)^{\perp}\). Since \(\vec{x} \cdot \vec{y} = 0\) for all \(\vec{y} \in \mathbb{R}^n\) and \(\vec{x} \in \mathbb{R}^n\), \(\vec{x} \cdot \vec{x} = 0\), so \(\vec{x} = \vec{0}\). Therefore \((\mathbb{R}^n)^{\perp} \subseteq \{\vec{0}\}\), and thus \((\mathbb{R}^n)^{\perp} = \{\vec{0}\}\).

In the next example, we will look at how to find \(W^{\perp}\).

Example : Orthogonal Complement

Let be the plane through the origin given by the equation . Find a basis for the orthogonal complement of .

Solution

From Example we know that we can write as

In order to find \(W^{\perp}\), we need to find all \(\vec{x}\) which are orthogonal to every vector in this span.

Let . In order to satisfy , the following equation must hold.

In order to satisfy , the following equation must hold.

Both of these equations must be satisfied, so we have the following system of equations.

To solve, set up the augmented matrix.

Using Gaussian Elimination, we find that , and hence is a basis for .
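Numerically, a computation like the one in this example amounts to finding the null space of the matrix whose rows are the spanning vectors of \(W\). The Python sketch below does this with the singular value decomposition; the spanning vectors are hypothetical.

```python
import numpy as np

def orthogonal_complement(spanning_vectors, tol=1e-10):
    """Return a basis (list of vectors) of W-perp where W = span(spanning_vectors):
    the null space of the matrix whose rows are the spanning vectors,
    computed via the SVD."""
    A = np.vstack(spanning_vectors).astype(float)
    _, s, Vt = np.linalg.svd(A)
    rank = int(np.sum(s > tol))
    return [Vt[i] for i in range(rank, A.shape[1])]

# Hypothetical spanning vectors of a plane W in R^3.
w1 = np.array([1.0, 0.0, 1.0])
w2 = np.array([0.0, 1.0, 1.0])

for x in orthogonal_complement([w1, w2]):
    # Each basis vector of W-perp is orthogonal to both spanning vectors.
    print(x, np.isclose(x @ w1, 0.0), np.isclose(x @ w2, 0.0))
```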

The following results summarize the important properties of the orthogonal projection.

Theorem : Orthogonal Projection

Let \(W\) be a subspace of \(\mathbb{R}^n\), \(Y\) be any point in \(\mathbb{R}^n\), and let \(Z\) be the point in \(W\) closest to \(Y\). Then,

  1. The position vector \(\vec{z}\) of the point \(Z\) is given by \(\vec{z} = \mathrm{proj}_W(\vec{y})\)
  2. \(\vec{z} \in W\) and \(\vec{y} - \vec{z} \in W^{\perp}\)
  3. \(\|\vec{y} - \vec{z}\| < \|\vec{y} - \vec{z}_1\|\) for all \(\vec{z}_1 \neq \vec{z} \in W\)

Consider the following example of this concept.

Example : Find a Vector Closest to a Given Vector

Let We want to find the vector in closest to .

Solution

We will first use the Gram-Schmidt Process to construct the orthogonal basis, , of :

By Theorem , is the vector in closest to .

Consider the next example.

Example : Vector Written as a Sum of Two Vectors

Let be a subspace given by , and .
Find the point in closest to , and moreover write as the sum of a vector in and a vector in .

Solution

From Theorem the point in closest to is given by .

Notice that since the above vectors already give an orthogonal basis for , we have:

Therefore the point in closest to is .
Now, we need to write as the sum of a vector in and a vector in . This can easily be done as follows: since is in and as we have seen is in .
The vector is given by Therefore, we can write as

Example : Point in a Plane Closest to a Given Point

Find the point in the plane that is closest to the point .

Solution

The solution will proceed as follows.

  1. Find a basis of the subspace of defined by the equation .
  2. Orthogonalize the basis to get an orthogonal basis of .
  3. Find the projection on of the position vector of the point .

We now begin the solution.

  1. is a system of one equation in three variables. Putting the augmented matrix in reduced row-echelon form: gives general solution , , for any . Then Let . Then is linearly independent and , so is a basis of .
  2. Use the Gram-Schmidt Process to get an orthogonal basis of : Therefore is an orthogonal basis of .
  3. To find the point on closest to , compute Therefore, .

Least Squares Approximation

It should not be surprising to hear that many problems do not have a perfect solution, and in these cases the objective is always to do the best possible. For example, what does one do if there are no solutions to a system of linear equations \(A\vec{x} = \vec{y}\)? It turns out that what we do is find \(\vec{x}\) such that \(A\vec{x}\) is as close to \(\vec{y}\) as possible. A very important technique that follows from orthogonal projections is that of the least squares approximation, and it allows us to do exactly that.

We begin by rephrasing the approximation theorem in the language of matrices.

Recall that we can form the image of an \(m \times n\) matrix \(A\) by \(\mathrm{im}(A) = \{A\vec{x} : \vec{x} \in \mathbb{R}^n\}\). Rephrasing Theorem using the subspace \(W = \mathrm{im}(A)\) gives the equivalence of an orthogonality condition with a minimization condition. The following picture illustrates this orthogonality condition and the geometric meaning of this theorem.

Figure: A plane representing \(\mathrm{im}(A)\), containing vectors \(A\vec{u}\) and \(\vec{z} = A\vec{x}\); the point with position vector \(\vec{y}\) lies above the plane, and \(\vec{y} - \vec{z}\) is orthogonal to the plane.
Theorem : Existence of Minimizers

Let \(\vec{y} \in \mathbb{R}^m\) and let \(A\) be an \(m \times n\) matrix.

Choose \(\vec{z} \in W = \mathrm{im}(A)\) given by \(\vec{z} = \mathrm{proj}_W(\vec{y})\), and let \(\vec{x} \in \mathbb{R}^n\) be such that \(\vec{z} = A\vec{x}\).

Then

  1. \(\|\vec{y} - A\vec{x}\| \leq \|\vec{y} - \vec{u}\|\) for all \(\vec{u} \in W\)
  2. \(\vec{y} - A\vec{x} \in W^{\perp}\)

We note a simple but useful observation.

Lemma : Transpose and Dot Product

Let \(A\) be an \(m \times n\) matrix. Then for all \(\vec{x} \in \mathbb{R}^n\) and \(\vec{y} \in \mathbb{R}^m\),
\[A\vec{x} \cdot \vec{y} = \vec{x} \cdot A^T\vec{y}\]

Proof

This follows from the definitions:
\[A\vec{x} \cdot \vec{y} = \sum_{i,j} a_{ij} x_j y_i = \sum_{i,j} x_j a^T_{ji} y_i = \vec{x} \cdot A^T\vec{y}\]

The next corollary gives the technique of least squares.

Corollary : Least Squares and Normal Equation

A specific value of \(\vec{x}\) which solves the problem of Theorem is obtained by solving the equation
\[A^TA\vec{x} = A^T\vec{y}\]
Furthermore, there always exists a solution to this system of equations.

Proof

For \(\vec{x}\) the minimizer of Theorem , \((\vec{y} - A\vec{x}) \cdot A\vec{u} = 0\) for all \(\vec{u} \in \mathbb{R}^n\), and from Lemma , this is the same as saying
\[A^T(\vec{y} - A\vec{x}) \cdot \vec{u} = 0\]
for all \(\vec{u} \in \mathbb{R}^n\). This implies
\[A^T\vec{y} - A^TA\vec{x} = \vec{0}\]
Therefore, there is a solution to the equation of this corollary, and it solves the minimization problem of Theorem .

Note that \(\vec{x}\) might not be unique, but \(A\vec{x}\), the closest point of \(\mathrm{im}(A)\) to \(\vec{y}\), is unique, as was shown in the above argument.
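As a numerical illustration of the corollary (with a hypothetical inconsistent system, not the worked examples below), one can form the normal equation \(A^TA\vec{x} = A^T\vec{y}\) directly and compare the result with a general least squares routine.

```python
import numpy as np

# Hypothetical inconsistent system A x = y (more equations than unknowns).
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([1.0, 2.0, 2.0])

# Least squares solution via the normal equation A^T A x = A^T y.
x_normal = np.linalg.solve(A.T @ A, A.T @ y)

# The same answer from NumPy's built-in least squares routine.
x_lstsq, *_ = np.linalg.lstsq(A, y, rcond=None)

print(x_normal, x_lstsq)                         # identical up to round-off
print(np.allclose(A.T @ (y - A @ x_normal), 0))  # residual lies in im(A)-perp
```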

Consider the following example.

Example : Least Squares Solution to a System

Find a least squares solution to the system

Solution

First, consider whether there exists a real solution. To do so, set up the augmented matrix given by The reduced row-echelon form of this augmented matrix is

It follows that there is no real solution to this system. Therefore we wish to find the least squares solution. The normal equations are and so we need to solve the system This is a familiar exercise and the solution is

Consider another example.

Example : Least Squares Solution to a System

Find a least squares solution to the system

Solution

First, consider whether there exists a real solution. To do so, set up the augmented matrix given by The reduced row-echelon form of this augmented matrix is

It follows that the system has a solution given by . However we can also use the normal equations and find the least squares solution. Then

The least squares solution is which is the same as the solution found above.

An important application of Corollary is the problem of finding the least squares regression line in statistics. Suppose you are given \(n\) points in the plane, \((x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\), and you would like to find constants \(m\) and \(b\) such that the line \(y = mx + b\) goes through all these points. Of course this will be impossible in general. Therefore, we try to find \(m\) and \(b\) such that the line will be as close as possible. The desired system is
\[\begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} x_1 & 1 \\ \vdots & \vdots \\ x_n & 1 \end{bmatrix} \begin{bmatrix} m \\ b \end{bmatrix}\]

which is of the form \(\vec{y} = A\vec{x}\). It is desired to choose \(m\) and \(b\) to make
\[\left\| A \begin{bmatrix} m \\ b \end{bmatrix} - \begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix} \right\|^2\]
as small as possible. According to Theorem and Corollary , the best values for \(m\) and \(b\) occur as the solution to
\[A^T A \begin{bmatrix} m \\ b \end{bmatrix} = A^T \begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix}, \quad \text{where } A = \begin{bmatrix} x_1 & 1 \\ \vdots & \vdots \\ x_n & 1 \end{bmatrix}\]

Thus, computing \(A^TA\),
\[\begin{bmatrix} \sum_{i=1}^{n} x_i^2 & \sum_{i=1}^{n} x_i \\ \sum_{i=1}^{n} x_i & n \end{bmatrix} \begin{bmatrix} m \\ b \end{bmatrix} = \begin{bmatrix} \sum_{i=1}^{n} x_i y_i \\ \sum_{i=1}^{n} y_i \end{bmatrix}\]

Solving this system of equations for \(m\) and \(b\) (using Cramer’s rule for example) yields:
\[m = \frac{n \sum_{i=1}^{n} x_i y_i - \left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n \sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}\]

and
\[b = \frac{\left(\sum_{i=1}^{n} y_i\right)\left(\sum_{i=1}^{n} x_i^2\right) - \left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} x_i y_i\right)}{n \sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}\]
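The two closed-form expressions above can be checked against a direct least squares solve. The Python sketch below, using hypothetical data points, computes \(m\) and \(b\) both ways.

```python
import numpy as np

# Hypothetical data points (x_i, y_i).
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 2.0, 4.0])
n = len(x)

# Closed-form solution of the normal equations (the formulas above).
denom = n * np.sum(x**2) - np.sum(x)**2
m = (n * np.sum(x*y) - np.sum(x) * np.sum(y)) / denom
b = (np.sum(y) * np.sum(x**2) - np.sum(x) * np.sum(x*y)) / denom

# The same values from a general least squares solve of A [m b]^T = y.
A = np.column_stack([x, np.ones(n)])
(m2, b2), *_ = np.linalg.lstsq(A, y, rcond=None)

print(m, b)                                  # slope and intercept of y = m x + b
print(np.isclose(m, m2), np.isclose(b, b2))  # True True
```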

Consider the following example.

Example : Least Squares Regression

Find the least squares regression line for the following set of data points:

Solution

In this case we have data points and we obtain: and hence

The least squares regression line for the set of data points is:

One could use this line to approximate other values for the data. For example for one could use as an approximate value for the data.

The following diagram shows the data points and the corresponding regression line.

Figure: The \(xy\)-plane with the data points, connected by line segments, and the least squares regression line, which passes close to each of the points.

One could clearly do a least squares fit for other curves (for example, polynomials of higher degree) in the same way. In that case one sets up the analogous overdetermined system for the unknown coefficients and uses the same technique as above. Many other similar problems are important, including many in higher dimensions, and they are all solved the same way.


This page titled 4.11: Orthogonality is shared under a CC BY 4.0 license and was authored, remixed, and/or curated by Ken Kuttler (Lyryx) .
