6.4: Finding Orthogonal Bases
\(\newcommand{\threevec}[3]{\begin{pmatrix} #1 \\ #2 \\ #3 \end{pmatrix} } \)
\(\newcommand{\twovec}[2]{\begin{pmatrix} #1 \\ #2 \end{pmatrix} } \)
\(\newcommand{\fourvec}[4]{\begin{pmatrix} #1 \\ #2 \\ #3 \\ #4 \end{pmatrix} } \)
\(\newcommand{\fivevec}[5]{\begin{pmatrix} #1 \\ #2 \\ #3 \\ #4 \\ #5 \end{pmatrix} } \)
\(\newcommand{\len}[1]{\left|#1\right|} \)
\(\newcommand{\avec}{\mathbf a} \)
\(\newcommand{\rvec}{\mathbf r} \)
\(\newcommand{\yvec}{\mathbf y} \)
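The last section demonstrated the value of working with orthogonal, and especially orthonormal, sets. If we have an orthogonal basis \(\mathbf w_1,\mathbf w_2,\ldots,\mathbf w_n\) for a subspace \(W\text{,}\) the Projection Formula 6.3.15 tells us that the orthogonal projection of a vector \(\mathbf b\) onto \(W\) is
\begin{equation*} \widehat{\mathbf{b}} = \frac{\mathbf b\cdot\mathbf w_1}{\mathbf w_1\cdot\mathbf w_1}\mathbf w_1 + \frac{\mathbf b\cdot\mathbf w_2}{\mathbf w_2\cdot\mathbf w_2}\mathbf w_2 + \ldots + \frac{\mathbf b\cdot\mathbf w_n}{\mathbf w_n\cdot\mathbf w_n}\mathbf w_n\text{.} \end{equation*}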
An orthonormal basis \(\mathbf u_1,\mathbf u_2,\ldots,\mathbf u_n\) is even more convenient: after forming the matrix \(Q=\begin{bmatrix} \mathbf u_1 & \mathbf u_2 & \ldots & \mathbf u_n \end{bmatrix}\text{,}\) we have \(\widehat{\mathbf{b}} = QQ^T\mathbf b\text{.}\)
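As a small illustration of these two routes to the projection, the following sketch computes \(\widehat{\mathbf{b}}\) once with the Projection Formula and once as \(QQ^T\mathbf b\text{.}\) The vectors `w1`, `w2`, and `b` are illustrative choices rather than data from the text, and the helper `dot` is a hypothetical convenience function written in plain Python.

```python
from fractions import Fraction
from math import sqrt, isclose

def dot(x, y):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))

# Illustrative (assumed) orthogonal basis for a plane W in R^3; note w1 . w2 = 0.
w1 = [2, -1, 2]
w2 = [-1, 2, 2]
basis = [w1, w2]
b = [1, 1, 1]  # an arbitrary vector to project

# Projection Formula: add up (b . w_i)/(w_i . w_i) w_i over the basis.
weights = [Fraction(dot(b, w), dot(w, w)) for w in basis]
b_hat = [sum(c * w[i] for c, w in zip(weights, basis)) for i in range(len(b))]

# Orthonormal route: the unit vectors u_i = w_i/|w_i| are the columns of Q.
us = [[x / sqrt(dot(w, w)) for x in w] for w in basis]
qt_b = [dot(u, b) for u in us]  # the entries of Q^T b
b_hat_q = [sum(c * u[i] for c, u in zip(qt_b, us)) for i in range(len(b))]  # Q (Q^T b)

print(b_hat)    # exact fractions
print(b_hat_q)  # floating-point values that should agree with b_hat
```

Both computations should give the same vector, up to floating-point rounding in the second one.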
In the examples we've seen so far, however, orthogonal bases were given to us. What we need now is a way to form orthogonal bases. In this section, we'll explore an algorithm that begins with a basis for a subspace and creates an orthogonal basis. Once we have an orthogonal basis, we can scale each of the vectors appropriately to produce an orthonormal basis.
Suppose we have a basis for \(\mathbb R^2\) consisting of the vectors
as shown in Figure 6.4.1. Notice that this basis is not orthogonal.
- Find the vector \(\widehat{\mathbf{v}}_2\) that is the orthogonal projection of \(\mathbf v_2\) onto the line defined by \(\mathbf v_1\text{.}\)
- Explain why \(\mathbf v_2 - \widehat{\mathbf{v}}_2\) is orthogonal to \(\mathbf v_1\text{.}\)
- Define the new vectors \(\mathbf w_1=\mathbf v_1\) and \(\mathbf w_2=\mathbf v_2-\widehat{\mathbf{v}}_2\) and sketch them in Figure 6.4.2. Explain why \(\mathbf w_1\) and \(\mathbf w_2\) define an orthogonal basis for \(\mathbb R^2\text{.}\)
- Write the vector \(\mathbf b=\twovec8{-10}\) as a linear combination of \(\mathbf w_1\) and \(\mathbf w_2\text{.}\)
- Scale the vectors \(\mathbf w_1\) and \(\mathbf w_2\) to produce an orthonormal basis \(\mathbf u_1\) and \(\mathbf u_2\) for \(\mathbb R^2\text{.}\)
Subsection 6.4.1 Gram-Schmidt orthogonalization
The preview activity illustrates the main idea behind an algorithm, known as Gram-Schmidt orthogonalization, that begins with a basis for some subspace of \(\mathbb R^m\) and produces an orthogonal or orthonormal basis. The algorithm relies on our construction of the orthogonal projection. Remember that we formed the orthogonal projection \(\widehat{\mathbf{b}}\) of \(\mathbf b\) onto a subspace \(W\) by requiring that \(\mathbf b-\widehat{\mathbf{b}}\) is orthogonal to \(W\text{,}\) as shown in Figure 6.4.3.
This observation guides our construction of an orthogonal basis, for it allows us to create a vector that is orthogonal to a given subspace. Let's see how the Gram-Schmidt algorithm works.
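The key observation, that subtracting a projection leaves a vector orthogonal to the subspace, can be checked in the simplest case of a line. The sketch below uses two illustrative vectors in \(\mathbb R^2\) that are assumptions made for this example, not the vectors from the preview activity, and a hypothetical helper `project_onto_line`.

```python
from fractions import Fraction

def dot(x, y):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))

def project_onto_line(v, w):
    """Orthogonal projection of v onto the line spanned by w."""
    c = Fraction(dot(v, w), dot(w, w))
    return [c * x for x in w]

# Illustrative (assumed) basis of R^2 that is not orthogonal.
v1 = [3, 1]
v2 = [1, 2]

v2_hat = project_onto_line(v2, v1)        # projection of v2 onto the line through v1
w2 = [a - b for a, b in zip(v2, v2_hat)]  # what is left after removing the projection

print(v2_hat, w2, dot(w2, v1))  # the final dot product should be 0
```

Here `v1` together with `w2` plays the role of the orthogonal pair \(\mathbf w_1\text{,}\) \(\mathbf w_2\text{.}\)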
Suppose that \(W\) is a three-dimensional subspace of \(\mathbb R^4\) with basis:
We can see that this basis is not orthogonal by noting that \(\mathbf v_1\cdot\mathbf v_2 = 8\text{.}\) Our goal is to create an orthogonal basis \(\mathbf w_1\text{,}\) \(\mathbf w_2\text{,}\) and \(\mathbf w_3\) for \(W\text{.}\)
To begin, we declare that \(\mathbf w_1=\mathbf v_1\text{,}\) and we call \(W_1\) the line defined by \(\mathbf w_1\text{.}\)
- Find the vector \(\widehat{\mathbf{v}}_2\) that is the orthogonal projection of \(\mathbf v_2\) onto \(W_1\text{,}\) the line defined by \(\mathbf w_1\text{.}\)
- Form the vector \(\mathbf w_2 = \mathbf v_2-\widehat{\mathbf{v}}_2\) and verify that it is orthogonal to \(\mathbf w_1\text{.}\)
- Explain why \(Span \{\mathbf w_1,\mathbf w_2 \} = Span \{\mathbf v_1,\mathbf v_2 \}\) by showing that any linear combination of \(\mathbf v_1\) and \(\mathbf v_2\) can be written as a linear combination of \(\mathbf w_1\) and \(\mathbf w_2\) and vice versa.
- The vectors \(\mathbf w_1\) and \(\mathbf w_2\) are an orthogonal basis for a two-dimensional subspace \(W_2\) of \(\mathbb R^4\text{.}\) Find the vector \(\widehat{\mathbf{v}}_3\) that is the orthogonal projection of \(\mathbf v_3\) onto \(W_2\text{.}\)
- Verify that \(\mathbf w_3 = \mathbf v_3-\widehat{\mathbf{v}}_3\) is orthogonal to both \(\mathbf w_1\) and \(\mathbf w_2\text{.}\)
- Explain why \(\mathbf w_1\text{,}\) \(\mathbf w_2\text{,}\) and \(\mathbf w_3\) form an orthogonal basis for \(W\text{.}\)
- Now find an orthonormal basis for \(W\text{.}\)
As this activity illustrates, Gram-Schmidt orthogonalization begins with a basis \(\mathbf v_1,\mathbf v_2,\ldots,\mathbf v_n\) for a subspace \(W\) of \(\mathbb R^m\) and creates an orthogonal basis for \(W\text{.}\) Let's work through a second example.
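Let's start with the basis
\begin{equation*} \mathbf v_1 = \threevec{2}{-1}{2}, \qquad \mathbf v_2 = \threevec{-3}{3}{0}, \qquad \mathbf v_3 = \threevec{-2}{7}{1}\text{,} \end{equation*}
which is a basis for \(\mathbb R^3\text{.}\)
To get started, we'll simply set \(\mathbf w_1=\mathbf v_1=\threevec{2}{-1}2\text{.}\) We construct \(\mathbf w_2\) from \(\mathbf v_2\) by subtracting its orthogonal projection onto \(W_1\text{,}\) the line defined by \(\mathbf w_1\text{.}\) This gives
\begin{equation*} \mathbf w_2 = \mathbf v_2 - \frac{\mathbf v_2\cdot\mathbf w_1}{\mathbf w_1\cdot\mathbf w_1}\mathbf w_1 = \mathbf v_2 + \mathbf w_1 = \threevec{-1}{2}{2}\text{.} \end{equation*}
Notice that we found \(\mathbf v_2 = -\mathbf w_1 + \mathbf w_2\text{.}\) Therefore, we can rewrite any linear combination of \(\mathbf v_1\) and \(\mathbf v_2\) as
\begin{equation*} c_1\mathbf v_1 + c_2\mathbf v_2 = c_1\mathbf w_1 + c_2(-\mathbf w_1 + \mathbf w_2) = (c_1-c_2)\mathbf w_1 + c_2\mathbf w_2\text{,} \end{equation*}
a linear combination of \(\mathbf w_1\) and \(\mathbf w_2\text{.}\) This tells us that
\begin{equation*} W_2 = Span \{\mathbf w_1,\mathbf w_2 \} = Span \{\mathbf v_1,\mathbf v_2 \}\text{.} \end{equation*}
In other words, \(\mathbf w_1\) and \(\mathbf w_2\) form a basis for the same 2-dimensional subspace as \(\mathbf v_1\) and \(\mathbf v_2\text{.}\)
Finally, we form \(\mathbf w_3\) from \(\mathbf v_3\) by subtracting its orthogonal projection onto \(W_2\text{:}\)
\begin{equation*} \mathbf w_3 = \mathbf v_3 - \frac{\mathbf v_3\cdot\mathbf w_1}{\mathbf w_1\cdot\mathbf w_1}\mathbf w_1 - \frac{\mathbf v_3\cdot\mathbf w_2}{\mathbf w_2\cdot\mathbf w_2}\mathbf w_2 = \mathbf v_3 + \mathbf w_1 - 2\mathbf w_2 = \threevec{2}{2}{-1}\text{.} \end{equation*}
We can now check that
\begin{equation*} \mathbf w_1 = \threevec{2}{-1}{2}, \qquad \mathbf w_2 = \threevec{-1}{2}{2}, \qquad \mathbf w_3 = \threevec{2}{2}{-1} \end{equation*}
is an orthogonal set. Furthermore, we find that, as before, \(Span \{\mathbf w_1,\mathbf w_2,\mathbf w_3 \} = Span \{\mathbf v_1,\mathbf v_2,\mathbf v_3 \}\) so that we have found a new orthogonal basis for \(\mathbb R^3\text{.}\)
To create an orthonormal basis, we form unit vectors parallel to each of the vectors in the orthogonal basis. Since each \(\mathbf w_j\) has length 3, we obtain
\begin{equation*} \mathbf u_1 = \threevec{2/3}{-1/3}{2/3}, \qquad \mathbf u_2 = \threevec{-1/3}{2/3}{2/3}, \qquad \mathbf u_3 = \threevec{2/3}{2/3}{-1/3}\text{.} \end{equation*}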
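The arithmetic in this example can be double-checked with exact fractions. The sketch below takes \(\mathbf v_1=\threevec{2}{-1}{2}\text{,}\) \(\mathbf v_2=\threevec{-3}{3}{0}\text{,}\) and \(\mathbf v_3=\threevec{-2}{7}{1}\) as the starting basis; the helper `subtract_projection` is a hypothetical name introduced only for this illustration.

```python
from fractions import Fraction

def dot(x, y):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))

def subtract_projection(v, ws):
    """Return v minus its projection onto the span of the orthogonal vectors ws."""
    out = [Fraction(x) for x in v]
    for w in ws:
        c = Fraction(dot(v, w), dot(w, w))  # weight from the Projection Formula
        out = [o - c * x for o, x in zip(out, w)]
    return out

v1, v2, v3 = [2, -1, 2], [-3, 3, 0], [-2, 7, 1]

w1 = [Fraction(x) for x in v1]
w2 = subtract_projection(v2, [w1])
w3 = subtract_projection(v3, [w1, w2])

# Every pair of distinct vectors should have dot product 0.
print(dot(w1, w2), dot(w1, w3), dot(w2, w3))
print([dot(w, w) for w in (w1, w2, w3)])  # squared lengths
```

Each squared length equals 9, so dividing each \(\mathbf w_j\) by 3 gives a unit vector.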
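More generally, if we have a basis \(\mathbf v_1,\mathbf v_2,\ldots,\mathbf v_n\) for a subspace \(W\) of \(\mathbb R^m\text{,}\) the Gram-Schmidt algorithm creates an orthogonal basis for \(W\) in the following way:
\begin{equation*} \begin{aligned} \mathbf w_1 & = \mathbf v_1 \\ \mathbf w_2 & = \mathbf v_2 - \frac{\mathbf v_2\cdot\mathbf w_1}{\mathbf w_1\cdot\mathbf w_1}\mathbf w_1 \\ \mathbf w_3 & = \mathbf v_3 - \frac{\mathbf v_3\cdot\mathbf w_1}{\mathbf w_1\cdot\mathbf w_1}\mathbf w_1 - \frac{\mathbf v_3\cdot\mathbf w_2}{\mathbf w_2\cdot\mathbf w_2}\mathbf w_2 \\ & \vdots \\ \mathbf w_n & = \mathbf v_n - \frac{\mathbf v_n\cdot\mathbf w_1}{\mathbf w_1\cdot\mathbf w_1}\mathbf w_1 - \frac{\mathbf v_n\cdot\mathbf w_2}{\mathbf w_2\cdot\mathbf w_2}\mathbf w_2 - \ldots - \frac{\mathbf v_n\cdot\mathbf w_{n-1}}{\mathbf w_{n-1}\cdot\mathbf w_{n-1}}\mathbf w_{n-1}\text{.} \\ \end{aligned} \end{equation*}
From here, we may form an orthonormal basis by constructing a unit vector parallel to each vector in the orthogonal basis: \(\mathbf u_j = \frac{1}{\left|\mathbf w_j\right|}\mathbf w_j\text{.}\)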
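The general procedure translates into a short loop: each new vector has its projections onto the previously constructed vectors subtracted off. The following is a minimal sketch in plain Python, not the implementation used by Sage; the function names `gram_schmidt` and `normalize` and the sample basis of a subspace of \(\mathbb R^4\) are assumptions made for illustration.

```python
from fractions import Fraction
from math import sqrt

def dot(x, y):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))

def gram_schmidt(vs):
    """Turn a list of linearly independent vectors into an orthogonal list."""
    ws = []
    for v in vs:
        w = [Fraction(x) for x in v]
        for prev in ws:
            # Subtract the projection of v onto each earlier orthogonal vector.
            c = dot(v, prev) / dot(prev, prev)
            w = [a - c * b for a, b in zip(w, prev)]
        ws.append(w)
    return ws

def normalize(w):
    """Unit vector parallel to w (floating point)."""
    length = sqrt(dot(w, w))
    return [float(x) / length for x in w]

# Illustrative (assumed) basis for a three-dimensional subspace of R^4.
vs = [[1, 0, 1, 0], [1, 1, 1, 1], [0, 1, 2, 3]]
ws = gram_schmidt(vs)
us = [normalize(w) for w in ws]

print(ws)
print(us)
```

Working with `Fraction` keeps the orthogonal vectors exact; only the final normalization introduces floating-point values, since lengths usually involve square roots.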
Sage can automate these computations for us. Before we begin, however, it will be helpful to understand how we can combine things using a `list` in Python. For instance, if the vectors `v1`, `v2`, and `v3` form a basis for a subspace, we can bundle them together using square brackets: `[v1, v2, v3]`. Furthermore, we could assign this to a variable, such as `basis = [v1, v2, v3]`.
Evaluating the following cell will load in some special commands.
- There is a command to apply the projection formula: `projection(b, basis)` returns the orthogonal projection of `b` onto the subspace spanned by `basis`, which is a list of vectors.
- The command `unit(w)` returns a unit vector parallel to `w`.
- Given a collection of vectors, say, `v1` and `v2`, we can form the matrix whose columns are `v1` and `v2` using `matrix([v1, v2]).T`. When given a `list` of vectors, Sage constructs a matrix whose rows are the given vectors. For this reason, we need to apply the transpose.
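To make the behavior of these commands concrete, here is a rough plain-Python stand-in for each of them. These are hypothetical re-implementations written for illustration, not the code loaded by the cell above: `projection` is assumed to apply the Projection Formula and so expects the vectors in `basis` to be mutually orthogonal, and `columns_to_matrix` is an invented name that mimics the effect of `matrix([v1, v2]).T` using nested lists.

```python
from math import sqrt

def dot(x, y):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))

def projection(b, basis):
    """Projection Formula; assumes the vectors in basis are mutually orthogonal."""
    b_hat = [0.0] * len(b)
    for w in basis:
        c = dot(b, w) / dot(w, w)
        b_hat = [h + c * x for h, x in zip(b_hat, w)]
    return b_hat

def unit(w):
    """Unit vector parallel to w."""
    length = sqrt(dot(w, w))
    return [x / length for x in w]

def columns_to_matrix(vectors):
    """Nested list of rows for the matrix whose columns are the given vectors."""
    return [list(row) for row in zip(*vectors)]

# Illustrative (assumed) orthogonal vectors in R^3.
v1 = [1, 2, 2]
v2 = [2, 1, -2]
basis = [v1, v2]

print(projection([1, 0, 0], basis))
print(unit(v1))
print(columns_to_matrix(basis))  # [[1, 2], [2, 1], [2, -2]]
```

The last line shows why the transpose is needed: a list of vectors naturally reads as a list of rows, so the rows and columns must be swapped to place the vectors in columns.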
Let's now consider \(W\text{,}\) the subspace of \(\mathbb R^5\) having basis
- Apply the Gram-Schmidt algorithm to find an orthogonal basis \(\mathbf w_1\text{,}\) \(\mathbf w_2\text{,}\) and \(\mathbf w_3\) for \(W\text{.}\)
- Find \(\widehat{\mathbf{b}}\text{,}\) the orthogonal projection of \(\mathbf b = \fivevec{-5}{11}0{-1}5\) onto \(W\text{.}\)
- Explain why we know that \(\widehat{\mathbf{b}}\) is a linear combination of the original vectors \(\mathbf v_1\text{,}\) \(\mathbf v_2\text{,}\) and \(\mathbf v_3\) and then find weights so that
  \begin{equation*} \widehat{\mathbf{b}} = c_1\mathbf v_1 + c_2\mathbf v_2 + c_3\mathbf v_3. \end{equation*}
- Find an orthonormal basis \(\mathbf u_1\text{,}\) \(\mathbf u_2\text{,}\) and \(\mathbf u_3\) for \(W\) and form the matrix \(Q\) whose columns are these vectors.
- Find the product \(Q^TQ\) and explain the result.
- Find the matrix \(P\) that projects vectors orthogonally onto \(W\) and verify that \(P\mathbf b\) gives \(\widehat{\mathbf{b}}\text{,}\) the orthogonal projection that you found earlier.
Subsection 6.4.2 \(QR\) factorizations
Now that we've seen how the Gram-Schmidt algorithm forms an orthonormal basis for a given subspace, we will explore how the algorithm leads to an important matrix factorization known as the \(QR\) factorization.
Suppose that \(A\) is the \(4\times3\) matrix whose columns are
These vectors form a basis for \(W\text{,}\) the subspace of \(\mathbb R^4\) that we encountered in Activity 6.4.2. Since these vectors are the columns of \(A\text{,}\) we have \(Col(A) = W\text{.}\)
- When we implemented Gram-Schmidt, we first found an orthogonal basis \(\mathbf w_1\text{,}\) \(\mathbf w_2\text{,}\) and \(\mathbf w_3\) using
  \begin{equation*} \begin{aligned} \mathbf w_1 & = \mathbf v_1 \\ \mathbf w_2 & = \mathbf v_2 - \frac{\mathbf v_2\cdot\mathbf w_1}{\mathbf w_1\cdot\mathbf w_1}\mathbf w_1 \\ \mathbf w_3 & = \mathbf v_3 - \frac{\mathbf v_3\cdot\mathbf w_1}{\mathbf w_1\cdot\mathbf w_1}\mathbf w_1 - \frac{\mathbf v_3\cdot\mathbf w_2}{\mathbf w_2\cdot\mathbf w_2}\mathbf w_2\text{.} \\ \end{aligned} \end{equation*}
  Use these expressions to write \(\mathbf v_1\text{,}\) \(\mathbf v_2\text{,}\) and \(\mathbf v_3\) as linear combinations of \(\mathbf w_1\text{,}\) \(\mathbf w_2\text{,}\) and \(\mathbf w_3\text{.}\)
- We next normalized the orthogonal basis \(\mathbf w_1\text{,}\) \(\mathbf w_2\text{,}\) and \(\mathbf w_3\) to obtain an orthonormal basis \(\mathbf u_1\text{,}\) \(\mathbf u_2\text{,}\) and \(\mathbf u_3\text{.}\)
  Write the vectors \(\mathbf w_i\) as scalar multiples of \(\mathbf u_i\text{.}\) Then use these expressions to write \(\mathbf v_1\text{,}\) \(\mathbf v_2\text{,}\) and \(\mathbf v_3\) as linear combinations of \(\mathbf u_1\text{,}\) \(\mathbf u_2\text{,}\) and \(\mathbf u_3\text{.}\)
- Suppose that \(Q = \left[ \begin{array}{ccc} \mathbf u_1 & \mathbf u_2 & \mathbf u_3 \end{array} \right]\text{.}\) Use the result of the previous part to find a vector \(\rvec_1\) so that \(Q\rvec_1 = \mathbf v_1\text{.}\)
- Then find vectors \(\rvec_2\) and \(\rvec_3\) such that \(Q\rvec_2 = \mathbf v_2\) and \(Q\rvec_3 = \mathbf v_3\text{.}\)
- Construct the matrix \(R = \left[ \begin{array}{ccc} \rvec_1 & \rvec_2 & \rvec_3 \end{array} \right]\text{.}\) Remembering that \(A = \left[ \begin{array}{ccc} \mathbf v_1 & \mathbf v_2 & \mathbf v_3 \end{array} \right]\text{,}\) explain why \(A = QR\text{.}\)
- What is special about the shape of \(R\text{?}\)
- Suppose that \(A\) is a \(10\times 6\) matrix whose columns are linearly independent. This means that the columns of \(A\) form a basis for \(W=Col(A)\text{,}\) a 6-dimensional subspace of \(\mathbb R^{10}\text{.}\) Suppose that we apply Gram-Schmidt orthogonalization to create an orthonormal basis whose vectors form the columns of \(Q\) and that we write \(A=QR\text{.}\) What are the dimensions of \(Q\) and what are the dimensions of \(R\text{?}\)
When the columns of a matrix \(A\) are linearly independent, they form a basis for \(Col(A)\) so that we can perform the Gram-Schmidt algorithm. The previous activity shows how this leads to a factorization of \(A\) as the product of a matrix \(Q\) whose columns are an orthonormal basis for \(Col(A)\) and an upper triangular matrix \(R\text{.}\)
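One way to see this factorization emerge is to run Gram-Schmidt on the columns of a small matrix and record the dot products \(\mathbf u_i\cdot\mathbf v_j\) as the entries of \(R\text{.}\) The sketch below is a plain-Python illustration under stated assumptions: the function name `qr_columns` and the \(3\times2\) example matrix are invented for this purpose, and matrices are represented as lists of columns.

```python
from math import sqrt

def dot(x, y):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))

def qr_columns(cols):
    """Gram-Schmidt QR for a matrix given as a list of independent columns.

    Returns (q_cols, R): the orthonormal columns of Q and R as a list of rows.
    """
    q_cols = []
    for v in cols:
        w = list(v)
        for u in q_cols:
            c = dot(v, u)  # u is a unit vector, so u . u = 1
            w = [a - c * b for a, b in zip(w, u)]
        length = sqrt(dot(w, w))
        q_cols.append([x / length for x in w])
    n = len(cols)
    # Entry (i, j) of R is u_i . v_j, which is the (i, j) entry of Q^T A.
    R = [[dot(q_cols[i], cols[j]) for j in range(n)] for i in range(n)]
    return q_cols, R

# Illustrative (assumed) matrix with columns (1, 2, 2) and (4, 2, 5).
cols = [[1, 2, 2], [4, 2, 5]]
q_cols, R = qr_columns(cols)

# Rebuild each column of A as a combination of the columns of Q.
rebuilt = [[sum(R[i][j] * q_cols[i][k] for i in range(len(cols))) for k in range(3)]
           for j in range(len(cols))]
print(R)
print(rebuilt)
```

The entries of `R` below the diagonal come out as zero, up to rounding, because each \(\mathbf u_i\) is orthogonal to the vectors \(\mathbf v_j\) with \(j \lt i\text{.}\)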
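We'll consider the matrix \(A=\begin{bmatrix} 2 & -3 & -2 \\ -1 & 3 & 7 \\ 2 & 0 & 1 \\ \end{bmatrix}\) whose columns, which we'll denote \(\mathbf v_1\text{,}\) \(\mathbf v_2\text{,}\) and \(\mathbf v_3\text{,}\) are the basis of \(\mathbb R^3\) that we considered in Example 6.4.4. There we found an orthogonal basis \(\mathbf w_1\text{,}\) \(\mathbf w_2\text{,}\) and \(\mathbf w_3\) that satisfied
\begin{equation*} \begin{aligned} \mathbf v_1 & = \mathbf w_1 \\ \mathbf v_2 & = -\mathbf w_1 + \mathbf w_2 \\ \mathbf v_3 & = -\mathbf w_1 + 2\mathbf w_2 + \mathbf w_3\text{.} \\ \end{aligned} \end{equation*}
In terms of the resulting orthonormal basis \(\mathbf u_1\text{,}\) \(\mathbf u_2\text{,}\) and \(\mathbf u_3\text{,}\) we had
\begin{equation*} \mathbf w_1 = 3\mathbf u_1, \qquad \mathbf w_2 = 3\mathbf u_2, \qquad \mathbf w_3 = 3\mathbf u_3 \end{equation*}
so that
\begin{equation*} \begin{aligned} \mathbf v_1 & = 3\mathbf u_1 \\ \mathbf v_2 & = -3\mathbf u_1 + 3\mathbf u_2 \\ \mathbf v_3 & = -3\mathbf u_1 + 6\mathbf u_2 + 3\mathbf u_3\text{.} \\ \end{aligned} \end{equation*}
Therefore, if \(Q=\begin{bmatrix} \mathbf u_1 & \mathbf u_2 & \mathbf u_3 \end{bmatrix}\text{,}\) we have the \(QR\) factorization
\begin{equation*} A = Q\begin{bmatrix} 3 & -3 & -3 \\ 0 & 3 & 6 \\ 0 & 0 & 3 \\ \end{bmatrix} = QR\text{.} \end{equation*}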
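This factorization can be checked with exact arithmetic. In the sketch below, the matrix \(A\) is the one given above, while the entries of `Q` are the unit vectors obtained by applying Gram-Schmidt to its columns and dividing each result by its length 3; the helpers `transpose` and `matmul` are hypothetical convenience functions, and the code itself confirms that \(Q^TQ=I\) and \(QR=A\text{.}\)

```python
from fractions import Fraction

def transpose(M):
    """Transpose of a matrix stored as a list of rows."""
    return [list(row) for row in zip(*M)]

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

A = [[2, -3, -2],
     [-1, 3, 7],
     [2, 0, 1]]

# Rows of Q; its columns are u1 = (2,-1,2)/3, u2 = (-1,2,2)/3, u3 = (2,2,-1)/3.
third = Fraction(1, 3)
Q = [[third * x for x in row] for row in [[2, -1, 2],
                                          [-1, 2, 2],
                                          [2, 2, -1]]]

R = matmul(transpose(Q), A)       # R = Q^T A
print(R)                          # upper triangular
print(matmul(Q, R) == A)          # Q R reproduces A
print(matmul(transpose(Q), Q))    # identity, since the columns of Q are orthonormal
```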
As before, we would like to use Sage to automate the process of finding and using the \(QR\) factorization of a matrix \(A\text{.}\) Evaluating the following cell provides a command `QR(A)` that returns the factorization, which may be stored using, for example, `Q, R = QR(A)`.
Suppose that \(A\) is the following matrix whose columns are linearly independent.
- If \(A=QR\text{,}\) what are the dimensions of \(Q\) and \(R\text{?}\) What is special about the form of \(R\text{?}\)
- Find the \(QR\) factorization using `Q, R = QR(A)` and verify that \(R\) has the predicted shape and that \(A=QR\text{.}\)
- Find the matrix \(P\) that orthogonally projects vectors onto \(Col(A)\text{.}\)
- Find \(\widehat{\mathbf{b}}\text{,}\) the orthogonal projection of \(\mathbf b=\fourvec4{-17}{-14}{22}\) onto \(Col(A)\text{.}\)
- Explain why the equation \(A\mathbf x=\widehat{\mathbf{b}}\) must be consistent and then find \(\mathbf x\text{.}\)
Sage also provides its own version of the \(QR\) factorization, which differs somewhat from the factorization we've developed here. For this reason, we have provided our own version of the factorization.
Subsection 6.4.3 Summary
This section explored the Gram-Schmidt orthogonalization algorithm and how it leads to the matrix factorization \(A=QR\) when the columns of \(A\) are linearly independent.
- Beginning with a basis \(\mathbf v_1, \mathbf v_2,\ldots,\mathbf v_n\) for a subspace \(W\) of \(\mathbb R^m\text{,}\) the vectors
  \begin{align*} \mathbf w_1 & = \mathbf v_1\\ \mathbf w_2 & = \mathbf v_2 - \frac{\mathbf v_2\cdot\mathbf w_1}{\mathbf w_1\cdot\mathbf w_1}\mathbf w_1\\ \mathbf w_3 & = \mathbf v_3 - \frac{\mathbf v_3\cdot\mathbf w_1}{\mathbf w_1\cdot\mathbf w_1}\mathbf w_1 - \frac{\mathbf v_3\cdot\mathbf w_2}{\mathbf w_2\cdot\mathbf w_2}\mathbf w_2\\ & \vdots\\ \mathbf w_n & = \mathbf v_n - \frac{\mathbf v_n\cdot\mathbf w_1}{\mathbf w_1\cdot\mathbf w_1}\mathbf w_1 - \frac{\mathbf v_n\cdot\mathbf w_2}{\mathbf w_2\cdot\mathbf w_2}\mathbf w_2 - \ldots - \frac{\mathbf v_n\cdot\mathbf w_{n-1}} {\mathbf w_{n-1}\cdot\mathbf w_{n-1}}\mathbf w_{n-1} \end{align*}
  form an orthogonal basis for \(W\text{.}\)
- We may scale each vector \(\mathbf w_i\) appropriately to obtain an orthonormal basis \(\mathbf u_1,\mathbf u_2,\ldots,\mathbf u_n\text{.}\)
- Expressing the Gram-Schmidt algorithm in matrix form shows that, if the columns of \(A\) are linearly independent, then we can write \(A=QR\text{,}\) where the columns of \(Q\) form an orthonormal basis for \(Col(A)\) and \(R\) is upper triangular.
Exercises 6.4.4 Exercises
Suppose that a subspace \(W\) of \(\mathbb R^3\) has a basis formed by
- Find an orthogonal basis for \(W\text{.}\)
- Find an orthonormal basis for \(W\text{.}\)
- Find the matrix \(P\) that projects vectors orthogonally onto \(W\text{.}\)
- Find the orthogonal projection of \(\threevec34{-2}\) onto \(W\text{.}\)
Find the \(QR\) factorization of \(A=\begin{bmatrix} 4 & 7 \\ -2 & 4 \\ 4 & 4 \end{bmatrix} \text{.}\)
Consider the basis of \(\mathbb R^3\) given by the vectors
- Apply the Gram-Schmidt orthogonalization algorithm to find an orthonormal basis \(\mathbf u_1\text{,}\) \(\mathbf u_2\text{,}\) \(\mathbf u_3\) for \(\mathbb R^3\text{.}\)
- If \(A\) is the \(3\times3\) matrix whose columns are \(\mathbf v_1\text{,}\) \(\mathbf v_2\text{,}\) and \(\mathbf v_3\text{,}\) find the \(QR\) factorization of \(A\text{.}\)
- Suppose that we want to solve the equation \(A\mathbf x=\mathbf b = \threevec{-9}17\text{,}\) which we can rewrite as \(QR\mathbf x = \mathbf b\text{.}\)
  - If we set \(\yvec=R\mathbf x\text{,}\) explain why the equation \(Q\yvec=\mathbf b\) is computationally easy to solve.
  - Explain why the equation \(R\mathbf x=\yvec\) is computationally easy to solve.
  - Find the solution \(\mathbf x\text{.}\)
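The two-step strategy described in this exercise can be sketched on a different matrix, so that the exercise itself is left to the reader. The code below uses the factorization of \(A=\begin{bmatrix} 2 & -3 & -2 \\ -1 & 3 & 7 \\ 2 & 0 & 1 \\ \end{bmatrix}\) from earlier in the section; the right-hand side `b` is an illustrative choice, and `back_substitute` is a hypothetical helper name.

```python
from fractions import Fraction

def dot(x, y):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))

def back_substitute(R, y):
    """Solve R x = y for an upper triangular R with nonzero diagonal entries."""
    n = len(y)
    x = [Fraction(0)] * n
    for i in reversed(range(n)):
        known = sum(R[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (y[i] - known) / R[i][i]
    return x

# Orthonormal columns of Q and the upper triangular R with A = QR.
third = Fraction(1, 3)
q_cols = [[third * x for x in col] for col in ([2, -1, 2], [-1, 2, 2], [2, 2, -1])]
R = [[3, -3, -3],
     [0, 3, 6],
     [0, 0, 3]]

b = [-2, -2, 1]  # illustrative (assumed) right-hand side

# Step 1: Q y = b. Multiplying by Q^T and using Q^T Q = I gives y = Q^T b.
y = [dot(u, b) for u in q_cols]

# Step 2: R x = y, solved from the last equation upward.
x = back_substitute(R, y)
print(y, x)
```

Neither step requires row reduction: the first is a matrix-vector product, and the second uses each equation to find exactly one new unknown.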
Consider the vectors
and the subspace \(W\) of \(\mathbb R^5\) that they span.
- Find an orthonormal basis for \(W\text{.}\)
- Find the \(5\times5\) matrix that projects vectors orthogonally onto \(W\text{.}\)
- Find \(\widehat{\mathbf{b}}\text{,}\) the orthogonal projection of \(\mathbf b=\fivevec{-8}3{-12}8{-4}\) onto \(W\text{.}\)
- Express \(\widehat{\mathbf{b}}\) as a linear combination of \(\mathbf v_1\text{,}\) \(\mathbf v_2\text{,}\) and \(\mathbf v_3\text{.}\)
Consider the set of vectors
- What happens when we apply the Gram-Schmidt orthogonalization algorithm?
- Why does the algorithm fail to produce an orthogonal basis for \(\mathbb R^3\text{?}\)
Suppose that \(A\) is a matrix with linearly independent columns and having the factorization \(A=QR\text{.}\) Determine whether the following statements are true or false and explain your thinking.
- It follows that \(R=Q^TA\text{.}\)
- The matrix \(R\) is invertible.
- The product \(Q^TQ\) projects vectors orthogonally onto \(Col(A)\text{.}\)
- The columns of \(Q\) are an orthogonal basis for \(Col(A)\text{.}\)
- The orthogonal complement \(Col(A)^\perp = Nul(Q^T)\text{.}\)
Suppose we have the \(QR\) factorization \(A=QR\text{,}\) where \(A\) is a \(7\times 4\) matrix.
- What are the dimensions of the product \(QQ^T\text{?}\) Explain the significance of this product.
- What are the dimensions of the product \(Q^TQ\text{?}\) Explain the significance of this product.
- What are the dimensions of the matrix \(R\text{?}\)
- If \(R\) is a diagonal matrix, what can you say about the columns of \(A\text{?}\)
Suppose we have the \(QR\) factorization \(A=QR\) where the columns of \(A\) are \(\avec_1,\avec_2,\ldots,\avec_n\) and the columns of \(R\) are \(\rvec_1,\rvec_2,\ldots,\rvec_n\text{.}\)
- How can the matrix product \(A^TA\) be expressed in terms of dot products?
- How can the matrix product \(R^TR\) be expressed in terms of dot products?
- Explain why \(A^TA=R^TR\text{.}\)
- Explain why the dot product \(\avec_i\cdot\avec_j = \rvec_i\cdot\rvec_j\text{.}\)