
Section 3.3 Arnoldi iteration

The \(QR\) algorithm that we developed in the previous section is a general purpose algorithm for finding all the eigenvalues of a symmetric matrix. We now return to the problem we stated at the beginning of this chapter. We assume that
  • \(A\) is an \(m\times m\) matrix, where \(m\) is possibly huge, and \(A\) is not necessarily symmetric but is possibly sparse.
  • We do not want to store the entire matrix in a computer, but instead we assume that we can efficiently compute \(A\xvec\) for any vector \(\xvec\text{.}\)
We will develop techniques for approximating the largest eigenvalues of \(A\) and for approximating the solution of \(A\xvec = \bvec\text{.}\) Two key ideas will be the use of Krylov subspaces, which were introduced in Definition 1.4.22 but which we will revisit soon in more detail, and Arnoldi iteration, which will produce an orthonormal basis for a Krylov subspace.
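Before moving on, it may help to see what the second assumption looks like in practice. The following is a minimal sketch using SciPy's LinearOperator; the choice of library and the particular operator are our own, made only for illustration, since the text does not prescribe any software. The point is that the operator is described solely by its action on a vector, and the full \(m\times m\) matrix is never stored.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator

m = 100_000

def matvec(x):
    # A hypothetical operator chosen only for illustration: it acts like
    # 2I plus a cyclic shift.  We describe how it acts on a vector rather
    # than storing an m x m matrix.
    return 2 * x + np.roll(x, 1)

A = LinearOperator((m, m), matvec=matvec)

x = np.random.default_rng(0).standard_normal(m)
print(np.linalg.norm(A.matvec(x)))   # A behaves like a matrix, but only through products Ax
```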

Subsection 3.3.1 Arnoldi iteration

Since we are interested in computing the eigenvalues of a matrix, we may remember that Proposition 2.1.4 tells us that any complex matrix is similar to an upper triangular matrix. The diagonal entries in the upper triangular matrix would then give us the eigenvalues. This is also true for real matrices as long as we allow complex arithmetic.
Unfortunately, there is not a computationally feasible algorithm to produce an upper triangular matrix similar to \(A\) given our assumptions. However, we may relax the condition somewhat and consider Hessenberg matrices, which are matrices having the following, almost-upper triangular form:
\begin{equation*} H=\begin{bmatrix} * \amp * \amp * \amp \ldots \amp * \amp * \\ * \amp * \amp * \amp \ldots \amp * \amp * \\ 0 \amp * \amp * \amp \ldots \amp * \amp * \\ 0 \amp 0 \amp * \amp \ldots \amp * \amp * \\ \vdots \amp \vdots \amp \amp \ddots \amp * \amp * \\ 0 \amp 0 \amp 0 \amp \ldots \amp * \amp * \\ \end{bmatrix}\text{.} \end{equation*}
That is, a Hessenberg matrix is upper triangular except for nonzero entries that could live just below the diagonal. If we are given a square matrix \(A\text{,}\) it turns out there are several computationally feasible ways to find a Hessenberg matrix similar to \(A\text{.}\) One way that may be familiar involves Householder reflections.
However, we assume that we do not have access to the matrix itself but that we can compute \(A\xvec\) for any vector \(\xvec\text{.}\) In language from the earlier part of our course, we know the operator defined by \(A\text{,}\) but not \(A\) itself. A technique known as Arnoldi iteration allows us to find a unitary matrix \(Q\) and a Hessenberg matrix \(H\) such that \(A=QHQ^*\text{.}\) If we work with real matrices, we find an orthogonal matrix \(Q\) with \(A=QHQ^T\text{.}\)
We have
\begin{equation*} Q = \begin{bmatrix} \qvec_1 \amp \qvec_2 \amp \ldots \amp \qvec_m \end{bmatrix} \end{equation*}
and
\begin{equation*} H=\begin{bmatrix} h_{11} \amp h_{12} \amp h_{13} \amp \ldots \amp h_{1,m-1} \amp h_{1m} \\ h_{21} \amp h_{22} \amp h_{23} \amp \ldots \amp h_{2,m-1} \amp h_{2m} \\ 0 \amp h_{32} \amp h_{33} \amp \ldots \amp h_{3,m-1} \amp h_{3m} \\ \vdots \amp \vdots \amp \amp \ddots \amp \vdots \amp \vdots \\ 0 \amp 0 \amp 0 \amp \ldots \amp h_{m,m-1} \amp h_{mm} \\ \end{bmatrix}\text{.} \end{equation*}
We begin by rewriting \(A=QHQ^T\) as \(AQ=QH\) so that
\begin{equation*} AQ=\begin{bmatrix} A\qvec_1\amp A\qvec_2 \amp \ldots \amp A\qvec_m \end{bmatrix}\text{.} \end{equation*}
This leads to the expressions
\begin{align*} A\qvec_1 \amp = h_{11}\qvec_1 + h_{21}\qvec_2\\ A\qvec_2 \amp = h_{12}\qvec_1 + h_{22}\qvec_2 + h_{32}\qvec_3\\ \vdots \amp = \vdots\\ A\qvec_n \amp = h_{1n}\qvec_1 + h_{2n}\qvec_2 + \ldots + h_{nn}\qvec_n + h_{n+1,n}\qvec_{n+1}\\ \vdots \amp = \vdots\\ A\qvec_m \amp = h_{1m}\qvec_1 + h_{2m}\qvec_2 + \ldots + h_{mm}\qvec_m\text{.} \end{align*}

Example 3.3.1.

Rather than developing a general explanation for finding \(Q\) and \(H\text{,}\) we will demonstrate with an example. Suppose that \(A=\begin{bmatrix} 1 \amp -2 \amp 3 \\ -2 \amp 4 \amp 2 \\ 3 \amp 2 \amp -1 \\ \end{bmatrix} \text{.}\) We choose any nonzero vector \(\vvec\) and define
\begin{equation*} \qvec_1 = \frac{1}{\len{\vvec}}\vvec\text{.} \end{equation*}
We will choose \(\vvec=\threevec100\) so that \(\qvec_1=\threevec100\text{.}\)
Our first expression is
\begin{equation*} A\qvec_1 = h_{11}\qvec_1 + h_{21}\qvec_2\text{.} \end{equation*}
Remember that the vectors \(\qvec_i\) will be the columns of an orthogonal matrix, which means that they form an orthonormal set. Therefore,
\begin{equation*} (A\qvec_1)\cdot\qvec_1 = h_{11}\qvec_1\cdot\qvec_1 + h_{21}\qvec_2\cdot\qvec_1 = h_{11}\text{.} \end{equation*}
That is, we find
\begin{equation*} h_{11}=\qvec_1\cdot(A\qvec_1) = r(A,\qvec_1)\text{,} \end{equation*}
so we see that \(h_{11}\) is a Rayleigh quotient, which is encouraging.
In our example, we have \(A\qvec_1 = \cthreevec1{-2}3\) so that \(h_{11} = \qvec_1\cdot(A\qvec_1) = 1\text{.}\)
Rearranging the expression for \(A\qvec_1\text{,}\) we have
\begin{equation*} h_{21}\qvec_2 = A\qvec_1 - h_{11}\qvec_1= \cthreevec0{-2}{3}\text{.} \end{equation*}
Because \(\qvec_2\) has unit length, we have
\begin{equation*} h_{21} = \len{A\qvec_1 - h_{11}\qvec_1} \end{equation*}
so that \(h_{21} = \sqrt{13}\) and then
\begin{equation*} \qvec_2 = \frac{1}{h_{21}}(A\qvec_1 - h_{11}\qvec_1) \end{equation*}
or \(\qvec_2 = \frac{1}{\sqrt{13}}\cthreevec0{-2}3\text{.}\)
We now repeat the same process using the second expression
\begin{equation*} A\qvec_2 = h_{12}\qvec_1+h_{22}\qvec_2+h_{32}\qvec_3\text{.} \end{equation*}
In particular,
\begin{equation*} h_{12}=(A\qvec_2)\cdot\qvec_1,\hspace{12pt} h_{22}=(A\qvec_2)\cdot\qvec_2=r(A,\qvec_2) \end{equation*}
so that, once again, \(h_{22}\) is a Rayleigh quotient.
In our example, \(A\qvec_2 = \frac{1}{\sqrt{13}}\cthreevec{13}{-2}{-7}\) and hence \(h_{12}=\sqrt{13}\) and \(h_{22}=-\frac{17}{13}\text{.}\)
Rearranging the expression for \(A\qvec_2\) shows that
\begin{equation*} h_{32}\qvec_3 = A\qvec_2-h_{12}\qvec_1-h_{22}\qvec_2= \cthreevec{0}{-\frac{60}{13\sqrt{13}}}{-\frac{40}{13\sqrt{13}}}\text{.} \end{equation*}
Therefore, \(h_{32}=\frac{20}{13}\) and \(\qvec_3=\cthreevec{0}{-\frac{3}{\sqrt{13}}}{-\frac{2}{\sqrt{13}}}\text{.}\)
Notice that \(\qvec_1\text{,}\) \(\qvec_2\text{,}\) and \(\qvec_3\) form a basis for \(\real^3\) so we have all three columns for the matrix \(Q\text{.}\) Our final expression is
\begin{equation*} A\qvec_3=h_{13}\qvec_1 + h_{23}\qvec_2 + h_{33}\qvec_3\text{,} \end{equation*}
which gives
\begin{align*} h_{13}\amp=(A\qvec_3)\cdot\qvec_1 =0\\ h_{23}\amp=(A\qvec_3)\cdot\qvec_2 =\frac{20}{13}\\ h_{33}\amp=(A\qvec_3)\cdot\qvec_3 =\frac{56}{13}\text{.} \end{align*}
This leaves us with
\begin{equation*} Q = \begin{bmatrix} 1 \amp 0 \amp 0 \\ 0 \amp -\frac{2}{\sqrt{13}} \amp -\frac{3}{\sqrt{13}} \\ 0 \amp \frac{3}{\sqrt{13}} \amp -\frac{2}{\sqrt{13}} \\ \end{bmatrix}, \hspace{12pt} H = \begin{bmatrix} 1 \amp \sqrt{13} \amp 0 \\ \sqrt{13} \amp -\frac{17}{13} \amp \frac{20}{13} \\ 0 \amp \frac{20}{13} \amp \frac{56}{13} \\ \end{bmatrix} \end{equation*}
from which we can check that \(A=QHQ^T\text{.}\)
Notice that our original matrix \(A=\begin{bmatrix} 1 \amp -2 \amp 3 \\ -2 \amp 4 \amp 2 \\ 3 \amp 2 \amp -1 \\ \end{bmatrix}\) is symmetric. Since \(H=Q^TAQ\text{,}\) we have
\begin{equation*} H^T = (Q^TAQ)^T = Q^TA^TQ = Q^TAQ = H \end{equation*}
showing that \(H\) is symmetric too. Since \(H\) is also Hessenberg, this explains why \(H\) is tridiagonal, which is seen by the presence of the \(0\) in the first row, third column of \(H\text{.}\)
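The computation in this example is entirely mechanical, so it can be automated. The following NumPy sketch is our own implementation of the process described above (the function name arnoldi and its interface are choices we have made, not something defined in the text); running it on this \(3\times 3\) matrix reproduces \(Q\) and \(H\) and confirms that \(A=QHQ^T\text{.}\)

```python
import numpy as np

def arnoldi(A, v, n):
    """Run n steps of Arnoldi iteration starting from v.

    Returns Q, whose orthonormal columns are q_1, ..., q_{k+1}, and the
    (k+1) x k Hessenberg matrix H_tilde, where k <= n is the number of
    steps completed before breakdown (if any)."""
    m = A.shape[0]
    Q = np.zeros((m, n + 1))
    H = np.zeros((n + 1, n))
    Q[:, 0] = v / np.linalg.norm(v)
    for j in range(n):
        w = A @ Q[:, j]                      # multiply the newest basis vector by A
        for i in range(j + 1):
            H[i, j] = Q[:, i] @ w            # coefficient of q_i (modified Gram-Schmidt)
            w = w - H[i, j] * Q[:, i]        # subtract the component along q_i
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] < 1e-12:              # breakdown: A q_j lies in span(q_1, ..., q_j)
            return Q[:, :j + 1], H[:j + 2, :j + 1]
        Q[:, j + 1] = w / H[j + 1, j]
    return Q, H

A = np.array([[ 1., -2.,  3.],
              [-2.,  4.,  2.],
              [ 3.,  2., -1.]])
v = np.array([1., 0., 0.])

Q, H_tilde = arnoldi(A, v, 3)   # breaks down at the third step since m = 3
H = H_tilde[:3, :]              # the square Hessenberg (here tridiagonal) matrix

print(np.round(H, 4))
print(np.allclose(A, Q @ H @ Q.T))   # True: A = Q H Q^T
```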

Subsection 3.3.2 Krylov subspaces

We will now explore how Arnoldi iteration can approximate the largest eigenvalues of an \(m\times m\) matrix when \(m\) is very large. To begin, we recall that the power method, our first technique for approximating eigenvalues, begins with a randomly chosen vector \(\vvec\) and repeatedly multiplies by \(A\) to compute \(A^k\vvec\text{.}\)
One feature of the power method is that it is β€œforgetful.” That is, we compute lots of vectors \(A^k\vvec\text{,}\) but at every step we only use the last one. Perhaps we could do better if we remembered all the vectors. This is the motivation for considering a Krylov subspace:
\begin{equation*} \kcal_n(A,\vvec) = \laspan{\vvec, A\vvec,\ldots,A^{n-1}\vvec} \end{equation*}
and its associated Krylov matrix
\begin{equation*} K_n = \begin{bmatrix} \vvec \amp A\vvec \amp \ldots \amp A^{n-1}\vvec \end{bmatrix}\text{.} \end{equation*}
You may remember that we introduced Krylov subspaces when we constructed the minimal polynomial of a vector \(\vvec\text{.}\) Notice that we have the inclusion of subspaces
\begin{equation} \kcal_1 \subset \kcal _2 \subset \ldots \subset \kcal_n\tag{3.3.1} \end{equation}
and that these begin as proper inclusions \(\kcal_j\subsetneq\kcal_{j+1}\) until the vectors
\begin{equation*} \vvec, A\vvec, \ldots, A^\nu\vvec \end{equation*}
become linearly dependent. We said that \(\nu\) is the grade of \(\vvec\) and noted that a non-trivial linear relationship between these vectors led to the minimal polynomial \(p_\vvec\text{.}\)
Remember that \(p_\vvec\) divides the minimal polynomial \(p\) of \(A\text{.}\) If \(\deg(p_\vvec) \lt \deg(p)\text{,}\) then \(\vvec\) is in \(\nul(p_\vvec(A))\text{,}\) a proper subspace of \(\real^m\text{.}\) The chance of choosing a vector in a given proper subspace is practically zero, so we generally do not expect this to happen. In other words, we expect the chain of proper inclusions to continue for a long time, so we assume that
\begin{equation*} \dim\kcal_n = \dim(\col(K_n)) = n\text{.} \end{equation*}
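As a brief numerical aside, the columns of \(K_n\) do form a basis of \(\kcal_n\) in exact arithmetic, but in floating point they quickly become nearly dependent, since repeated multiplication by \(A\) pulls every vector toward the dominant eigenvector directions. The sketch below, using a random matrix of our own choosing, illustrates this; it is one reason to prefer the orthonormal basis that Arnoldi iteration produces.

```python
import numpy as np

rng = np.random.default_rng(0)            # seed chosen arbitrarily for reproducibility
A = rng.standard_normal((100, 100))
v = rng.standard_normal(100)

# Build the Krylov matrix K_n = [v, Av, A^2 v, ..., A^{n-1} v] column by column.
n = 6
cols = [v]
for _ in range(n - 1):
    cols.append(A @ cols[-1])
K = np.column_stack(cols)

print(np.linalg.matrix_rank(K))           # expect n: the inclusions are still proper
print(f"{np.linalg.cond(K):.2e}")         # typically very large: the columns are nearly dependent
```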

Proposition.

If \(\basis{\qvec}{n}\) are the vectors produced by Arnoldi iteration beginning with the vector \(\vvec\text{,}\) then \(K_n=Q_nR\) for some upper triangular matrix \(R\text{.}\) In particular, the columns of \(Q_n\) form an orthonormal basis for the Krylov subspace \(\kcal_n\text{.}\)

Proof.

Recall that
\begin{equation*} A\qvec_j = h_{1j}\qvec_1 + h_{2j}\qvec_2 + \ldots + h_{j+1,j}\qvec_{j+1}\text{,} \end{equation*}
which implies that
\begin{equation} A\qvec_j \in \laspan{\basis{\qvec}{j+1}}\text{.}\tag{3.3.2} \end{equation}
Notice that \(A^0\vvec = \vvec = \len{\vvec}~\qvec_1\) so that
\begin{equation*} A^0\vvec \in \laspan{\qvec_1}\text{.} \end{equation*}
Now suppose that
\begin{equation*} A^{j-1}\vvec \in \laspan{\basis{\qvec}{j}}\text{.} \end{equation*}
It then follows that
\begin{equation*} A^j\vvec \in \laspan{\basis{A\qvec}{j}} = \laspan{\basis{\qvec}{j+1}} \end{equation*}
where the equality of spans follows from (3.3.2). This shows, by induction, that
\begin{equation*} A^{j-1}\vvec \in \laspan{\basis{\qvec}{j}} \end{equation*}
for all \(j\text{.}\)
Since \(A^{j-1}\vvec\) is the \(j^{th}\) column of \(K_n\) and is a linear combination of the first \(j\) columns of \(Q_n\text{,}\) it follows that \(K_n=Q_nR\) where \(R\) is upper triangular.
This proposition shows that Arnoldi iteration creates a \(QR\) factorization of \(K_n\) without explicitly constructing \(R\text{.}\) This is good fortune indeed!
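We can check this numerically on the \(3\times 3\) example from earlier, reusing the hypothetical arnoldi helper sketched after Example 3.3.1: since \(K_n=Q_nR\) and the columns of \(Q_n\) are orthonormal, the product \(Q_n^TK_n\) recovers \(R\) and should come out upper triangular.

```python
import numpy as np
# Assumes the arnoldi helper sketched after Example 3.3.1 is available.

A = np.array([[ 1., -2.,  3.],
              [-2.,  4.,  2.],
              [ 3.,  2., -1.]])
v = np.array([1., 0., 0.])

K = np.column_stack([v, A @ v, A @ A @ v])   # Krylov matrix K_3 = [v, Av, A^2 v]
Q, _ = arnoldi(A, v, 3)

R = Q.T @ K                    # since K = QR and Q has orthonormal columns
print(np.round(R, 4))          # upper triangular: Arnoldi gives a QR factorization of K
```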
This analysis also provides some insight into a possible issue when running Arnoldi iteration. Remembering that
\begin{equation*} A\qvec_j = h_{1j}\qvec_1 + h_{2j}\qvec_2 + \ldots + h_{j+1,j}\qvec_{j+1}\text{,} \end{equation*}
the algorithm could break down if \(A\qvec_j\in\laspan{\basis{\qvec}{j}}\text{.}\) When this is the case, \(h_{j+1,j}\qvec_{j+1} = \zerovec\) and hence \(\qvec_{j+1}\) is not defined.
This can only happen, however, if \(A^j\vvec\) is also in
\begin{equation*} \laspan{\basis{\qvec}{j}} = \laspan{\vvec,A\vvec,\ldots,A^{j-1}\vvec}\text{,} \end{equation*}
which we have seen means that \(j=\nu\text{,}\) the grade of \(\vvec\text{,}\) and that \(p_\vvec\text{,}\) the minimal polynomial of \(\vvec\text{,}\) is a proper factor of the minimal polynomial \(p\) of \(A\text{.}\) Since there are finitely many proper factors of \(p\text{,}\) the chance that we randomly choose \(\vvec\) in the null space of one of those factors is exceedingly small.

Subsection 3.3.3 Eigenvalues and Arnoldi iteration

We introduced Arnoldi iteration as a way of finding a Hessenberg matrix \(H\) similar to \(A\text{.}\) Remember, however, that we would like to work with very large \(m\times m\) matrices for which it would not be practical to run the algorithm to completion. Instead, we remember that the power method tells us that the Krylov subspaces contain information about the largest eigenvalues of \(A\text{.}\) Also, Arnoldi iteration is a process that, after each step, produces a matrix \(Q_n\) whose columns form an orthonormal basis for the Krylov subspace \(\kcal_n\text{,}\) where \(n\) is usually much smaller than \(m\text{,}\) the size of the matrix.
We will slightly reformulate the algebraic framework for Arnoldi iteration to focus on intermediate steps without expecting that it runs to completion. Once again, we will have
\begin{equation*} Q_n = \begin{bmatrix} \qvec_1 \amp \qvec_2 \amp \ldots \amp \qvec_n \end{bmatrix}\text{,} \end{equation*}
which is an \(m\times n\) matrix with \(n\leq m\text{.}\) Multiplying the columns of \(Q_n\) by \(A\) gives the relations that we saw earlier:
\begin{align*} A\qvec_1 \amp = h_{11}\qvec_1 + h_{21}\qvec_2\\ A\qvec_2 \amp = h_{12}\qvec_1 + h_{22}\qvec_2 + h_{32}\qvec_3\\ \vdots \amp = \vdots\\ A\qvec_n \amp = h_{1n}\qvec_1 + h_{2n}\qvec_2 + \ldots + h_{nn}\qvec_n + h_{n+1,n}\qvec_{n+1}\text{.} \end{align*}
Notice the term \(h_{n+1,n}\qvec_{n+1}\) in the expression for \(A\qvec_n\text{.}\) Putting all the coefficients \(h_{ij}\) into a matrix results in the matrix
\begin{equation*} \widetilde{H}_n = \begin{bmatrix} h_{11} \amp h_{12} \amp h_{13} \amp \ldots \amp h_{1n} \\ h_{21} \amp h_{22} \amp h_{23} \amp \ldots \amp h_{2n} \\ 0 \amp h_{32} \amp h_{33} \amp \ldots \amp h_{3n} \\ 0 \amp 0 \amp h_{43} \amp \ldots \amp h_{4n} \\ \vdots \amp \vdots \amp \vdots \amp \ddots \amp \vdots \\ 0 \amp 0 \amp 0 \amp \ldots \amp h_{n+1,n} \\ \end{bmatrix}\text{.} \end{equation*}
Notice that the shape of \(\widetilde{H}_n\) is \((n+1)\times n\) so it is not square. We do, however, have
\begin{equation*} AQ_n = Q_{n+1}\widetilde{H}_n\text{.} \end{equation*}
Here are some straightforward observations.
  • Since the columns of \(Q_n\) are orthonormal, we have \(Q_n^TQ_n = I_n\text{.}\)
  • \(Q_{n+1} = \begin{bmatrix} Q_n \amp \qvec_{n+1} \end{bmatrix}\) so \(Q_n^TQ_{n+1} = \begin{bmatrix} I_n \amp \zerovec\end{bmatrix}\text{.}\) That is, \(Q_n^TQ_{n+1}\) is an \(n\times (n+1)\) matrix with a column of zeros appended to the side of the \(n\times n\) identity matrix.
  • If we define \(H_n = Q_n^TQ_{n+1}\widetilde{H}_n\text{,}\) then \(H_n\) is a square \(n\times n\) matrix obtained from \(\widetilde{H}_n\) by removing its last row.
  • Because \(Q_{n+1}\widetilde{H}_n = AQ_n\text{,}\) it also follows that
    \begin{equation*} H_n = Q_n^TAQ_n\text{.} \end{equation*}
    While this may appear to say that \(H_n\) is similar to \(A\text{,}\) remember that \(Q_n\) is not square so this is not a similarity relationship. It does suggest, however, that \(H_n\) is “somewhat similar” to \(A\) so that they may share some common features. A small numerical check of these observations appears after this list.
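Here is the promised check of these relationships, again reusing the hypothetical arnoldi helper from the sketch after Example 3.3.1 on a random matrix of our own choosing.

```python
import numpy as np
# Assumes the arnoldi helper sketched after Example 3.3.1 is available.

rng = np.random.default_rng(1)
m, n = 500, 10
A = rng.standard_normal((m, m))
v = rng.standard_normal(m)

Q_np1, H_tilde = arnoldi(A, v, n)       # Q_{n+1} is m x (n+1), H_tilde is (n+1) x n
Q_n = Q_np1[:, :n]                      # drop the last column to get Q_n
H_n = H_tilde[:n, :]                    # drop the last row to get the square matrix H_n

print(np.allclose(A @ Q_n, Q_np1 @ H_tilde))   # A Q_n = Q_{n+1} H_tilde
print(np.allclose(H_n, Q_n.T @ A @ Q_n))       # H_n = Q_n^T A Q_n
```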
Our goal now is to work with an \(m\times m\) matrix \(A\text{,}\) which is possibly very large, and approximate its largest eigenvalues. The power method tells us that the Krylov subspace \(\kcal_n\) can be expected to have information about the largest eigenvalues of \(A\text{.}\) For this reason, we will use the Krylov subspace to create an approximation of \(A\text{.}\)
The Krylov subspace \(\kcal_n\) is an \(n\)-dimensional subspace of \(\real^m\) so we can consider the orthogonal projection \(P_{\kcal_n}:\real^m \to \kcal_n\text{.}\) This leads us to form the approximating operator \(T_n:\kcal_n\to\kcal_n\) by
\begin{equation*} T_n(\xvec) = P_{\kcal_n}A\xvec\text{.} \end{equation*}
That is, \(T_n\) multiplies a vector in \(\kcal_n\) by \(A\) and then projects it back into \(\kcal_n\text{.}\)
Notice that \(A\) defines an operator on \(\real^m\) where \(m\) is huge, while \(T_n\) defines an operator on an \(n\)-dimensional space where \(n\) is possibly much smaller than \(m\text{.}\) Therefore, the operator \(T_n\) is an approximation to \(A\) on a much lower dimensional subspace \(\kcal_n\) that has important information about the largest eigenvalues of \(A\text{.}\)

Proposition.

If \(\bcal\) is the orthonormal basis of \(\kcal_n\) consisting of the vectors \(\basis{\qvec}{n}\) produced by Arnoldi iteration, then the matrix that represents the operator \(T_n\) with respect to \(\bcal\) is \(H_n\text{;}\) that is, \(\coords{T_n\xvec}{\bcal} = H_n\coords{\xvec}{\bcal}\) for every \(\xvec\) in \(\kcal_n\text{.}\)

Proof.

Since \(\basis{\qvec}{n}\) is an orthonormal basis for \(\kcal_n\text{,}\) the orthogonal projection \(P_{\kcal_n}\) is given by \(Q_nQ_n^T\text{,}\) which says that
\begin{equation*} T_n\xvec = Q_nQ_n^TA\xvec\text{.} \end{equation*}
Now the change of basis relationship tells us that \(\xvec=Q_n\coords{\xvec}{\bcal}\) and \(T_n\xvec = Q_n\coords{T_n\xvec}{\bcal}\text{.}\) Since \(Q_n^TQ_n=I_n\text{,}\) the second relationship gives \(\coords{T_n\xvec}{\bcal} = Q_n^TT_n\xvec\text{.}\) Putting this together, we have
\begin{align*} \coords{T_n\xvec}{\bcal} \amp = Q_n^TT_n\xvec\\ \amp = Q_n^TQ_nQ_n^TA\xvec\\ \amp = Q_n^TAQ_n\coords{\xvec}{\bcal}\\ \amp = H_n\coords{\xvec}{\bcal}\text{.} \end{align*}
This remarkable turn of events shows us that Arnoldi iteration provides the perfect tool for approximating the largest eigenvalues of \(A\text{.}\) The operator \(T_n\) approximates \(A\) on a subspace with information about the largest eigenvalues and the matrix \(H_n\text{,}\) which is formed by Arnoldi iteration, represents the operator \(T_n\text{.}\) This leads us to suspect that
The eigenvalues of \(H_n\) approximate the largest eigenvalues of \(A\text{.}\)
The eigenvalues of \(H_n\) are called Ritz values. Since \(H_n\) is an \(n\times n\) matrix and \(n\) is much smaller than \(m\text{,}\) we can use a conventional technique, such as the \(QR\) algorithm, to find the Ritz values.
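Computationally, this is just a dense eigenvalue computation on a small matrix. A short sketch, assuming that the square matrix H_n from an Arnoldi run (for instance H_tilde[:n, :] in the earlier sketches) is already available as a NumPy array:

```python
import numpy as np

# H_n: the square n x n Hessenberg matrix from n steps of Arnoldi iteration,
# assumed to have been computed already (e.g. H_n = H_tilde[:n, :] above).
ritz = np.linalg.eigvals(H_n)              # dense eigenvalue solver on a small matrix
ritz = ritz[np.argsort(-np.abs(ritz))]     # sort by magnitude, largest first
print(ritz[:5])                            # candidates for the largest eigenvalues of A
```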

Subsection 3.3.4 A working example

We will demonstrate these ideas computationally. We will first construct a \(100\times 100\) matrix \(A\) and plot its eigenvalues. Of course, this is not a particularly large matrix, certainly not of the size we have imagined in this discussion, but it is big enough to demonstrate the central ideas.
Evaluate the following cell and you will see a plot of \(100\) eigenvalues in the complex plane. For the most part, the eigenvalues are spread uniformly throughout a disk though there are a couple of eigenvalues of relatively large magnitude along the positive real axis.
Evaluating the next cell performs 6 steps of Arnoldi iteration to produce a \(6\times 6\) matrix \(H_n\text{,}\) which is displayed to demonstrate its Hessenberg form. (This cell and the next assume that the previous cell has been recently evaluated.)
Next we plot the eigenvalues of \(H_n\text{,}\) which we call the Ritz values, and compare them to the eigenvalues of \(A\text{.}\) We begin with a small value of \(n=3\text{,}\) but you should experiment by increasing the value of \(n\) and noting how the Ritz values approximate the largest eigenvalues. Note what happens when \(n=10,20,50\text{,}\) for instance.
To restate the main idea, we assume that \(A\) is so large that it is not possible to directly compute its eigenvalues. Instead, we approximate \(A\) by \(H_n\) where \(n\) is relatively small so that its eigenvalues can be computed through traditional means. We see that the eigenvalues of \(H_n\) approximate the large eigenvalues of \(A\) and that the accuracy of the approximations improves as \(n\) grows.
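The Sage cells themselves are not reproduced here, but the following NumPy sketch performs an analogous experiment. The way the matrix is built, a scaled random matrix plus two rank-one pieces to plant a couple of dominant eigenvalues, is our own choice, made only so that the spectrum resembles the one described above; the arnoldi helper is the one sketched after Example 3.3.1.

```python
import numpy as np
# Assumes the arnoldi helper sketched after Example 3.3.1 is available.

rng = np.random.default_rng(3)
m = 100
A = rng.standard_normal((m, m)) / np.sqrt(m)     # eigenvalues roughly fill the unit disk
u1, u2 = rng.standard_normal((2, m))
A += 3.0 * np.outer(u1, u1) / (u1 @ u1)          # plant an eigenvalue near 3
A += 2.0 * np.outer(u2, u2) / (u2 @ u2)          # plant an eigenvalue near 2

n = 20
Q, H_tilde = arnoldi(A, rng.standard_normal(m), n)
H_n = H_tilde[:n, :]

eigs = np.linalg.eigvals(A)                      # feasible here only because m is small
ritz = np.linalg.eigvals(H_n)

# The largest-magnitude Ritz values should closely match the two planted eigenvalues.
print(np.sort(np.abs(eigs))[-2:])
print(np.sort(np.abs(ritz))[-2:])
```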

Subsection 3.3.5 GMRES

Besides finding the eigenvalues of a matrix, the other main task in numerical linear algebra is solving a linear system \(A\xvec=\bvec\text{.}\) Once again, we imagine that \(A\) is very large so that Gaussian elimination, or some variant, is not computationally feasible. We will see how our understanding of Krylov subspaces and Arnoldi iteration provides an effective tool called GMRES or β€œgeneralized minimal residuals.”
To approximate the solution to \(A\xvec=\bvec\text{,}\) we construct the Krylov matrix
\begin{equation*} K_n = \begin{bmatrix} \bvec \amp A\bvec \amp A^2\bvec\amp\ldots\amp A^{n-1}\bvec \end{bmatrix} \end{equation*}
and the Krylov subspace \(\kcal_n=\col(K_n)\text{.}\) Once again, \(\kcal_n\) contains information about the largest eigenvalues of \(A\text{,}\) which are usually the most important. The idea of GMRES is to approximate \(\xvec^*\text{,}\) the solution to the equation \(A\xvec=\bvec\text{,}\) by \(\xvec_n\text{,}\) the vector in \(\kcal_n\) that minimizes \(\len{A\xvec - \bvec}\text{.}\) This is now a least squares problem.
Arnoldi iteration begins with
\begin{equation} \qvec_1 = \frac{1}{\len{\bvec}}\bvec\tag{3.3.3} \end{equation}
and constructs \(Q_n\) and \(\widetilde{H}_n\) so that
\begin{equation*} AQ_n=Q_{n+1}\widetilde{H}_n\text{.} \end{equation*}
Remember that \(Q_n\) is an orthonormal basis for \(\kcal_n\) so that we can write
\begin{equation*} \xvec_n = Q_n\yvec_n\text{.} \end{equation*}
It is worth noting the dimensions of these quantities: \(Q_n\) is an \(m\times n\) matrix, which means that \(\xvec_n\) is in \(\real^m\text{,}\) as expected, and \(\yvec_n\) is in \(\real^n\text{.}\) In other words, \(\yvec_n\) is a low-dimensional vector.
We make two observations:
  • We recast the first step in the iteration, \(\qvec_1 = \frac{1}{\len{\bvec}}\bvec\text{,}\) so that
    \begin{equation*} \bvec = \len{\bvec}~\qvec_1 = \len{\bvec}~Q_{n+1}\evec_1 \end{equation*}
    where \(\evec_1\) is the first \((n+1)\)-dimensional standard basis vector.
  • Since the columns of \(Q_{n+1}\) are orthonormal, we have the familiar fact:
    \begin{equation*} \len{Q_{n+1}\zvec}^2 = (Q_{n+1}\zvec)\cdot(Q_{n+1}\zvec) = \zvec\cdot (Q_{n+1}^TQ_{n+1})\zvec = \zvec\cdot\zvec = \len{\zvec}^2\text{.} \end{equation*}
    That is, multiplying by \(Q_{n+1}\) does not change the length of a vector.
We put these facts to use by noting that we want to find \(\yvec_n\) that minimizes
\begin{align*} \len{A\xvec-\bvec} \amp = \len{AQ_n\yvec - \bvec}\\ \amp = \len{Q_{n+1}\widetilde{H}_n\yvec - \len{\bvec}~Q_{n+1}\evec_1}\\ \amp = \len{\widetilde{H}_n\yvec -\len{\bvec}~\evec_1}\text{.} \end{align*}
To summarize, our task is to find \(\yvec_n\) that minimizes
\begin{equation*} \len{\widetilde{H}_n\yvec -\len{\bvec}~\evec_1}\text{.} \end{equation*}
Recall that \(\widetilde{H}_n\) is an \((n+1)\times n\) matrix and \(\evec_1\) is in \(\real^{n+1}\text{.}\) This is a tractable least squares problem since \(n\) is assumed to be small. In particular, we can find a \(QR\) factorization of \(\widetilde{H}_n\) so that \(\widetilde{H}_n = QR\) and then
\begin{equation*} \yvec_n = R^{-1}Q^T\left(\len{\bvec}~\evec_1\right)\text{.} \end{equation*}
(Of course, the \(Q\) in this expression is not related to \(Q_n\text{,}\) which is obtained from Arnoldi iteration.) Once we have \(\yvec_n\text{,}\) we find \(\xvec_n=Q_{n}\yvec_n\text{,}\) our approximation to \(\xvec^*\text{,}\) the solution to the equation \(A\xvec=\bvec\text{.}\)
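Putting the pieces together, here is a minimal sketch of this form of GMRES. It reuses the hypothetical arnoldi helper from the sketch after Example 3.3.1 and, for simplicity, solves the small least squares problem with np.linalg.lstsq rather than an explicit \(QR\) factorization of \(\widetilde{H}_n\text{;}\) the test matrix is shifted to make the system easy to solve and is our own choice, not the one used in the text's cells.

```python
import numpy as np
# Assumes the arnoldi helper sketched after Example 3.3.1 is available.

def gmres_simple(A, b, n):
    """Approximate the solution of Ax = b by the minimizer of |Ax - b| over K_n."""
    Q, H_tilde = arnoldi(A, b, n)                 # Arnoldi started from b, so q_1 = b/|b|
    k = H_tilde.shape[1]                          # completed steps (k = n unless breakdown)
    rhs = np.zeros(k + 1)
    rhs[0] = np.linalg.norm(b)                    # the vector |b| e_1
    y, *_ = np.linalg.lstsq(H_tilde, rhs, rcond=None)   # minimize |H_tilde y - |b| e_1|
    return Q[:, :k] @ y                           # x_n = Q_n y_n

rng = np.random.default_rng(4)
m = 200
A = rng.standard_normal((m, m)) / np.sqrt(m) + 2 * np.eye(m)   # shifted so GMRES converges quickly
x_star = rng.standard_normal(m)
b = A @ x_star

for n in (5, 10, 20):
    x_n = gmres_simple(A, b, n)
    print(n, np.linalg.norm(x_star - x_n))        # the error shrinks as n grows
```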
The following cell will form a random \(200\times 200\) matrix \(A\text{,}\) a random vector \(\xvec^*\text{,}\) and \(\bvec=A\xvec^*\text{.}\) This means that \(\xvec^*\) is the solution that we want to approximate.
Now we perform GMRES for \(n=1,2,\ldots,20\text{.}\) For each \(n\text{,}\) we have an approximate solution \(\xvec_n\) so we observe the error \(\len{\xvec^*-\xvec_n}\text{.}\)
In the final step, we are approximating the solution to a \(200\times 200\) system with the solution to a \(21\times 20\) least squares problem. As you can see, the approximations are excellent.
These Sage cells implement a simplistic version of GMRES in that Arnoldi iteration is restarted from scratch for each value of \(n\text{.}\) A better implementation would reuse the results of the previous iteration so that only one additional step of Arnoldi iteration would be needed.
One possible problem with Arnoldi iteration, and therefore with GMRES, is that we need to store all of the previous vectors \(\qvec_j\) and these are \(m\)-dimensional vectors. This means that we may run out of memory before we have found a suitably accurate approximation. For this reason, there are several strategies for β€œrestarting” Arnoldi once we reach that point.
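For completeness, production libraries implement exactly this kind of restarting. A brief sketch using SciPy's built-in solver, whose restart parameter bounds how many basis vectors \(\qvec_j\) are kept before the iteration is restarted (here A and b are assumed to be defined as in the previous sketch; A could also be supplied only as a LinearOperator):

```python
import numpy as np
from scipy.sparse.linalg import gmres

# A and b as in the previous sketch; restart=30 means at most 30 vectors q_j
# are stored before Arnoldi is restarted from the current residual.
x_approx, info = gmres(A, b, restart=30, maxiter=1000)
print(info, np.linalg.norm(A @ x_approx - b))   # info == 0 indicates convergence
```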