
Section 3.2 QR methods

In this section, our goal is to improve the power method so that we compute more eigenvalues of \(A\text{.}\) For convenience, we will focus on real, symmetric matrices for which the Spectral Theorem tells us that there is an orthonormal basis for \(\real^m\) consisting of eigenvectors of the \(m\times m\) matrix \(A\text{.}\) While many of the ideas we will discuss apply more generally, this assumption will simplify some of our thinking.
We began this chapter by emphasizing that we want to study very large, possibly sparse matrices. We will return to that focus in later sections, but we set it aside for the time being.

Subsection 3.2.1 Using orthogonality

In the last section, we explored the power method, how it leads to eigenvalue estimates, and some of its drawbacks. In particular, we saw that the power method, and its variants, focus on a single eigenvalue by looking at powers \(A^k\vvec\) of a single vector. Perhaps we could do better by applying the power method to a collection of vectors \(\basis{\wvec}{n}\text{;}\) that is, we will start with an \(m\times n\) matrix \(W=\begin{bmatrix} \wvec_1\amp\wvec_2\amp\ldots\amp\wvec_n \end{bmatrix}\text{,}\) whose columns are linearly independent.
If we repeatedly multiply \(W\) by \(A\text{,}\) we obtain the matrix
\begin{equation*} A^kW=\begin{bmatrix} A^k\wvec_1 \amp A^k\wvec_2 \amp \ldots \amp A^k\wvec_n \end{bmatrix}\text{.} \end{equation*}
Assuming \(A\) is nonsingular, \(\col(A^kW)\) will be an \(n\)-dimensional subspace of \(\real^m\text{.}\)
For the moment, we focus on the first two vectors \(\wvec_1\) and \(\wvec_2\) and write
\begin{align*} \wvec_1 \amp = c_1\vvec_1+c_2\vvec_2+\ldots+c_m\vvec_m\\ \wvec_2 \amp = b_1\vvec_1+b_2\vvec_2+\ldots+b_m\vvec_m \end{align*}
in terms of eigenvectors \(\basis{\vvec}{m}\text{.}\) Then
\begin{align*} A^k\wvec_1 \amp = \lambda_1^k \left(c_1\vvec_1 + c_2\left(\frac{\lambda_2}{\lambda_1}\right)^k\vvec_2+ \ldots + c_m\left(\frac{\lambda_m}{\lambda_1}\right)^k\vvec_m \right)\\ A^k\wvec_2 \amp = \lambda_1^k \left(b_1\vvec_1 + b_2\left(\frac{\lambda_2}{\lambda_1}\right)^k\vvec_2+ \ldots + b_m\left(\frac{\lambda_m}{\lambda_1}\right)^k\vvec_m \right)\text{.} \end{align*}
The span of these two vectors will be a \(2\)-dimensional subspace and the most significant terms involve \(\vvec_1\) and \(\vvec_2\text{.}\) This means that \(\laspan{A^k\wvec_1,A^k\wvec_2} \approx \laspan{\vvec_1,\vvec_2}\text{.}\)
Also, the ordinary power method tells us to expect that \(A^k\wvec_1\) approximately points in the direction \(\vvec_1\text{.}\) Due to our assumption that \(A\) is symmetric, we know that \(\vvec_2\) is orthogonal to \(\vvec_1\text{.}\) Therefore, to find the approximate direction of \(\vvec_2\text{,}\) we can look for a vector in \(\laspan{A^k\wvec_1,A^k\wvec_2}\) that is orthogonal to \(A^k\wvec_1\approx c\vvec_1\text{.}\) Such a vector can be found using Gram-Schmidt orthogonalization.

Example 3.2.1.

We will investigate using the matrix \(A=\begin{bmatrix} 1 \amp 2 \\ 2 \amp 1 \\ \end{bmatrix}\text{,}\) whose eigenvalues we already know to be \(3\) and \(-1\text{.}\) We will choose \(\wvec_1=\twovec10\) and \(\wvec_2=\twovec01\) so that \(W=I\) and hence \(A^kW = A^k\text{.}\) Our notation is that
\begin{equation*} A^kW = \begin{bmatrix} \zvec_1 \amp \zvec_2 \end{bmatrix} \end{equation*}
so that \(\zvec_1\) approximately points in the direction of \(\vvec_1\text{.}\) We find a vector orthogonal to \(\zvec_1\) using Gram-Schmidt so that we obtain
\begin{align*} \yvec_1 \amp = \zvec_1\\ \yvec_2 \amp = \zvec_2 - \frac{\zvec_2\cdot\yvec_1} {\yvec_1\cdot\yvec_1}\yvec_1\text{.} \end{align*}
We expect that \(\yvec_1\) will be a good approximation to \(\vvec_1\) and \(\yvec_2\) will be a good approximation to \(\vvec_2\text{.}\) The following Sage cell implements these computations using \(A^{100}\text{.}\)
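The same computation is easy to sketch outside of Sage; here is a rough NumPy version, where the names Z, y1, and y2 mirror the notation above.

import numpy as np

# A rough sketch: form A^100 and apply one Gram-Schmidt step to its columns.
A = np.array([[1.0, 2.0],
              [2.0, 1.0]])
Z = np.linalg.matrix_power(A, 100)    # columns are z1 = A^100 w1 and z2 = A^100 w2
z1, z2 = Z[:, 0], Z[:, 1]

y1 = z1
y2 = z2 - (z2 @ y1) / (y1 @ y1) * y1  # subtract the projection of z2 onto y1

print(Z)     # the two columns look identical, with entries near 10^47
print(y2)    # numerically zero: the direction of v2 has been lost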
Clearly, this idea fails here since we find that \(\yvec_2\text{,}\) whose direction should approximate \(\vvec_2\text{,}\) is zero. To see why, notice that the vectors \(\zvec_1\) and \(\zvec_2\) appear to be the same, at least as computed by Sage.
The problem is that the components of these vectors are so large (approximately \(10^{47}\)) that the floating point computations cannot detect the difference between them. For this reason, we need to modify this idea.

Subsection 3.2.2 The QR Method

Instead of computing \(A^kW\) for a very large \(k\) and then finding an orthogonal basis, we will multiply once, finding \(AW\text{,}\) determine an orthogonal basis for the column space of \(AW\text{,}\) and then repeat.
Remember that the Gram-Schmidt algorithm is described by a \(QR\) factorization. That is, if we write \(AW=QR\text{,}\) then \(Q\) is a matrix whose columns are orthonormal and \(R\) is upper triangular. In particular, if \(Q=\begin{bmatrix} \qvec_1 \amp \qvec_2 \amp \ldots \amp \qvec_m \end{bmatrix}\text{,}\) then \(\qvec_1\) is the unit vector in the direction of \(A\wvec_1\text{,}\) which carries more information about the first eigenvector \(\vvec_1\text{.}\) The second column \(\qvec_2\) is orthogonal to \(\qvec_1\) and should be expected to carry more information about the second eigenvector \(\vvec_2\text{.}\) Our hope is that
\begin{equation*} Q = \begin{bmatrix} \qvec_1 \amp \qvec_2 \amp \ldots \amp \qvec_m \end{bmatrix} \approx V = \begin{bmatrix} \vvec_1 \amp \vvec_2 \amp \ldots \amp \vvec_m \end{bmatrix} \end{equation*}
for some appropriate set of eigenvectors \(\basis{\vvec}{m}\text{.}\) That is, after one step, the columns of \(Q\) more closely align with a set of eigenvectors than do the columns of our original matrix \(W\text{.}\) We then replace \(W\) by \(Q\) and repeat.
This leads to the following algorithm:
  1. Choose \(W=\begin{bmatrix} \wvec_1\amp\ldots\amp\wvec_m \end{bmatrix}\) randomly.
  2. Repeat a sufficient number of times:
    1. Find \(B = AW\)
    2. Find a \(QR\) factorization \(B=QR\text{.}\)
    3. Replace \(W\leftarrow Q\text{.}\)
As the columns of \(W=Q\) begin to approach an orthonormal basis of eigenvectors, \(Q^TAQ\) should start to approach a diagonal matrix whose entries are the eigenvalues of \(A\text{.}\)
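For instance, a minimal NumPy sketch of this loop, applied to the \(2\times2\) matrix from Example 3.2.1, might look like this.

import numpy as np

# A sketch of the iteration: multiply by A once, orthogonalize, repeat.
A = np.array([[1.0, 2.0],
              [2.0, 1.0]])

Q = np.eye(2)                # start with W = I
for k in range(20):          # repeat "a sufficient number of times"
    B = A @ Q                # step 1: B = AW
    Q, R = np.linalg.qr(B)   # step 2: the QR factorization B = QR
                             # step 3: W has been replaced by Q

print(Q.T @ A @ Q)   # nearly diagonal, with the eigenvalues 3 and -1 on the diagonal
print(Q)             # the columns are nearly orthonormal eigenvectors (up to sign)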
The following Sage cell demonstrates with the previous matrix \(A\text{.}\)
This looks promising. We see that \(D=Q^TAQ\) approaches a diagonal matrix whose entries are the eigenvalues of \(A\) and that the columns of \(Q\) begin to approach an orthonormal set of eigenvectors.
If we are only interested in finding eigenvalues, we can streamline this method further. If we take \(W=I\text{,}\) then the \(QR\) factorization \(A=QR\) produces an orthogonal matrix \(Q\) so that we can form \(A' = Q^TAQ\text{,}\) which is similar to \(A\text{.}\) Because similar matrices share the same set of eigenvalues, we can continue working with \(A'\) rather than \(A\text{.}\) As we iterate, the columns of \(Q\) still approach an orthogonal set of eigenvectors and hence \(Q^TAQ\) should approach a diagonal matrix containing the eigenvalues.
But something remarkable happens. If \(A = QR\text{,}\) then
\begin{equation*} A' = Q^TAQ = Q^T(QR)Q = (Q^TQ)RQ = RQ\text{.} \end{equation*}
That is, \(A = QR\) and \(A'=RQ\text{,}\) which means we find the new matrix \(A'\) by simply reversing the order of the \(QR\) product. Therefore, our algorithm to find the eigenvalues of \(A\) takes a simple form, which is known as the \(QR\) algorithm:
  1. Repeat a sufficient number of times:
    1. Find a \(QR\) factorization \(A=QR\text{.}\)
    2. Replace \(A\leftarrow RQ\text{.}\)
It’s hard to imagine a simpler algorithm. Let’s see how it does.
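A minimal NumPy sketch of the algorithm, again using the \(2\times2\) matrix from before, is:

import numpy as np

# A sketch of the QR algorithm: factor, then multiply the factors in reverse order.
A = np.array([[1.0, 2.0],
              [2.0, 1.0]])

for k in range(20):
    Q, R = np.linalg.qr(A)   # step 1: factor A = QR
    A = R @ Q                # step 2: replace A by RQ, which is similar to A

print(A)   # the diagonal entries approach the eigenvalues 3 and -1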
Notice how the matrix approaches a diagonal matrix with the eigenvalues on the diagonal!

Example 3.2.2.

Let’s try with a larger symmetric matrix, \(A=\begin{bmatrix} 2 \amp 5 \amp -1 \amp -1 \\ 5 \amp -2 \amp 3 \amp -5 \\ -1 \amp 3 \amp 4 \amp 1 \\ -1 \amp -5 \amp 1 \amp 4 \\ \end{bmatrix} \text{.}\) We will iterate several times, and then examine the hopefully almost diagonal matrix \(A\text{.}\) Notice that the actual eigenvalues of \(A\) appear at the top of the cell’s output.
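A rough NumPy version of this experiment might look like the following, where np.linalg.eigvalsh supplies the actual eigenvalues for comparison.

import numpy as np

A = np.array([[ 2.,  5., -1., -1.],
              [ 5., -2.,  3., -5.],
              [-1.,  3.,  4.,  1.],
              [-1., -5.,  1.,  4.]])

print(np.linalg.eigvalsh(A))   # the actual eigenvalues, for comparison

for k in range(50):            # 50 steps of the QR algorithm
    Q, R = np.linalg.qr(A)
    A = R @ Q

print(A)                       # the off-diagonal entries have shrunk considerably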
After 50 iterations, the \(QR\) algorithm has made good progress toward finding the eigenvalues.

Subsection 3.2.3 The shifted QR algorithm

The \(QR\) algorithm is based on the power method, which we began by attempting to find the dominant eigenvalue \(\lambda_1\text{.}\) With a little bit of algebra, however, we will see that there is even more going on.
Let’s first consider the matrix \(P = \begin{bmatrix} 0 \amp 0 \amp \ldots \amp 1 \\ \vdots \amp \vdots \amp \ddots \amp \vdots \\ 0 \amp 1 \amp \ldots \amp 0 \\ 1 \amp 0 \amp \ldots \amp 0 \\ \end{bmatrix} \text{,}\) which is an example of what is called a permutation matrix. For instance, if \(B=\begin{bmatrix}\bvec_1\amp\bvec_2\amp \ldots \amp\bvec_m \end{bmatrix}\text{,}\) then \(BP = \begin{bmatrix} \bvec_m\amp \ldots \amp \bvec_2\amp\bvec_1 \end{bmatrix}\) showing that multiplying by \(P\) on the right reverses the columns of \(B\text{.}\) In particular, this shows that \(P^2 = I\) so that \(P\) is its own inverse. Similarly, multiplying a matrix \(C\) on the left by \(P\) reverses the rows of \(C\text{.}\)
This means that if \(Q=\begin{bmatrix} \qvec_1\amp\qvec_2\amp\ldots\amp\qvec_m \end{bmatrix}\text{,}\) then \(QP = \begin{bmatrix} \qvec_m\amp\ldots\amp\qvec_2\amp\qvec_1 \end{bmatrix}\text{.}\) Once again, the columns of \(QP\) are the same as the columns of \(Q\text{,}\) just reversed, which means that \(QP\) is also an orthogonal matrix.
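These facts are easy to check numerically. In the short NumPy sketch below, np.fliplr(np.eye(4)) builds the \(4\times4\) version of \(P\text{.}\)

import numpy as np

P = np.fliplr(np.eye(4))                 # the 4x4 reversal permutation matrix
B = np.arange(1.0, 17.0).reshape(4, 4)   # any matrix to experiment with

print(B @ P)   # the columns of B in reverse order
print(P @ B)   # the rows of B in reverse order
print(P @ P)   # the identity, confirming that P is its own inverse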
Since we are assuming that \(A\) is symmetric, we know that \(A=A^T\text{.}\) Taking the inverses of both sides says that \(A^{-1}={A^{-1}}^T\) so that \(A^{-1}\) is also symmetric. Now if we have a \(QR\) factorization of \(A\text{,}\) we have \(A=QR\) and hence
\begin{equation*} A^{-1}={A^{-1}}^T = ((QR)^{-1})^T = (R^{-1}Q^{-1})^T = QR^{-T} \end{equation*}
where \(R^{-T} = (R^{-1})^T\text{.}\) Since \(P^2=I\text{,}\) it then follows that
\begin{equation*} A^{-1}P = QR^{-T}P = Q(P^2)R^{-T}P = (QP)(PR^{-T}P)\text{.} \end{equation*}
Remember that \(R\) is upper triangular, which means that \(R^{-T}\) is lower triangular. However, \(PR^{-T}P\) is obtained from \(R^{-T}\) by reversing the columns and reversing the rows, which means that \(PR^{-T}P\) is upper triangular. Since \(QP\) is orthogonal, this shows that
\begin{equation*} A^{-1}P = (QP)(PR^{-T}P) \end{equation*}
is a \(QR\) factorization of \(A^{-1}P\text{.}\)
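As a sanity check, this factorization can be verified numerically; the sketch below uses the symmetric matrix from Example 3.2.2.

import numpy as np

A = np.array([[ 2.,  5., -1., -1.],
              [ 5., -2.,  3., -5.],
              [-1.,  3.,  4.,  1.],
              [-1., -5.,  1.,  4.]])
P = np.fliplr(np.eye(4))

Q, R = np.linalg.qr(A)
RinvT = np.linalg.inv(R).T   # R^{-T}, which is lower triangular

T = P @ RinvT @ P            # reversing rows and columns gives an upper triangular matrix
print(np.allclose(np.linalg.inv(A) @ P, (Q @ P) @ T))   # True: A^{-1}P = (QP)(P R^{-T} P)
print(np.allclose(T, np.triu(T)))                       # True: P R^{-T} P is upper triangular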
Now that we’ve been through this algebraic digression, we see the following implication. When we implement the \(QR\) algorithm on \(A\text{,}\) the last column of \(Q\) is the first column of \(QP\text{,}\) which we obtain by implementing the \(QR\) algorithm on \(A^{-1}P\text{.}\) This means that while the \(QR\) algorithm is applying the power method to the first vector, it is applying the inverse power method to the last vector at the same time.
In other words, we developed the \(QR\) algorithm based on the power method, which finds the dominant eigenvalue associated to the first column of \(Q\text{.}\) It turns out, however, that the \(QR\) algorithm is also simultaneously performing the inverse power method on the last column of \(Q\text{.}\) This is an unexpected bonus that we can exploit to speed up the algorithm!
Suppose that the columns of \(Q\) are denoted by \(\qvec_i\text{.}\) Also suppose that \(\evec_m\) is the last standard basis vector, that is, the vector whose components are all zero except for the last component, which is \(1\text{.}\) We have \(\qvec_m = Q\evec_m\) and, since \(\qvec_m\) is a unit vector, \(\qvec_m\cdot\qvec_m=1\text{.}\) Therefore, the Rayleigh quotient
\begin{align*} r(\qvec_m, A\qvec_m) \amp = \frac{\qvec_m\cdot(A\qvec_m)}{\qvec_m\cdot\qvec_m}\\ \amp = (Q\evec_m)\cdot(AQ\evec_m)\\ \amp = \evec_m\cdot(Q^TAQ)\evec_m\\ \amp = (Q^TAQ)_{mm}\text{,} \end{align*}
the entry in the lower-right corner of the matrix that results from one step of the iteration. This shows that, if \(\qvec_m\) is an approximate eigenvector, then the new \(A_{mm}\) will approximately be an eigenvalue.
This suggests a change in tactics. Since the \(QR\) algorithm is performing the inverse power method on the last column of the matrix, we can shift by \(A_{mm}\text{,}\) an approximation to the eigenvalue associated to the approximate eigenvector \(\qvec_m\text{,}\) so that we are performing the shifted inverse power method. This should converge quickly if we continually update our shift by \(A_{mm}\text{,}\) our approximate eigenvalue. Here is our new algorithm:
  1. Repeat a sufficient number of times:
    1. Let \(\sigma = A_{mm}\)
    2. Find a \(QR\) factorization \(A-\sigma I=QR\text{.}\)
    3. Replace \(A\leftarrow RQ+\sigma I\text{.}\)
This is known as the shifted \(QR\) algorithm.

Example 3.2.3.

Let’s reconsider the matrix, \(A=\begin{bmatrix} 2 \amp 5 \amp -1 \amp -1 \\ 5 \amp -2 \amp 3 \amp -5 \\ -1 \amp 3 \amp 4 \amp 1 \\ -1 \amp -5 \amp 1 \amp 4 \\ \end{bmatrix} \text{,}\) that has the eigenvalue 4.893138821026808. The following Sage cell implements the shifted \(QR\) algorithm.
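Roughly, the same computation can be sketched in NumPy as follows, running the iteration for seven steps.

import numpy as np

A = np.array([[ 2.,  5., -1., -1.],
              [ 5., -2.,  3., -5.],
              [-1.,  3.,  4.,  1.],
              [-1., -5.,  1.,  4.]])
I = np.eye(4)

for k in range(7):
    sigma = A[-1, -1]                    # the shift is the lower-right entry
    Q, R = np.linalg.qr(A - sigma * I)   # factor A - sigma*I = QR
    A = R @ Q + sigma * I                # replace A by RQ + sigma*I

print(A)   # the last row and column are essentially zero except for the corner entry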
Notice that with only seven iterations, we have found the eigenvalue accurate to within 15 digits, which is about as good as we’re going to get with a computer. Moreover, the last row and last column of the final matrix \(A\) are now essentially zero, except for the diagonal entry in the corner. In other words, it approximately looks like the block matrix:
\begin{equation*} A = \begin{bmatrix} A' \amp 0 \\ 0 \amp \lambda \\ \end{bmatrix}\text{.} \end{equation*}
This observation will be important in our final improvement.
We would like to find all of the eigenvalues of \(A\text{,}\) so there is one more improvement we can make. First, we will see in the next section how some preliminary work produces a matrix that is similar to \(A\) and that is tridiagonal, meaning that the only nonzero elements are on the diagonal or just above or below it. For instance,
\begin{equation*} \begin{bmatrix} 5 \amp 9 \amp 0 \amp 0 \\ 9 \amp 8 \amp 5 \amp 0 \\ 0 \amp 5 \amp 3 \amp 4 \\ 0 \amp 0 \amp 4 \amp 5 \\ \end{bmatrix} \end{equation*}
is a symmetric, tridiagonal matrix with which we can begin the new algorithm.
We will perform enough steps of the shifted \(QR\) algorithm until the final matrix has, approximately, the block form
\begin{equation*} A = \begin{bmatrix} A' \amp 0 \\ 0 \amp \lambda \end{bmatrix}\text{.} \end{equation*}
Practically speaking, we iterate until the element in the last row and next to last column is sufficiently small. We then make note of the eigenvalue \(\lambda\approx A_{mm}\) and proceed to apply the shifted \(QR\) algorithm to \(A'\text{.}\)
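A NumPy sketch of this strategy, starting from the tridiagonal matrix above, might look like the following; the tolerance 1e-12 used to decide when to deflate is an arbitrary choice.

import numpy as np

# A sketch of the shifted QR algorithm with deflation.
A = np.array([[5., 9., 0., 0.],
              [9., 8., 5., 0.],
              [0., 5., 3., 4.],
              [0., 0., 4., 5.]])

eigenvalues = []
iterations = 0
while A.shape[0] > 1:
    m = A.shape[0]
    sigma = A[-1, -1]
    Q, R = np.linalg.qr(A - sigma * np.eye(m))
    A = R @ Q + sigma * np.eye(m)
    iterations += 1
    if abs(A[-1, -2]) < 1e-12:         # the last row is essentially zero
        eigenvalues.append(A[-1, -1])  # record the eigenvalue lambda
        A = A[:-1, :-1]                # continue with the smaller matrix A'
eigenvalues.append(A[0, 0])            # the remaining 1x1 block is the last eigenvalue

print(eigenvalues)
print(iterations)   # the total number of QR steps used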
Notice that we have computed all four eigenvalues of \(A\) to almost within the accuracy of the computer with a total of only 10 iterations. This is truly a remarkable algorithm!
The shifted \(QR\) algorithm is the state of the art for finding eigenvalues of symmetric matrices. While there are newer algorithms that rival it, shifted \(QR\) is simple, numerically stable, and in wide use today. Computing in Science and Engineering named it one of the top ten algorithms of the \(20^{th}\) century.
1  www.computer.org/csdl/magazine/cs/2000/01/c1022/13rRUxBJhBm