Subsection 3.4.1 Lanczos iteration
When \(A\) is symmetric, we see that \(H=Q^TAQ\) is also symmetric. Since \(H\) is upper Hessenberg, it must therefore be tridiagonal, so we denote it by \(T\) rather than \(H\text{.}\)
\begin{equation*}
T=\begin{bmatrix}
\alpha_1 \amp \beta_1 \amp 0 \amp \ldots \amp 0 \\
\beta_1 \amp \alpha_2 \amp \beta_2 \amp \ldots \amp 0 \\
0 \amp \beta_2 \amp \alpha_3 \amp \ddots \amp 0 \\
\vdots \amp \vdots \amp \ddots \amp \ddots \amp \beta_{n-1} \\
0 \amp 0 \amp \ldots \amp \beta_{n-1} \amp \alpha_{n} \\
\end{bmatrix}\text{.}
\end{equation*}
In this case, Arnoldi iteration simplifies because many terms in the iteration are zero. This leads to a slight modification of the algorithm that we call Lanczos iteration.
We have the relations
\begin{align*}
A\qvec_1 \amp = \alpha_1\qvec_1 + \beta_1\qvec_2\\
A\qvec_2 \amp = \beta_1\qvec_1 + \alpha_2\qvec_2 + \beta_2\qvec_3\\
\vdots \amp = \vdots\\
A\qvec_n \amp = \beta_{n-1}\qvec_{n-1} +
\alpha_n\qvec_n + \beta_{n}\qvec_{n+1}\text{.}
\end{align*}
Notice that we have
\begin{align*}
\alpha_n \amp = \qvec_n\cdot(A\qvec_n)=r(A,\qvec_n)\\
\beta_n \amp = \qvec_n\cdot(A\qvec_{n+1}) = \qvec_{n+1}\cdot(A\qvec_n)\text{.}
\end{align*}
As before, we typically think of \(A\) as an \(m\times m\) matrix where \(m\) is very large, and we perform \(n\) steps of Lanczos iteration to obtain \(Q_n\) and \(\widetilde{T}_n\) such that
\begin{equation*}
AQ_{n} = Q_{n+1}\widetilde{T}_n\text{.}
\end{equation*}
The following cell defines a function that implements Lanczos iteration and returns \(Q_{n+1}\) and \(\widetilde{T}_n\text{.}\) It also defines a \(203\times 203\) symmetric matrix \(A\text{.}\) Evaluate this cell, and we will explore the results in the following cell.
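The cell itself is not reproduced here, but a minimal sketch of such a function, written in Python with numpy, might look like the following. The random starting vector and the illustrative symmetric matrix are assumptions on our part; the original cell defines its own \(203\times 203\) matrix \(A\text{.}\)

import numpy as np

def lanczos(A, n):
    # Perform n steps of Lanczos iteration on a symmetric matrix A.
    # Returns Q_{n+1}, whose columns are the orthonormal vectors
    # q_1, ..., q_{n+1}, and the (n+1) x n tridiagonal matrix T~_n
    # satisfying A Q_n = Q_{n+1} T~_n.
    m = A.shape[0]
    Q = np.zeros((m, n + 1))
    T = np.zeros((n + 1, n))
    q = np.random.rand(m)
    Q[:, 0] = q / np.linalg.norm(q)
    beta = 0.0
    for k in range(n):
        v = A @ Q[:, k]
        if k > 0:
            v -= beta * Q[:, k - 1]   # subtract the previous beta term
        alpha = Q[:, k] @ v           # the Rayleigh quotient q_k . (A q_k)
        v -= alpha * Q[:, k]
        beta = np.linalg.norm(v)      # assumes no breakdown (beta > 0)
        Q[:, k + 1] = v / beta
        T[k, k] = alpha               # diagonal entry
        T[k + 1, k] = beta            # subdiagonal entry
        if k + 1 < n:
            T[k, k + 1] = beta        # superdiagonal entry
    return Q, T

# For illustration only: a random symmetric matrix standing in for
# the matrix A defined in the original cell.
B = np.random.rand(203, 203)
A = (B + B.T) / 2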
Remember that the eigenvalues of the square matrix \(T_n\) approximate the largest eigenvalues of \(A\text{.}\) The next cell defines a value for \(n\) and performs \(n\) steps of Lanczos iteration on the \(203\times203\) matrix \(A\) defined in the previous cell. It will then print the largest eigenvalues of \(A\) and \(T_n\) side by side for comparison. Explore what happens with \(n=10\text{,}\) \(n=20\text{,}\) \(n=40\text{,}\) and \(n=80\text{,}\) and take note of what you observe.
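A sketch of such a comparison, using the lanczos function and illustrative matrix A from the sketch above:

import numpy as np

n = 20
Q, T = lanczos(A, n)
Tn = T[:n, :n]    # the square tridiagonal matrix T_n

# Sort both spectra in decreasing order and compare the largest values.
eigs_A = np.sort(np.linalg.eigvalsh(A))[::-1]
eigs_T = np.sort(np.linalg.eigvalsh(Tn))[::-1]
for a, t in zip(eigs_A[:n], eigs_T):
    print(f"{a:16.10f} {t:16.10f}")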
As \(n\) grows, we eventually encounter a problem. Because \(A\) is symmetric, we know that the matrix \(T\) is tridiagonal, which dramatically reduces the amount of computation we need to perform. However, we do not explicitly enforce orthogonality between the last vector \(\qvec_k\) created and the earlier ones. Since we cannot perform exact arithmetic with a computer, round-off error eventually means that \(\qvec_k\) is not perfectly orthogonal to the earlier vectors. Remembering that the power method underlies this technique, we see that continued iteration reintroduces a nonzero contribution from the eigenvector associated with the largest eigenvalue. Consequently, we find the largest eigenvalue a second, and then a third, time. This is a spooky situation, so we call these “ghost eigenvalues.” There is a rich theory that aims to exorcise them.
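One way to watch the orthogonality degrade, again using the sketch of lanczos above: the columns of \(Q_{n+1}\) should be orthonormal, so \(Q^TQ-I\) should be nearly zero, but its norm grows with \(n\text{.}\)

import numpy as np

for n in [10, 20, 40, 80]:
    Q, T = lanczos(A, n)
    # Distance of the columns of Q from being orthonormal.
    print(n, np.linalg.norm(Q.T @ Q - np.eye(n + 1)))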
Subsection 3.4.2 Legendre polynomials
Let's turn our attention to the inner product space of polynomials \(\pbb\) with the inner product
\begin{equation*}
\inner{p}{q} = \int_{-1}^1 p(x)q(x)~dx\text{.}
\end{equation*}
We also define the operator \(S:\pbb\to\pbb\) by \(S(p)(x) = xp(x)\text{,}\) that is, \(S\) multiplies a polynomial by \(x\text{.}\) This is a self-adjoint operator because
\begin{equation*}
\inner{xp(x)}{q(x)} = \int_{-1}^1 xp(x)q(x)~dx =
\int_{-1}^1 p(x)\left(xq(x)\right)~dx = \inner{p(x)}{xq(x)}
\end{equation*}
so we can apply Lanczos iteration to produce an orthonormal basis of \(\pbb_n\) for some finite \(n\text{.}\)
We begin with \(p_0(x) = 1\text{,}\) which defines \(q_0 = \frac1{\len{p_0}}~p_0 = \frac1{\sqrt{2}}\text{.}\) Our iterative steps have the form
\begin{align*}
xq_0(x) \amp = \alpha_1 q_0(x) + \beta_1 q_1(x)\\
xq_1(x) \amp = \beta_1 q_0(x) + \alpha_2 q_1(x) + \beta_2 q_2(x)\\
\vdots \amp = \vdots\\
xq_n(x) \amp = \beta_n q_{n-1}(x) + \alpha_{n+1} q_n(x) +
\beta_{n+1} q_{n+1}(x)\text{.}
\end{align*}
The following Sage cell implements this algorithm and displays the resulting polynomials with \(n=6\text{.}\) Sage does not do a particularly good job of simplifying and displaying the polynomials.
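That cell is not reproduced here, but a sketch of the same computation, written in Python with sympy so that the integrals are exact, might look like this; the names inner and qs are ours.

import sympy as sp

x = sp.symbols('x')

def inner(p, q):
    # The inner product <p, q>: integrate p(x) q(x) over [-1, 1].
    return sp.integrate(p * q, (x, -1, 1))

n = 6
q_prev, beta = 0, 0
q_curr = 1 / sp.sqrt(2)            # q_0 = p_0 / ||p_0|| with p_0 = 1
qs = [q_curr]
for k in range(n):
    v = sp.expand(x * q_curr)
    alpha = inner(v, q_curr)       # alpha = <x q_k, q_k>
    v = sp.expand(v - alpha * q_curr - beta * q_prev)
    beta = sp.sqrt(inner(v, v))    # beta = ||v||
    q_prev, q_curr = q_curr, sp.expand(v / beta)
    qs.append(q_curr)

for q in qs:
    print(q)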
You may notice that each polynomial \(q_k(x)\) has only even powers of \(x\) if \(k\) is even and only odd powers if \(k\) is odd. In either case, \(xq_{k-1}(x)^2\) is an odd function whose integral over \([-1,1]\) vanishes, so
\begin{equation*}
\alpha_{k}=\inner{xq_{k-1}}{q_{k-1}} = 0
\end{equation*}
for all \(k\text{.}\) That is, the diagonal entries of \(T_n\) are zero. This can be used to simplify the computation further, but we will not explicitly make use of this fact since it does not hold in other, related situations.
The orthogonal polynomials we are creating are called Legendre polynomials. Technically speaking, they are scalar multiples of the Legendre polynomials \(P_k\text{.}\) Our polynomials \(q_k\) are normalized to have unit length in the inner product space, while Legendre polynomials are usually normalized so that \(P_k(1)=1\text{.}\)
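Continuing the sketch above, one can check this relationship directly: rescaling each \(q_k\) so that its value at \(x=1\) equals \(1\) should reproduce sympy's built-in Legendre polynomials.

import sympy as sp

x = sp.symbols('x')
for k, q in enumerate(qs):    # qs from the sketch above
    # P_k is normalized by P_k(1) = 1, so q_k / q_k(1) should equal P_k.
    rescaled = sp.expand(q / q.subs(x, 1))
    print(k, sp.simplify(rescaled - sp.legendre(k, x)) == 0)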
Subsection 3.4.3 Gaussian quadrature
By a quadrature, we mean a recipe for approximating a definite integral, such as
\begin{equation*}
\int_{-1}^1 f(x) ~dx \approx \sum_{j=1}^n c_jf(x_j)
\end{equation*}
for a set of weights \(c_j\) and points \(x_j\) called nodes. A simple quadrature is given by right endpoints, which is typically how a discussion of integrals begins:
\begin{equation*}
\int_{-1}^1 f(x) ~dx \approx \sum_{j=1}^n
\frac{2}{n}f\left(-1+2j/n\right)\text{.}
\end{equation*}
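For instance, here is a minimal sketch of this rule in Python with numpy; the function name right_endpoints is ours. Its accuracy improves only slowly as \(n\) grows, which is part of the motivation for what follows.

import numpy as np

def right_endpoints(f, n):
    # n subintervals of [-1, 1], each of width 2/n; evaluate f at
    # the right endpoint -1 + 2j/n of each one.
    xj = -1 + 2 * np.arange(1, n + 1) / n
    return (2 / n) * np.sum(f(xj))

print(right_endpoints(lambda x: x**2, 100))    # exact value is 2/3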
It turns out that Legendre polynomials provide an excellent quadrature, as we will now explore.
In designing a quadrature, our goal is to create an accurate approximation for a large class of functions. If we have \(n\) nodes, \(\basis{x}{n}\text{,}\) a natural criterion is that our quadrature be exact on all polynomials having degree less than \(n\text{.}\) We will see how to satisfy this criterion with an appropriate choice of weights \(c_j\text{.}\)
With a given set of nodes \(\basis{x}{n}\text{,}\) we define the Lagrange interpolating polynomials
\begin{equation*}
L_{n,j}(x) =
\frac{(x-x_1)(x-x_2)\ldots\widehat{(x-x_j)}\ldots (x-x_n)}
{(x_j-x_1)(x_j-x_2)\ldots\widehat{(x_j-x_j)}\ldots (x_j-x_n)}
\end{equation*}
where the terms \(\widehat{(x-x_j)}\) are removed. Notice that \(\deg(L_{n,j}) = n-1\) and that
\begin{equation*}
L_{n,j}(x_k) = \begin{cases}
1 \amp \text{ if } j=k \\
0 \amp \text{ if } j\neq k \\
\end{cases}\text{.}
\end{equation*}
This means that the polynomials \(L_{n,j}\) form a basis for \(\pbb_{n-1}\text{.}\) If we define
\begin{equation*}
c_j = \int_{-1}^1L_{n,j}(x)~dx\text{,}
\end{equation*}
then the quadrature applied to \(L_{n,j}\) gives \(\sum_{k=1}^n c_kL_{n,j}(x_k) = c_j = \int_{-1}^1 L_{n,j}(x)~dx\text{,}\) so the quadrature is exact on each \(L_{n,j}\) and hence on all of \(\pbb_{n-1}\) by linearity.
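A sketch of this computation with sympy; the function name lagrange_weights is ours. With the three equally spaced nodes \(-1, 0, 1\text{,}\) it recovers the familiar weights of Simpson's rule.

import sympy as sp

x = sp.symbols('x')

def lagrange_weights(nodes):
    # c_j is the integral over [-1, 1] of the Lagrange polynomial L_{n,j}.
    weights = []
    for j, xj in enumerate(nodes):
        L = sp.prod([(x - xk) / (xj - xk)
                     for k, xk in enumerate(nodes) if k != j])
        weights.append(sp.integrate(sp.expand(L), (x, -1, 1)))
    return weights

print(lagrange_weights([-1, 0, 1]))    # [1/3, 4/3, 1/3], Simpson's rule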
This says that we can choose any set of nodes and find a quadrature that is exact on \(\pbb_{n-1}\text{.}\) The Legendre polynomials give us a particularly good choice of nodes if we choose \(\basis{x}{n}\) to be the roots of the \(n^{th}\) degree Legendre polynomial \(q_n\text{.}\) (These are also the roots of the classically defined Legendre polynomial \(P_n\) since \(q_n\) and \(P_n\) differ by a scalar multiple.)
Let's explore why this is a good choice. We choose the weights \(c_j\) so that the quadrature is exact on \(\pbb_{n-1}\) as just described. Suppose that \(p\) is a polynomial whose degree is less than \(2n\text{.}\) By the Division Algorithm, we have
\begin{equation*}
p = sq_n + r
\end{equation*}
where \(\deg(r)\lt n\) so that \(r\) is in \(\pbb_{n-1}\text{,}\) which means that the quadrature is exact on \(r\text{.}\) In addition, \(\deg(s)\lt n\text{,}\) so \(s\) can be written as a linear combination of \(\basis{q}{n-1}\text{,}\) which means that \(s\) is orthogonal to \(q_n\text{.}\) We then have
\begin{align*}
\int_{-1}^1 p(x)~dx \amp =
\int_{-1}^1 s(x)q_n(x)~dx +\int_{-1}^1 r(x)~dx\\
\amp = \int_{-1}^1 r(x)~dx\\
\amp = \sum_{j=1}^n c_jr(x_j)\\
\amp = \sum_{j=1}^n c_j\left(s(x_j)q_n(x_j) + r(x_j)\right)\\
\amp = \sum_{j=1}^n c_j p(x_j)\text{.}
\end{align*}
The second equality is due to the fact that \(s\) and \(q_n\) are orthogonal, and the fourth equality follows from the fact that \(q_n(x_j) = 0\) for all \(j\text{.}\)
This is remarkable because it says that the quadrature defined in this way is exact on polynomials whose degree is less than \(2n\text{.}\) In other words, the degree of accuracy is double what we would naively expect. This technique is called Gaussian quadrature.
Finding the roots of a polynomial directly is numerically unstable, so we need another way to find the nodes \(x_j\text{,}\) which are the roots of \(q_n\text{.}\) Lanczos iteration can help with that. We will write out the steps in the iteration, evaluating each of the polynomials at one of the nodes \(x_j\text{.}\)
\begin{align*}
x_jq_0(x_j) \amp = \alpha_1 q_0(x_j) + \beta_1 q_1(x_j)\\
x_jq_1(x_j) \amp = \beta_1 q_0(x_j) + \alpha_2 q_1(x_j) +
\beta_2 q_2(x_j)\\
\vdots \amp = \vdots\\
x_jq_{n-1}(x_j) \amp = \beta_{n-1} q_{n-2}(x_j) + \alpha_{n}
q_{n-1}(x_j) + \beta_{n} q_{n}(x_j)\text{.}
\end{align*}
However, since \(q_n(x_j)=0\text{,}\) the final expression reduces to
\begin{equation*}
x_jq_{n-1}(x_j) = \beta_{n-1} q_{n-2}(x_j) + \alpha_{n}
q_{n-1}(x_j)\text{.}
\end{equation*}
This means that if we define the matrix and vector
\begin{equation*}
T_n = \begin{bmatrix}
\alpha_1 \amp \beta_1 \amp 0 \amp \ldots \amp 0 \\
\beta_1 \amp \alpha_2 \amp \beta_2 \amp \ldots \amp 0 \\
0 \amp \beta_2 \amp \alpha_3 \amp \ddots \amp 0 \\
\vdots \amp \vdots \amp \ddots \amp \ddots \amp \beta_{n-1} \\
0 \amp 0 \amp \ldots \amp \beta_{n-1} \amp \alpha_{n} \\
\end{bmatrix},
\hspace{12pt}
\vvec_j = \cfourvec{q_0(x_j)}{q_1(x_j)}{\vdots}{q_{n-1}(x_j)}\text{,}
\end{equation*}
then we have
\begin{equation*}
T_n\vvec_j = x_j\vvec_j\text{.}
\end{equation*}
In other words, the nodes \(x_j\) are the eigenvalues of the tridiagonal matrix \(T_n\) and can be found with, say, the shifted \(QR\) algorithm. In addition, the eigenvectors are given by evaluating the Legendre polynomials at the nodes.
Theorem 3.4.1.
If \(T_n\) is the tridiagonal matrix that results from \(n\) steps of Lanczos iteration, then the eigenvalues of \(T_n\) are \(x_j\text{,}\) the roots of the \(n^{th}\) Legendre polynomial \(q_n\text{.}\)
Moreover, if we define eigenvectors \(\vvec_j = \cfourvec{q_0(x_j)}{q_1(x_j)}{\vdots}{q_{n-1}(x_j)}\text{,}\) then the weights in the quadrature are given by
\begin{equation*}
c_j=\frac{1}{\len{\vvec_j}^2}\text{.}
\end{equation*}
We will not prove the statement about the weights in the quadrature, but this is again a remarkable turn of events since it implies that we can develop Gaussian quadrature using only numerically stable linear algebraic operations.
Evaluate the following cell to again define the function Q, T = lanczos(n). We will use this in the next few cells.
We now choose a value of \(n=6\) and find the nodes and weights for the quadrature. We also define a function quad(f) that computes the quadrature applied to the function \(f(x)\text{.}\)
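A sketch of those two cells, assuming a lanczos(n) function that, as above, returns \(Q_{n+1}\) and \(\widetilde{T}_n\) with numerical entries \(\alpha_k\) and \(\beta_k\text{:}\)

import numpy as np

n = 6
Q, T = lanczos(n)
Tn = T[:n, :n]    # the square tridiagonal matrix T_n

# The nodes are the eigenvalues of T_n.
nodes, V = np.linalg.eigh(Tn)

# numpy returns unit eigenvectors u_j = v_j / ||v_j||.  The first entry
# of v_j is q_0(x_j) = 1/sqrt(2), so ||v_j|| = (1/sqrt(2)) / |u_j[0]|,
# and the weights are c_j = 1 / ||v_j||^2 = 2 u_j[0]^2.
weights = 2 * V[0, :]**2

def quad(f):
    # Apply the quadrature: the sum of c_j f(x_j).
    return np.sum(weights * f(nodes))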
Due to the symmetry of the nodes and weights, the quadrature is zero on all odd powers of \(x\text{,}\) which agrees with the exact integrals since an odd power integrates to zero over \([-1,1]\text{.}\) The following cell considers the even powers \(x^{2j}\) for \(2j = 0,2,4,\ldots,20\) and compares the result of the quadrature to the exact value \(\int_{-1}^1x^{2j}~dx = 2/(2j+1)\text{.}\)
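Continuing the sketch, the comparison might read:

for j in range(11):
    exact = 2 / (2 * j + 1)
    approx = quad(lambda x: x**(2 * j))
    print(f"{2 * j:2d} {exact:.14f} {approx:.14f}")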
Since \(n=6\text{,}\) we expect the quadrature to be exact up to degree \(2n-1 = 11\text{,}\) and this demonstration confirms that expectation.
Finally, we compare the quadrature to the exact value for the function \(f(x)=e^{-x^2}\text{,}\) whose antiderivative cannot be expressed in terms of elementary functions. This result has a relative error of less than \(10^{-6}\text{.}\)
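A sketch of this last comparison, using scipy's adaptive integrator for the reference value (renamed on import to avoid clashing with our quad):

import numpy as np
from scipy.integrate import quad as adaptive_quad

exact, _ = adaptive_quad(lambda x: np.exp(-x**2), -1, 1)
approx = quad(lambda x: np.exp(-x**2))    # quad from the sketch above
print(abs(approx - exact) / abs(exact))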