Subsection 3.1.1 The power method
For convenience, we will suppose that \(A\) has a complete set of eigenvectors \(\vvec_i\) with associated eigenvalues \(\lambda_i\) and that the eigenvalues are ordered so that
\begin{equation*}
\len{\lambda_1} \geq \len{\lambda_2} \geq \ldots \geq
\len{\lambda_m}\text{.}
\end{equation*}
Our initial version of the power method will help us find \(\lambda_1\text{,}\) the eigenvalue having the largest magnitude, which is often called the dominant eigenvalue.
Suppose we start with a randomly chosen vector \(\vvec\) so that
\begin{equation*}
\vvec = c_1\vvec_1+c_2\vvec_2+\ldots+c_m\vvec_m\text{.}
\end{equation*}
Since \(\vvec\) is randomly chosen, it is safe to assume that \(c_1\neq 0\text{.}\) If not, we had really bad luck in choosing \(\vvec\text{.}\) However, computers cannot perform exact arithmetic so even if \(c_1=0\text{,}\) some error will creep into the subsequent calculations making \(c_1\neq 0\text{.}\)
We can multiply \(\vvec\) by \(A\) so that
\begin{equation*}
A\vvec =
c_1\lambda_1\vvec_1+c_2\lambda_2\vvec_2+\ldots+c_m\lambda_m\vvec_m\text{.}
\end{equation*}
Since \(\lambda_1\) has the largest magnitude, the component of \(A\vvec\) in the direction of \(\vvec_1\) is amplified the most. This is the essential idea behind the power method, and, as we will see, it also explains why finding \(\lambda_1\) is useful: the eigenvalues with the greatest magnitude contribute the most to \(A\vvec\text{.}\)
If we repeatedly multiply \(\vvec\) by \(A\text{,}\) we see that
\begin{align*}
A^k\vvec \amp =
c_1\lambda_1^k\vvec_1+c_2\lambda_2^k\vvec_2+\ldots+
c_m\lambda_m^k\vvec_m\\
\amp =
\lambda_1^k\left(
c_1\vvec_1+c_2\left(\frac{\lambda_2}{\lambda_1}\right)^k\vvec_2
+ \ldots +
c_m\left(\frac{\lambda_m}{\lambda_1}\right)^k\vvec_m\right)\text{.}
\end{align*}
We expect that \(\len{\frac{\lambda_j}{\lambda_1}} \lt 1\) for \(j\geq 2\) and, as a result, that \(\left(\len{\frac{\lambda_j}{\lambda_1}}\right)^k \to 0\) as \(k\) grows. Consequently, when \(k\) is large,
\begin{equation*}
A^k\vvec \approx c_1\lambda_1^k\vvec_1\text{,}
\end{equation*}
which shows that the direction of \(A^k\vvec\) will approximate more and more closely the direction of \(\vvec_1\text{.}\)
Of course, if \(\len{\lambda_1} \gt 1\) or \(\len{\lambda_1} \lt 1\text{,}\) then the length of \(A^k\vvec\) will grow very large or shrink away to zero. To mitigate this effect, we will normalize the product \(A\vvec\) after every step. Remembering that \(a \leftarrow b\) means to replace the value of \(a\) by the current value of \(b\text{,}\) the first draft of our algorithm looks like this:
-
Choose \(\vvec\) randomly
-
Repeat a sufficient number of times:
-
\(\displaystyle \vvec \leftarrow A\vvec\)
-
\(\displaystyle \vvec \leftarrow \frac{1}{\len{\vvec}}\vvec\)
Assuming that \(\len{\lambda_1} \gt \len{\lambda_2}\text{,}\) the resulting vector \(\vvec\) is eventually a good approximation of the eigenvector \(\vvec_1\text{.}\)
How can we find an approximation for the eigenvalue \(\lambda_1\text{?}\) If \(\vvec\approx \vvec_1\text{,}\) then \(A\vvec\approx \lambda_1\vvec_1\text{,}\) and
\begin{equation*}
\vvec\cdot A\vvec \approx \lambda_1 \vvec_1\cdot\vvec_1
\end{equation*}
so that
\begin{equation*}
\lambda_1 \approx \frac{\vvec\cdot A\vvec}{\vvec\cdot\vvec} =
r(A,\vvec)\text{.}
\end{equation*}
This ratio of the inner products is known as the Rayleigh quotient.
Definition 3.1.1.
Given a square matrix \(A\) and a vector \(\vvec\text{,}\) the Rayleigh quotient is defined to be
\begin{equation*}
r(A,\vvec) = \frac{\vvec\cdot A\vvec}{\vvec\cdot\vvec}\text{.}
\end{equation*}
Proposition 3.1.2.
If \(\vvec\) is an eigenvector of \(A\) with associated eigenvalue \(\lambda\text{,}\) then
\begin{equation*}
r(A,\vvec) = \lambda\text{.}
\end{equation*}
Proof.
This is because
\begin{equation*}
r(A,\vvec) = \frac{\vvec\cdot A\vvec}{\vvec\cdot\vvec}
= \frac{\lambda \vvec\cdot\vvec}{\vvec\cdot\vvec} = \lambda\text{.}
\end{equation*}
This leads to an updated version of the power method:
-
Choose \(\vvec\) randomly
-
Repeat a sufficient number of times:
-
\(\displaystyle \mu \leftarrow r(A,\vvec)\)
-
\(\displaystyle \vvec \leftarrow A\vvec\)
-
\(\displaystyle \vvec \leftarrow \frac{1}{\len{\vvec}}\vvec\)
Now each iteration of the algorithm produces an approximation of \(\lambda_1\) and \(\vvec_1\text{.}\)
The following Sage cell implements the power method, applying it to the matrix \(A=\begin{bmatrix}1\amp 2\\2\amp 1\end{bmatrix}\) and initial vector \(\vvec = \twovec10\) 20 times. For this matrix, we have seen that the dominant eigenvalue is \(\lambda_1 = 3\) and that a basis vector for \(E_3\) is \(\vvec_1=\twovec 11\text{.}\) Notice how our demonstration shows the results converging to these quantities.
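A minimal sketch of such a cell, written here with numpy (Sage's native matrix and vector types would serve equally well), might look like this:

import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 1.0]])
v = np.array([1.0, 0.0])

for k in range(20):
    mu = np.dot(v, A @ v) / np.dot(v, v)   # Rayleigh quotient estimate of lambda_1
    v = A @ v                              # multiply by A
    v = v / np.linalg.norm(v)              # normalize so the length stays 1
    print(mu, v)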
We will not be too careful about analyzing how rapidly the eigenvalue estimates converge to the exact eigenvalue, but we can make some straightforward observations. Remember that
\begin{equation*}
A^k\vvec =
\lambda_1^k\left(
c_1\vvec_1+c_2\left(\frac{\lambda_2}{\lambda_1}\right)^k\vvec_2
+ \ldots +
c_m\left(\frac{\lambda_m}{\lambda_1}\right)^k\vvec_m\right)\text{,}
\end{equation*}
which shows that the direction of \(A^k\vvec\) approaches the direction of the eigenvector \(\vvec_1\) and that the second largest term is governed by the ratio \(\len{\lambda_2/\lambda_1}\text{.}\) If \(\len{\lambda_2}\) is much less than \(\len{\lambda_1}\text{,}\) then \((\lambda_2/\lambda_1)^k\) will approach 0 quickly, which means we can expect our eigenvalue estimates to converge quickly. If \(\len{\lambda_2}\) is close to \(\len{\lambda_1}\text{,}\) then \((\lambda_2/\lambda_1)^k\) will approach 0 slowly so that we expect our eigenvalue estimates to converge slowly as well. For the matrix \(A=\begin{bmatrix}1\amp 2\\2\amp 1\end{bmatrix}\) above, for instance, \(\lambda_2/\lambda_1 = -1/3\text{,}\) so the unwanted component of \(A^k\vvec\) shrinks by a factor of 3 with each multiplication.
Subsection 3.1.2 The inverse power method
The power method will help us find the dominant eigenvalue \(\lambda_1\) of a matrix \(A\text{.}\) Can we find any more?
Remember that if \(\vvec\) is an eigenvector of an invertible matrix \(A\) with associated nonzero eigenvalue \(\lambda\text{,}\) then \(\vvec\) is also an eigenvector of \(A^{-1}\) with associated eigenvalue \(\frac1\lambda\text{.}\) Therefore, applying the power method to \(A^{-1}\) will help us find the eigenvalue of \(A\) having the smallest magnitude.
Here is how our algorithm, now called the inverse power method, looks in this case:
-
Choose \(\vvec\) randomly
-
Repeat a sufficient number of times:
-
\(\displaystyle \mu \leftarrow r(A,\vvec)\)
-
\(\displaystyle \vvec \leftarrow A^{-1}\vvec\)
-
\(\displaystyle \vvec \leftarrow \frac{1}{\len{\vvec}}\vvec\)
As you can see, the only change is that we multiply by \(A^{-1}\) in the iteration. The following Sage cell demonstrates.
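A sketch of such a cell, again with numpy, might look like the following; only the matrix applied inside the loop changes, and the starting vector is our own choice.

import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 1.0]])
Ainv = np.linalg.inv(A)       # for this small example, forming the inverse is harmless
v = np.array([1.0, 0.0])

for k in range(20):
    mu = np.dot(v, A @ v) / np.dot(v, v)   # Rayleigh quotient still uses A itself
    v = Ainv @ v                           # multiply by A^{-1} instead of A
    v = v / np.linalg.norm(v)
    print(mu, v)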
In this case, we see the results converging to the eigenvalue \(\lambda_2=-1\) and associated eigenvector \(\twovec{1}{-1}\text{.}\)
Remember that there is a potentially prohibitive cost to finding the inverse \(A^{-1}\text{.}\) Therefore, it is preferable to replace the step \(\vvec\leftarrow A^{-1}\vvec\) with “Solve \(A\wvec=\vvec\) for \(\wvec\) and set \(\vvec\leftarrow\wvec\)”. In this formulation, an \(LU\) factorization of \(A\) can be computed once and reused each time the equation is solved.
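For instance, using scipy's LU routines (an assumption about the available tools; Sage offers similar facilities), the factorization is computed once before the loop and reused at every step:

import numpy as np
from scipy.linalg import lu_factor, lu_solve

A = np.array([[1.0, 2.0],
              [2.0, 1.0]])
lu, piv = lu_factor(A)            # factor A once, before iterating
v = np.array([1.0, 0.0])

for k in range(20):
    mu = np.dot(v, A @ v) / np.dot(v, v)
    v = lu_solve((lu, piv), v)    # solve A w = v, reusing the factorization
    v = v / np.linalg.norm(v)
    print(mu, v)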
The eigenvalue with the smallest absolute value is \(\lambda_m\text{,}\) so the convergence of the eigenvalue estimates is determined by the ratio \(\len{\lambda_m/\lambda_{m-1}}\text{.}\) When the eigenvalues are close in magnitude, we expect slow convergence.
Subsection 3.1.3 The shifted inverse power method
With another modification, we can use these ideas to find any eigenvalue of \(A\text{.}\) Remember that if \(\vvec\) is an eigenvector of \(A\) having eigenvalue \(\lambda\text{,}\) then \(\vvec\) is also an eigenvector of \(A-\sigma I\) with associated eigenvalue \(\lambda-\sigma\text{.}\) Therefore, applying the inverse power method to \(A-\sigma I\) will find the eigenvalue of \(A\) closest to \(\sigma\text{.}\) Once again, the algorithm only modifies a single line.
-
Choose \(\vvec\) randomly
-
Repeat a sufficient number of times:
-
\(\displaystyle \mu \leftarrow r(A,\vvec)\)
-
\(\displaystyle \vvec \leftarrow (A-\sigma I)^{-1}\vvec\)
-
\(\displaystyle \vvec \leftarrow \frac{1}{\len{\vvec}}\vvec\)
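Here is a sketch of how these steps might be packaged as a reusable function; the name shifted_inverse_power and its parameters are our own choices, and the \(LU\) factorization of \(A-\sigma I\) is again computed once and reused.

import numpy as np
from scipy.linalg import lu_factor, lu_solve

def shifted_inverse_power(A, sigma, num_steps=20):
    # Estimate the eigenvalue of A closest to sigma, along with an eigenvector,
    # by applying the inverse power method to A - sigma*I.
    n = A.shape[0]
    lu, piv = lu_factor(A - sigma * np.eye(n))   # factor A - sigma*I once
    v = np.random.rand(n)                        # random starting vector
    for k in range(num_steps):
        mu = np.dot(v, A @ v) / np.dot(v, v)     # Rayleigh quotient with A itself
        v = lu_solve((lu, piv), v)               # solve (A - sigma*I) w = v
        v = v / np.linalg.norm(v)
    return mu, v

For example, calling shifted_inverse_power(A, 3.0) estimates the eigenvalue of \(A\) closest to \(3\text{.}\)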
This use of the shifted inverse power method will find the eigenvalue of \(A\) closest to \(3\text{,}\) which is 5.97351259038981. With some more experimenting, you should be able to find all five eigenvalues of \(A\text{.}\)
The rate of convergence is controlled by the ratio \(\len{(\lambda_k-\sigma)/(\lambda_{j}-\sigma)}\) where \(\lambda_k\) is the eigenvalue closest to \(\sigma\) and \(\lambda_j\) is the next closest eigenvalue. Therefore, the closer we choose \(\sigma\) to \(\lambda_k\text{,}\) the faster we expect the eigenvalue estimates to converge.
Notice that, after 20 steps, our estimate for the eigenvalue, found in the previous Sage cell, is only accurate to within 0.03%. Thus we see a few drawbacks to the power method:
-
It oftentimes converges slowly.
-
It finds only one eigenvalue at a time.
-
We need to find \(A^{-1}\) or solve equations of the form \(A\wvec = \vvec\) in each step.
-
Automating the process of finding all the eigenvalues could be difficult since we need to determine appropriate values of \(\sigma\) to use in the shifted version.
While we will explore ways to address these shortcomings, the power method is important as a demonstration of one idea that we will continue to use: repeatedly multiplying \(\vvec\) by \(A\) gives us information about the eigenvalues.