While we will see that the minimal polynomial of a vector
\(p_\vvec\) is useful, we can take this idea a bit further and find a polynomial
\(p\) that annihilates every vector in
\(V\text{.}\) This will be called the
minimal polynomial of the operator \(T\text{.}\) Rather than starting with a proof of the theorem, let's illustrate the idea by continuing our earlier example from
Example 1.4.20.
Example 1.4.23. The minimal polynomial of an operator.
Remember the setup from
Example 1.4.20. We have the operator
\(T:\real^4\to\real^4\) defined by
\(T\xvec=A\xvec\) where
\(A=\begin{bmatrix}
-10 \amp 16 \amp 5 \amp -9 \\
-6 \amp 10 \amp 1 \amp -5 \\
-2 \amp 4 \amp 1 \amp -5 \\
2 \amp -4 \amp -3 \amp 3 \\
\end{bmatrix}
\text{.}\) With the vector
\(\vvec=\fourvec1111\text{,}\) we found that
\(p_\vvec(x) = x^2 +4x+4=(x+2)^2\text{.}\)
We know that \(p_\vvec(T)\) annihilates \(\vvec\text{,}\) meaning that \(p_\vvec(T)\vvec=0\text{.}\) But \(p_\vvec(T)\) also annihilates \(T\vvec\) since
\begin{equation*}
p_\vvec(T)T\vvec = Tp_\vvec(T)\vvec = 0\text{,}
\end{equation*}
which means that \(p_\vvec(T)\) annihilates any vector in the two-dimensional Krylov subspace \(\kcal_2(T,\vvec)\text{.}\) We would like to find a polynomial that annihilates every vector in \(V\text{.}\)
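As an aside, this annihilation is easy to check numerically. The following sketch (plain Python, not part of the text's toolchain) applies \(p_\vvec(A)=A^2+4A+4I\) to \(\vvec\) and to \(T\vvec\text{:}\)

```python
# Numerical aside: p_v(A) = A^2 + 4A + 4I annihilates v = (1,1,1,1)
# and also T v = A v, as claimed for the Krylov subspace K_2(T, v).
A = [[-10, 16, 5, -9],
     [-6, 10, 1, -5],
     [-2, 4, 1, -5],
     [2, -4, -3, 3]]

def matvec(M, x):
    # multiply the matrix M by the vector x
    return [sum(row[j] * x[j] for j in range(len(x))) for row in M]

def p_v(x):
    # apply p_v(A) = A^2 + 4A + 4I to the vector x
    Ax = matvec(A, x)
    AAx = matvec(A, Ax)
    return [AAx[i] + 4 * Ax[i] + 4 * x[i] for i in range(len(x))]

v = [1, 1, 1, 1]
print(p_v(v))             # -> [0, 0, 0, 0]
print(p_v(matvec(A, v)))  # -> [0, 0, 0, 0]
```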
To do this, we will consider the subspace
\(U=\range(p_\vvec(T))\text{.}\) By
Proposition 1.4.15,
\(U\) is a
\(T\)-invariant subspace of
\(\real^4\text{.}\) In fact, we can find a basis for
\(U\text{.}\) We see that the matrix representing
\(p_\vvec(T)\) is
\begin{equation*}
p_\vvec(A) =
\begin{bmatrix}
-60 \amp 120 \amp 18 \amp -78 \\
-36 \amp 72 \amp 0 \amp -36 \\
-24 \amp 48 \amp 18 \amp -42 \\
24 \amp -48 \amp -18 \amp 42
\end{bmatrix}
\sim
\begin{bmatrix}
1 \amp -2 \amp 0 \amp 1 \\
0 \amp 0 \amp 1 \amp -1 \\
0 \amp 0 \amp 0 \amp 0 \\
0 \amp 0 \amp 0 \amp 0
\end{bmatrix}\text{.}
\end{equation*}
This shows that a basis for \(\range(p_\vvec(T)) =
\col(p_\vvec(A))\) is
\begin{equation*}
\uvec_1 = \fourvec{-60}{-36}{-24}{24},\hspace{12pt}
\uvec_2 = \fourvec{18}{0}{18}{-18}\text{.}
\end{equation*}
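As a numerical aside, we can confirm this directly: the sketch below rebuilds \(p_\vvec(A)\) and checks that its second and fourth columns are the combinations of \(\uvec_1\) and \(\uvec_2\) predicted by the reduced row echelon form.

```python
# Numerical aside: the columns of p_v(A) = A^2 + 4A + 4I obey the
# relations read off from its reduced row echelon form, so columns 1
# and 3 (that is, u_1 and u_2) form a basis for U = col(p_v(A)).
A = [[-10, 16, 5, -9],
     [-6, 10, 1, -5],
     [-2, 4, 1, -5],
     [2, -4, -3, 3]]

def matmul(M, N):
    return [[sum(M[i][k] * N[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

A2 = matmul(A, A)
P = [[A2[i][j] + 4 * A[i][j] + 4 * (i == j) for j in range(4)]
     for i in range(4)]
cols = [[P[i][j] for i in range(4)] for j in range(4)]

print(cols[0])  # u_1 -> [-60, -36, -24, 24]
print(cols[2])  # u_2 -> [18, 0, 18, -18]
# column 2 = -2 (column 1) and column 4 = column 1 - column 3
print(all(cols[1][i] == -2 * cols[0][i] for i in range(4)))          # -> True
print(all(cols[3][i] == cols[0][i] - cols[2][i] for i in range(4)))  # -> True
```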
We now know that \(U\) is a \(T\)-invariant subspace, and we find that
\begin{align*}
T\uvec_1 \amp = 4\uvec_1 - 4 \uvec_2\\
T\uvec_2 \amp = 4 \uvec_2\text{.}
\end{align*}
By \(T|_U\text{,}\) we mean the operator \(T\) restricted to \(U\text{;}\) that is, \(T|_U:U\to U\) is defined by \(T|_U(\uvec) =
T\uvec\text{.}\) If we call the basis for \(U\) \(\bcal=\{\uvec_1,\uvec_2\}\text{,}\) we have
\begin{equation*}
\coords{T|_U}{\bcal} =
\begin{bmatrix}
4 \amp 0 \\
-4 \amp 4 \\
\end{bmatrix}\text{.}
\end{equation*}
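As an aside, these relations can also be checked directly in \(\real^4\text{;}\) the sketch below verifies how \(A\) acts on \(\uvec_1\) and \(\uvec_2\text{:}\)

```python
# Numerical aside: A u_1 = 4 u_1 - 4 u_2 and A u_2 = 4 u_2, the
# relations recorded in the matrix of T|_U relative to {u_1, u_2}.
A = [[-10, 16, 5, -9],
     [-6, 10, 1, -5],
     [-2, 4, 1, -5],
     [2, -4, -3, 3]]

def matvec(M, x):
    # multiply the matrix M by the vector x
    return [sum(row[j] * x[j] for j in range(len(x))) for row in M]

u1 = [-60, -36, -24, 24]
u2 = [18, 0, 18, -18]

print(matvec(A, u1) == [4 * u1[i] - 4 * u2[i] for i in range(4)])  # -> True
print(matvec(A, u2) == [4 * u2[i] for i in range(4)])              # -> True
```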
Now we will choose a vector \(\uvec\) in \(U\) whose coordinate vector in the basis \(\bcal\) is \(\twovec11\) (think of this as being randomly chosen) and find the annihilating polynomial \(p_\uvec\) for \(T|_U\text{.}\) Working in coordinates, we have
\begin{equation*}
\begin{bmatrix}
\uvec \amp T|_U\uvec \amp T|_U^2\uvec
\end{bmatrix}
\sim
\begin{bmatrix}
1 \amp 0 \amp -16 \\
0 \amp 1 \amp 8 \\
\end{bmatrix}\text{,}
\end{equation*}
which means that
\begin{equation*}
(T|_U^2-8T|_U + 16I)\uvec = \zerovec\text{.}
\end{equation*}
In other words, \(p_\uvec(x) = x^2 - 8x+16=(x-4)^2\) is the annihilating polynomial of \(\uvec\) for the operator \(T|_U\text{.}\)
In fact, this polynomial annihilates every vector in \(U\) because we find that
\begin{equation*}
T|_U^2 - 8T|_U + 16I = 0\text{.}
\end{equation*}
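As an aside, since \(\uvec_1\) and \(\uvec_2\) form a basis for \(U\text{,}\) it is enough to check that \((T-4I)^2\) annihilates each of them, which the following sketch does in \(\real^4\text{:}\)

```python
# Numerical aside: p_u(A) = A^2 - 8A + 16I = (A - 4I)^2 annihilates the
# basis vectors u_1 and u_2 of U, hence every vector in U.
A = [[-10, 16, 5, -9],
     [-6, 10, 1, -5],
     [-2, 4, 1, -5],
     [2, -4, -3, 3]]

def matvec(M, x):
    # multiply the matrix M by the vector x
    return [sum(row[j] * x[j] for j in range(len(x))) for row in M]

def p_u(x):
    # apply (A - 4I)^2 = A^2 - 8A + 16I to the vector x
    Ax = matvec(A, x)
    AAx = matvec(A, Ax)
    return [AAx[i] - 8 * Ax[i] + 16 * x[i] for i in range(4)]

print(p_u([-60, -36, -24, 24]))  # u_1 -> [0, 0, 0, 0]
print(p_u([18, 0, 18, -18]))     # u_2 -> [0, 0, 0, 0]
```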
Notice what happens if we multiply these two annihilating polynomials to form
\begin{equation*}
p(x) = p_\uvec(x)p_\vvec(x) = (x-4)^2(x+2)^2\text{.}
\end{equation*}
Suppose that \(\wvec\) is any vector in \(V\text{.}\) Then
\begin{equation*}
p(T)\wvec = p_\uvec(T)p_\vvec(T)\wvec\text{.}
\end{equation*}
But notice that \(p_\vvec(T)\wvec\) is in \(U=\range(p_\vvec(T))\) and that \(p_\uvec(T)\) annihilates every vector in \(U\text{.}\) This means that \(p(T)\wvec = 0\) so that \(p(T)\) annihilates every vector in \(V\text{.}\)
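As a quick aside, this conclusion can be confirmed numerically by checking that \((A-4I)^2(A+2I)^2\) is the zero matrix:

```python
# Numerical aside: p(A) = (A - 4I)^2 (A + 2I)^2 is the zero matrix,
# so p(T) annihilates every vector in R^4.
A = [[-10, 16, 5, -9],
     [-6, 10, 1, -5],
     [-2, 4, 1, -5],
     [2, -4, -3, 3]]

def matmul(M, N):
    return [[sum(M[i][k] * N[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def shift(M, c):
    # form the matrix M + c I
    return [[M[i][j] + c * (i == j) for j in range(4)] for i in range(4)]

pA = matmul(matmul(shift(A, -4), shift(A, -4)),
            matmul(shift(A, 2), shift(A, 2)))
print(pA)  # -> the 4x4 zero matrix
```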
In fact, this polynomial is what we will call the minimal polynomial of
\(T\text{.}\) This example illustrates the proof of the theorem that we are about to give.
Proof.
Our proof proceeds by induction on the dimension of
\(V\text{.}\) To begin, suppose that
\(\dim(V) = 1\text{,}\) which means that
\(V=\laspan{\vvec}\) for some vector
\(\vvec\text{.}\) Then
\(T\vvec = \lambda\vvec\) for some
\(\lambda\text{,}\) which is possibly
\(0\text{.}\) Then
\((T-\lambda I)\vvec = \zerovec\text{,}\) which means that
\(T-\lambda I = 0\) since
\(\vvec\) spans
\(V\text{.}\) Therefore, if
\(p(x) = x-\lambda\text{,}\) we have
\(p(T)=0\text{.}\)
Now suppose that
\(\dim(V)=n\) and that the theorem has been verified for all operators on vector spaces of dimension less than
\(n\text{.}\) Following our work in
Example 1.4.23, we choose a nonzero vector
\(\vvec\) and find its minimal polynomial
\(p_\vvec\text{.}\) This polynomial will have degree
\(m\geq
1\text{.}\)
Since \(\nul(p_\vvec(T))\) is \(T\)-invariant and \(\vvec\) is in \(\nul(p_\vvec(T))\text{,}\) it follows that
\begin{equation*}
\vvec, T\vvec, T^2\vvec, \ldots, T^{m-1}\vvec
\end{equation*}
are all in \(\nul(p_\vvec(T))\text{.}\) Since these vectors are linearly independent, we have
\begin{equation*}
\dim\nul(p_\vvec(T)) \geq m \geq 1\text{.}
\end{equation*}
Therefore, if we set \(U = \range(p_\vvec(T))\text{,}\) we have
\begin{equation*}
\dim(U) = \dim(V) - \dim(\nul(p_\vvec(T))) \leq n - m
\lt n\text{.}
\end{equation*}
This says that the induction hypothesis applies to the operator \(T|_U\text{,}\) so there is a minimal polynomial of \(T|_U\text{,}\) which we will denote by \(p_U\text{,}\) satisfying \(p_U(T|_U) =
0\text{.}\) Moreover, by the induction hypothesis, this polynomial is monic and \(\deg(p_U) \leq \dim U \leq n - m\text{.}\)
We now set \(p = p_Up_\vvec\) and note that
\begin{equation*}
\deg(p)=\deg(p_U)+\deg(p_\vvec) \leq n-m + m = n=\dim
V\text{.}
\end{equation*}
Moreover, if \(\wvec\) is any vector in \(V\text{,}\) it follows that \(p_\vvec(T)\wvec\) is a vector in \(U=\range(p_\vvec(T))\) so that
\begin{equation*}
p(T)\wvec = p_U(T)p_\vvec(T)\wvec = 0\text{.}
\end{equation*}
That is, \(p(T)\) annihilates every vector in \(V\) so that \(p(T)=0\text{.}\)
This shows that there is a monic polynomial
\(p\) such that
\(p(T)=0\) on
\(V\) and
\(\deg(p) \leq \dim V\text{.}\) Therefore, among all such polynomials, there is one, possibly different from \(p\text{,}\) having the smallest degree, and this is the minimal polynomial of the operator
\(T\) on
\(V\text{.}\)
To see that this polynomial is unique, suppose there are two monic polynomials \(s_1\) and \(s_2\) having smallest degree and \(s_1(T)=0\) and \(s_2(T)=0\text{.}\) If we consider \(s_1-s_2\text{,}\) we see that \(\deg(s_1-s_2)\lt
\deg(s_1)=\deg(s_2)\) since the highest degree terms of \(s_1\) and \(s_2\) have coefficients \(1\) and therefore cancel. Also,
\begin{equation*}
(s_1-s_2)(T) = s_1(T) - s_2(T) = 0\text{.}
\end{equation*}
However, if \(s_1-s_2\) were nonzero, this would be impossible: after dividing by its leading coefficient to make it monic, \(s_1-s_2\) would be a polynomial of smaller degree that vanishes when evaluated on \(T\text{.}\) This means that \(s_1=s_2\text{,}\) which guarantees the uniqueness of the minimal polynomial.
Example 1.4.25.
Returning to
our earlier example where
\(T:\real^4\to\real^4\) by
\(T\xvec = A\xvec\) with
\(A=\begin{bmatrix}
-10 \amp 16 \amp 5 \amp -9 \\
-6 \amp 10 \amp 1 \amp -5 \\
-2 \amp 4 \amp 1 \amp -5 \\
2 \amp -4 \amp -3 \amp 3 \\
\end{bmatrix}
\text{.}\) If we choose
\(\vvec=\fourvec1010\text{,}\) we have
\begin{equation*}
\begin{bmatrix}
\vvec \amp T\vvec \amp T^2\vvec \amp T^3\vvec \amp T^4\vvec
\end{bmatrix}
\sim
\begin{bmatrix}
1 \amp 0 \amp 0 \amp 0 \amp -64 \\
0 \amp 1 \amp 0 \amp 0 \amp -32 \\
0 \amp 0 \amp 1 \amp 0 \amp 12 \\
0 \amp 0 \amp 0 \amp 1 \amp 4
\end{bmatrix}\text{.}
\end{equation*}
This says that \(p_\vvec(x) = x^4-4x^3-12x^2+32x+64\text{.}\) One can now check that \(p_\vvec(T) = 0\) so that the minimal polynomial of \(T\) is in fact
\begin{equation*}
p(x) = x^4-4x^3-12x^2+32x+64=(x+2)^2(x-4)^2\text{.}
\end{equation*}
This is, of course, the same polynomial we found earlier. For most vectors \(\vvec\text{,}\) we will find that the minimal polynomial of the operator \(T\) is \(p_\vvec\text{.}\)
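As an aside, the linear relation recorded in the last column of the reduced matrix can be verified directly:

```python
# Numerical aside: for v = (1, 0, 1, 0), the Krylov vectors satisfy
# T^4 v = -64 v - 32 T v + 12 T^2 v + 4 T^3 v, which gives p_v above.
A = [[-10, 16, 5, -9],
     [-6, 10, 1, -5],
     [-2, 4, 1, -5],
     [2, -4, -3, 3]]

def matvec(M, x):
    # multiply the matrix M by the vector x
    return [sum(row[j] * x[j] for j in range(len(x))) for row in M]

krylov = [[1, 0, 1, 0]]            # v, T v, T^2 v, T^3 v, T^4 v
for _ in range(4):
    krylov.append(matvec(A, krylov[-1]))

rhs = [-64 * krylov[0][i] - 32 * krylov[1][i]
       + 12 * krylov[2][i] + 4 * krylov[3][i] for i in range(4)]
print(krylov[4] == rhs)  # -> True
```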
Example 1.4.26.
Consider the
\(2\times2\) matrix
\(A=\begin{bmatrix}
2 \amp 0 \\
0 \amp 2
\end{bmatrix}\) and the linear operator
\(T\) that it defines. Notice that
\(A-2I = 0\) so if
\(p(x)=x-2\text{,}\) then
\(p(T)=0\text{.}\) The minimal polynomial of
\(T\) is therefore
\(p(x)=x-2\text{.}\)
More generally, suppose that the minimal polynomial of an operator
\(T\) has degree
\(1\text{.}\) Since the minimal polynomial is monic, this means that
\(p(x)=x-\lambda\text{.}\) Because
\(p(T)=T-\lambda I = 0\text{,}\) we see that
\(T=\lambda
I\text{,}\) a multiple of the identity.
Example 1.4.27.
By contrast, consider the
\(2\times2\) matrix
\(B=\begin{bmatrix}
2 & 1 \\
0 & 2
\end{bmatrix}\) and the linear operator
\(S\) that it defines with respect to some basis. The degree of the minimal polynomial must be at least 2 since
\(B\) is not a multiple of the identity matrix. Notice, however, that
\(B-2I = \begin{bmatrix}
0 & 1 \\
0 & 0 \\
\end{bmatrix}\) and
\((B-2I)^2 = 0\text{.}\) This says that
\((S-2I)^2 =
0\) and so the minimal polynomial of
\(S\) is
\(q(x)=(x-2)^2\text{.}\)
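As an aside, the two computations behind this conclusion are quickly checked:

```python
# Numerical aside: B - 2I is nonzero, so x - 2 does not annihilate S,
# but (B - 2I)^2 = 0, so q(x) = (x - 2)^2 does.
B = [[2, 1], [0, 2]]
N = [[B[i][j] - 2 * (i == j) for j in range(2)] for i in range(2)]
N2 = [[sum(N[i][k] * N[k][j] for k in range(2)) for j in range(2)]
      for i in range(2)]
print(N)   # -> [[0, 1], [0, 0]]
print(N2)  # -> [[0, 0], [0, 0]]
```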
Both of the matrices in the two previous examples are upper triangular. Remembering that the eigenvalues of an upper triangular matrix are the entries on the diagonal, we see that the roots of the minimal polynomials in both cases are the eigenvalues of the operator. This gives some indication of the importance of the minimal polynomial, as we will see now.
Proof.
Suppose that
\(p\) is the minimal polynomial of
\(T\text{.}\) We need to explain two things: that any eigenvalue of
\(T\) is a root of
\(p\) and that any root of
\(p\) is an eigenvalue of
\(T\text{.}\)
Suppose that \(\lambda\) is an eigenvalue of \(T\text{.}\) This means that there is a nonzero vector \(\vvec\) such that \(T\vvec = \lambda\vvec\) and therefore \(T^j\vvec =
\lambda^j\vvec\) for every \(j\text{.}\) This means that
\begin{equation*}
0 = p(T)\vvec = p(\lambda)\vvec\text{,}
\end{equation*}
which implies that \(p(\lambda) = 0\text{.}\) Therefore, \(\lambda\) is a root of \(p\text{,}\) the minimal polynomial of \(T\text{.}\)
Conversely, suppose that
\(\lambda\) is a root of
\(p\text{.}\) By
Proposition 1.4.5, this means that
\begin{equation*}
p(x) = (x-\lambda)q(x)\text{.}
\end{equation*}
This says that
\begin{equation*}
0 = p(T) = (T-\lambda I)q(T)\text{.}
\end{equation*}
However, \(q(T)\neq 0\) since \(\deg(q) \lt \deg(p)\text{,}\) which implies there is some vector \(\vvec\) for which \(q(T)\vvec\neq
0\text{.}\) Therefore,
\begin{equation*}
0 = p(T)\vvec = (T-\lambda I)q(T)\vvec\text{,}
\end{equation*}
which shows that \(q(T)\vvec\) is an eigenvector of \(T\) with associated eigenvalue \(\lambda\text{.}\)
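As an aside, this construction can be carried out concretely for the running example, where \(p(x)=(x-4)^2(x+2)^2\text{.}\) Taking \(\lambda=4\) gives \(q(x)=(x-4)(x+2)^2\text{,}\) and applying \(q(T)\) to the first standard basis vector (a vector chosen purely for illustration) produces an eigenvector with eigenvalue \(4\text{:}\)

```python
# Numerical aside: with q(x) = (x - 4)(x + 2)^2, the vector
# w = q(A) e_1 is nonzero and satisfies A w = 4 w.
A = [[-10, 16, 5, -9],
     [-6, 10, 1, -5],
     [-2, 4, 1, -5],
     [2, -4, -3, 3]]

def matvec(M, x):
    # multiply the matrix M by the vector x
    return [sum(row[j] * x[j] for j in range(len(x))) for row in M]

def apply_shift(x, c):
    # apply (A + c I) to the vector x
    Ax = matvec(A, x)
    return [Ax[i] + c * x[i] for i in range(4)]

e1 = [1, 0, 0, 0]
w = apply_shift(apply_shift(apply_shift(e1, 2), 2), -4)  # q(A) e_1
print(w)                                     # -> [-72, 0, -72, 72]
print(matvec(A, w) == [4 * wi for wi in w])  # -> True
```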
This is the most significant fact about the minimal polynomial: that its roots are the eigenvalues of the operator. We'll put this to use in the next section. Before that, however, here are two other useful facts.
Proof.
If \(p\) is the minimal polynomial of \(T\text{,}\) then we can apply the Division Algorithm to write \(s = pq + r\) where \(\deg(r) \lt \deg(p)\text{.}\) Furthermore,
\begin{equation*}
0 = s(T) = p(T)q(T) + r(T) = r(T)\text{,}
\end{equation*}
Since \(p\) has the smallest degree among all polynomials that vanish when evaluated on \(T\) and \(\deg(r) \lt \deg(p)\text{,}\) we must have \(r=0\text{.}\) Therefore, \(s=pq\text{,}\) which shows that \(p\) divides \(s\text{.}\)
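As an aside, the Division Algorithm step can be carried out concretely. Taking the operator from the running example, whose minimal polynomial is \(p(x)=(x+2)^2(x-4)^2\text{,}\) and the illustrative choice \(s(x)=(x+2)^3(x-4)^2\text{,}\) the division leaves quotient \(x+2\) and remainder \(0\text{:}\)

```python
# Numerical aside: divide s(x) = (x+2)^3 (x-4)^2 by the minimal
# polynomial p(x) = x^4 - 4x^3 - 12x^2 + 32x + 64.
# Coefficient lists run from the highest power down to the constant.
def poly_divmod(s, p):
    s = list(s)
    quotient = []
    while len(s) >= len(p):
        c = s[0] / p[0]          # next coefficient of the quotient
        quotient.append(c)
        for i in range(len(p)):
            s[i] -= c * p[i]     # cancel the current leading term
        s.pop(0)
    return quotient, s           # remainder has degree < deg(p)

p = [1, -4, -12, 32, 64]
s = [1, -2, -20, 8, 128, 128]    # (x+2)^3 (x-4)^2, expanded
q, r = poly_divmod(s, p)
print(q)  # -> [1.0, 2.0], that is, q(x) = x + 2
print(r)  # -> [0.0, 0.0, 0.0, 0.0]
```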