As mentioned in the introduction to this section, our interest in polynomials stems from the insights we gain when we evaluate a polynomial on an operator \(T\text{.}\) For instance, if \(p(x)=x^2+4x-5\text{,}\) then \(p(T) = T^2 + 4T - 5I\) is a new operator. Notice that we consider the constant term to be multiplied by the identity transformation \(I\text{.}\)
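To make this evaluation concrete, here is a short numerical sketch. The helper name `poly_of_matrix` and the sample matrix are our own illustrative choices, not part of the text; the point is simply that the constant term is multiplied by the identity matrix.

```python
import numpy as np

def poly_of_matrix(coeffs, T):
    """Evaluate a polynomial at a square matrix T.

    coeffs lists coefficients from the constant term upward, so the
    constant term gets multiplied by the identity matrix, as in the text.
    """
    n = T.shape[0]
    result = np.zeros((n, n))
    power = np.eye(n)            # T^0 = I
    for c in coeffs:
        result = result + c * power
        power = power @ T        # next power of T
    return result

# p(x) = x^2 + 4x - 5 evaluated at a sample 2x2 matrix
T = np.array([[1.0, 2.0],
              [0.0, 3.0]])
pT = poly_of_matrix([-5.0, 4.0, 1.0], T)   # = T^2 + 4T - 5I
```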
Remember that the order in which we multiply polynomials is not important. In particular, \(p(T)q(T)=q(T)p(T)\text{.}\) This leads to the following important proposition.
We now come to a crucial result for our upcoming explorations. First, we need to make the following definition clear.
Proof.
Our proof proceeds by induction on the dimension of \(V\text{.}\) To begin, suppose that \(\dim(V) = 1\text{,}\) which means that \(V=\laspan{\vvec}\) for some vector \(\vvec\text{.}\) Then \(T\vvec = \lambda\vvec\) for some \(\lambda\text{,}\) which is possibly \(0\text{.}\) Then \((T-\lambda I)\vvec = \zerovec\text{,}\) which means that \(T-\lambda I = 0\) since \(\vvec\) spans \(V\text{.}\) Therefore, if \(p(x) = x-\lambda\text{,}\) we have \(p(T)=0\text{.}\)
We now imagine that \(\dim(V)=n\) and that the theorem has been verified for all operators on vector spaces of dimension less than \(n\text{.}\) We choose a vector \(\vvec\) and consider the powers \(T^k\vvec\text{;}\) that is, consider the vectors
\begin{equation*}
\vvec,T\vvec,T^2\vvec,\ldots,T^n\vvec\text{.}
\end{equation*}
Since there are \(n+1\) vectors in this set and \(\dim(V)=n\text{,}\) we know this is a linearly dependent set.
Choose \(m\) to be the smallest integer such that \(T^m\vvec\) is a linear combination of \(\vvec,T\vvec,\ldots,T^{m-1}\vvec\text{.}\) This means two things. First, the vectors \(\vvec,T\vvec,\ldots,T^{m-1}\vvec\) are linearly independent. Second, there are constants \(a_0,a_1,\ldots,a_{m-1}\) such that
\begin{equation*}
a_0\vvec + a_1T\vvec + \ldots + a_{m-1}T^{m-1}\vvec +
T^m\vvec = \zerovec\text{.}
\end{equation*}
If we define the degree \(m\) monic polynomial
\begin{equation*}
p(x)=a_0 + a_1x + \ldots + a_{m-1}x^{m-1}+x^m\text{,}
\end{equation*}
then \(p(T)\vvec = \zerovec\text{.}\) That is, \(\vvec\) is in \(\nul(p(T))\text{.}\)
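This construction can be carried out numerically: find the smallest \(m\) for which \(T^m\vvec\) depends on the earlier powers, then read off the monic coefficients. The following sketch (the function name, tolerance, and sample matrix are our own assumptions) mirrors the argument above.

```python
import numpy as np

def monic_annihilator(T, v, tol=1e-10):
    """Find the smallest m with T^m v in span(v, Tv, ..., T^{m-1} v),
    and return the monic coefficients [a_0, ..., a_{m-1}, 1] of the
    polynomial p with p(T) v = 0, as in the proof.  Illustrative only."""
    powers = [np.asarray(v, dtype=float)]
    n = len(powers[0])
    for m in range(1, n + 1):
        powers.append(T @ powers[-1])
        B = np.column_stack(powers[:-1])          # columns v, Tv, ..., T^{m-1} v
        # Solve B c = T^m v in the least-squares sense; a vanishing
        # residual means T^m v is a combination of the earlier powers.
        c, *_ = np.linalg.lstsq(B, powers[-1], rcond=None)
        if np.linalg.norm(B @ c - powers[-1]) < tol:
            return np.append(-c, 1.0)             # a_i = -c_i, leading term monic
    raise ValueError("no dependency found within n powers")

# Sample 2x2 matrix and vector (our own example)
T = np.array([[2.0, 1.0],
              [0.0, 3.0]])
coeffs = monic_annihilator(T, np.array([0.0, 1.0]))
# coeffs gives p(x) = 6 - 5x + x^2 = (x - 2)(x - 3)
```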
Since \(\nul(p(T))\) is invariant under \(T\) and \(\vvec\) is in \(\nul(p(T))\text{,}\) we know that
\begin{equation*}
\vvec,T\vvec,\ldots,T^{m-1}\vvec
\end{equation*}
are all in \(\nul(p(T))\text{.}\) These vectors are linearly independent so we know that \(\dim(\nul(p(T)))\geq m\text{.}\) Therefore,
\begin{equation*}
\dim(\range(p(T))) = \dim(V) - \dim(\nul(p(T))) \leq n - m\text{.}
\end{equation*}
For convenience, we will denote the vector space \(W=\range(p(T))\text{.}\) Since \(W\) is invariant under \(T\text{,}\) \(T\) is an operator on \(W\text{,}\) whose dimension is less than \(\dim(V)\text{.}\) By the induction hypothesis, we know that there is a unique monic polynomial \(q(x)\) such that \(q(T|_W)=0\text{.}\) Again by the induction hypothesis, it also follows that \(\deg(q) \leq \dim(W) \leq n - m\text{.}\)
Now consider the polynomial \(qp\) whose degree is
\begin{equation*}
\deg(qp) = \deg(q)+\deg(p) \leq (n - m) + m = n = \dim(V)\text{.}
\end{equation*}
Moreover, both \(p\) and \(q\) are monic so \(qp\) is also monic. Finally, for any vector \(\vvec\) in \(V\text{,}\) we have
\begin{equation*}
(qp)(T)\vvec = q(T)p(T)\vvec = q(T)(p(T)\vvec) = \zerovec
\end{equation*}
where the last equality holds because \(p(T)\vvec\) is in \(W=\range(p(T))\) and \(q(T)\uvec=\zerovec\) for any vector \(\uvec\) in \(W\text{.}\) Since \((qp)(T)\vvec=\zerovec\) for every vector \(\vvec\text{,}\) this means that \((qp)(T)=0\text{.}\)
This shows that there is a monic polynomial \(s\) such that \(s(T)=0\) on \(V\text{.}\) Among all such monic polynomials, there is therefore one of smallest degree, and this is the minimal polynomial of the operator \(T\) on \(V\text{.}\)
To see that this polynomial is unique, suppose there are two monic polynomials \(s_1\) and \(s_2\) of smallest degree with \(s_1(T)=0\) and \(s_2(T)=0\text{.}\) If \(s_1\neq s_2\text{,}\) then \(s_1-s_2\) is a nonzero polynomial with \(\deg(s_1-s_2)\lt
\deg(s_1)=\deg(s_2)\) since the highest degree terms of \(s_1\) and \(s_2\) both have coefficient \(1\) and therefore cancel. Also,
\begin{equation*}
(s_1-s_2)(T) = s_1(T) - s_2(T) = 0\text{.}
\end{equation*}
However, this is impossible: dividing \(s_1-s_2\) by its leading coefficient would produce a monic polynomial of smaller degree that vanishes when evaluated on \(T\text{,}\) contradicting the fact that \(s_1\) and \(s_2\) have the smallest possible degree among all such polynomials. This means that \(s_1=s_2\text{,}\) which guarantees the uniqueness of the minimal polynomial.
Example 1.4.11.
Consider the \(2\times2\) matrix \(A=\begin{bmatrix}
2 \amp 0 \\
0 \amp 2
\end{bmatrix}\) and the linear operator \(T\) that it defines. Notice that \(A-2I = 0\) so if \(p(x)=x-2\text{,}\) then \(p(T)=0\text{.}\) The minimal polynomial of \(T\) is therefore \(p(x)=x-2\text{.}\)
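A quick numerical check of this example, paired with a contrasting matrix of our own (not from the text) whose minimal polynomial has degree \(2\):

```python
import numpy as np

# The matrix from the example: A = 2I, so p(x) = x - 2 annihilates A.
A = np.array([[2.0, 0.0],
              [0.0, 2.0]])
pA = A - 2 * np.eye(2)          # p(A) = A - 2I, which is the zero matrix

# By contrast (our own example), B is not a multiple of the identity,
# so no degree-1 monic polynomial annihilates it; (x - 2)^2 does.
B = np.array([[2.0, 1.0],
              [0.0, 2.0]])
qB = (B - 2 * np.eye(2)) @ (B - 2 * np.eye(2))   # (B - 2I)^2
```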
More generally, suppose that the minimal polynomial of an operator \(T\) has degree \(1\text{.}\) Since the minimal polynomial is monic, this means that \(p(x)=x-\lambda\text{.}\) Because \(p(T)=T-\lambda I = 0\text{,}\) we see that \(T=\lambda
I\text{,}\) a multiple of the identity.
Both of the matrices in the two previous examples are upper triangular. Remembering that the eigenvalues of an upper triangular matrix are the entries on the diagonal, we see that the roots of the minimal polynomials in both cases are the eigenvalues of the operator. This gives some indication of the importance of the minimal polynomial, as we will now see.
The fact that the minimal polynomial has the smallest degree among all polynomials for which \(p(T)=0\) has some important consequences.
Proof.
Suppose that \(p\) is the minimal polynomial of \(T\text{.}\) We need to explain two things: that any eigenvalue of \(T\) is a root of \(p\) and that any root of \(p\) is an eigenvalue of \(T\text{.}\)
Suppose that \(\lambda\) is an eigenvalue of \(T\text{.}\) This means that there is a nonzero vector \(\vvec\) such that \(T\vvec = \lambda\vvec\) and therefore \(T^j\vvec =
\lambda^j\vvec\) for every \(j\text{.}\) This means that
\begin{equation*}
0 = p(T)\vvec = p(\lambda)\vvec\text{,}
\end{equation*}
which implies that \(p(\lambda) = 0\text{.}\) Therefore, \(\lambda\) is a root of \(p\text{,}\) the minimal polynomial of \(T\text{.}\)
Conversely, suppose that
\(\lambda\) is a root of
\(p\text{.}\) By
Proposition 1.4.4, this means that
\begin{equation*}
p(x) = (x-\lambda)q(x)\text{.}
\end{equation*}
This says that
\begin{equation*}
0 = p(T) = (T-\lambda I)q(T)\text{.}
\end{equation*}
However, \(q(T)\neq 0\) since \(\deg(q) \lt \deg(p)\) and \(p\) is the minimal polynomial of \(T\text{.}\) This means there is some vector \(\vvec\) for which \(q(T)\vvec\neq
\zerovec\text{.}\) Therefore,
\begin{equation*}
0 = p(T)\vvec = (T-\lambda I)q(T)\vvec\text{,}
\end{equation*}
which shows that \(q(T)\vvec\) is an eigenvector of \(T\) with associated eigenvalue \(\lambda\text{.}\)
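Both directions of this proposition can be observed numerically. For the upper triangular matrix below (our own hypothetical example), the monic polynomial \(p(x)=x^2-5x+6\) satisfies \(p(T)=0\text{,}\) and its roots \(2\) and \(3\) are precisely the eigenvalues of \(T\text{:}\)

```python
import numpy as np

# A sample upper triangular matrix (our own example).
T = np.array([[2.0, 1.0],
              [0.0, 3.0]])

# p(x) = x^2 - 5x + 6 satisfies p(T) = 0.
pT = T @ T - 5 * T + 6 * np.eye(2)

# The roots of p coincide with the eigenvalues of T.
roots = np.sort(np.roots([1.0, -5.0, 6.0]))   # roots of x^2 - 5x + 6
eigs = np.sort(np.linalg.eigvals(T))          # eigenvalues of T
```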
This is the most significant fact about the minimal polynomial: that its roots are the eigenvalues of the operator. We’ll put this to use in the next section. Before that, however, here are two other useful facts.