
Section 1.4 The minimal polynomial

This section will describe some important properties of polynomials and demonstrate how they might help us understand linear transformations or, more accurately, operators, which are linear transformations \(T:V\to V\) from a vector space to itself.
We will begin by studying familiar polynomials, such as you saw in high school algebra. For example, \(p(x)=2x^2 - 3x + 7\) is a quadratic polynomial. Eventually, we will consider polynomials evaluated on an operator \(T\text{.}\) For instance, evaluating \(p(x)\) on the operator \(T\) produces a new operator \(p(T) = 2T^2 - 3T + 7I\text{.}\) As a preview for how this may be useful, the polynomial \(p(x)=x-2\) leads to the operator \(p(T) = T-2I\) whose null space is the set of eigenvectors associated to the eigenvalue \(\lambda = 2\text{.}\)

Subsection 1.4.1 Properties of Polynomials

We will be interested in polynomials that depend on a single variable, which will often be \(x\text{.}\) A general polynomial has the form
\begin{equation*} p(x)=a_nx^n + a_{n-1}x^{n-1}+\ldots + a_1x+a_0 \end{equation*}
where \(a_j\) are the coefficients of \(p(x)\) and are elements in either \(\field=\C\) or \(\real\text{.}\)

Definition 1.4.1.

The degree of a polynomial is the highest power of \(x\) whose coefficient is nonzero. For example, the degree of
\begin{equation*} p(x)=-2x^7 + 4x^6 -x^5 + 4x^3-3x + 1 \end{equation*}
is \(\deg(p) = 7\text{.}\)
We have already seen that the set of polynomials forms a vector space, which means we can multiply polynomials by scalars and we can add them. An additional operation allows us to multiply two polynomials together.

Example 1.4.2.

Suppose that \(p(x)=3x^2-4x+2\) and \(q(x)=8x-5\text{.}\) We multiply polynomials term by term so that
\begin{equation*} (pq)(x) = 24x^3-47x^2+36x-10\text{.} \end{equation*}
More generally, denoting the coefficients of \(p\) by \(a_j\) and the coefficients of \(q\) by \(b_j\text{,}\) the coefficients of \(pq\) are
\begin{equation*} c_k=\sum_{j=0}^k a_jb_{k-j}=\sum_{j=0}^k b_ja_{k-j}\text{.} \end{equation*}
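To make the coefficient formula concrete, here is a short Python sketch; the coefficient-list representation, with the constant term first, is a convention chosen just for this illustration.

```python
def poly_mult(a, b):
    """Multiply polynomial coefficient lists (constant term first)
    using the formula c_k = sum_j a_j * b_{k-j}."""
    c = [0] * (len(a) + len(b) - 1)
    for j, aj in enumerate(a):
        for k, bk in enumerate(b):
            c[j + k] += aj * bk
    return c

# p(x) = 3x^2 - 4x + 2 and q(x) = 8x - 5, as in the example above
p = [2, -4, 3]
q = [-5, 8]
print(poly_mult(p, q))   # [-10, 36, -47, 24], i.e. 24x^3 - 47x^2 + 36x - 10
```

Notice that swapping the arguments gives the same list of coefficients, reflecting the symmetry of the formula for \(c_k\text{.}\)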

Properties of polynomial multiplication.

Two important properties follow from the general expression for the coefficients of the product given in Example 1.4.2.
  • The order in which we multiply polynomials is irrelevant; that is,
    \begin{equation*} pq = qp\text{.} \end{equation*}
  • The degree of the product is the sum of the degrees; that is,
    \begin{equation*} \deg(pq)=\deg(p)+\deg(q)\text{.} \end{equation*}
Integers satisfy a property often called the Division Algorithm. Suppose that \(a\) and \(b\) are positive integers. Then there are integers \(q\) and \(r\) such that
\begin{equation*} a = bq + r \end{equation*}
where \(0\leq r \leq b-1\text{.}\) We say that \(q\) is the quotient and \(r\) is the remainder. For example, if \(a=17\) and \(b=5\text{,}\) then
\begin{equation*} 17 = 5\cdot3 + 2\text{.} \end{equation*}
We might say β€œ17 divided by 5 has a quotient of 3 and a remainder of 2.”
Polynomials satisfy a similar property, also called the Division Algorithm, as given in the following proposition.

Example 1.4.4.

Rather than proving the Division Algorithm for polynomials, we will illustrate with an example. Suppose that \(p(x)=6x^3 + x^2 - 17x + 8\) and \(s(x)=2x^2 + 3x - 1\text{.}\)
To get started, let’s consider the highest degree terms in both polynomials. For \(p(x)\text{,}\) it is \(6x^3\) while for \(s(x)\text{,}\) it is \(2x^2\text{.}\) We’d like to multiply \(s(x)\) by something to get close to \(p(x)\text{.}\) Since the ratio of the highest degree terms is \(3x\text{,}\) we multiply \(s(x)\) by \(3x\text{,}\) which gives
\begin{equation*} p(x) - (3x)s(x) = -8x^2 - 14x + 8\text{.} \end{equation*}
Now we ask what we should multiply by \(s(x)\) to realize the \(-8x^2\) term. Since the highest degree term of \(s(x)\) is \(2x^2\text{,}\) we multiply by \(-4\text{.}\) This means that
\begin{equation*} p(x) - (3x-4)s(x) = -2x+4\text{.} \end{equation*}
Since the highest degree term of \(s(x)\) is \(2x^2\text{,}\) there is nothing we can multiply by \(s(x)\) to realize the highest degree term of the right hand side. This means that we have
\begin{equation*} p(x) = (3x-4)s(x) + (-2x+4) \end{equation*}
so the quotient is \(q(x)=3x-4\) and the remainder is \(r(x)=-2x+4\text{.}\)
This algorithm is polynomial long division, which is sometimes taught in high school. We won't actually be implementing this algorithm, but this example is meant to illustrate why Proposition 1.4.3 is true. Implementing this algorithm for general polynomials will give a proof of the Division Algorithm.
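The steps of the example above can be sketched in Python. The helper below is hypothetical and uses exact rational arithmetic; it mirrors the procedure of repeatedly dividing the leading terms, with coefficient lists ordered from the constant term up.

```python
from fractions import Fraction

def poly_divmod(p, s):
    """Divide p by s, returning (q, r) with p = q*s + r and deg(r) < deg(s).
    Coefficient lists are ordered from the constant term up."""
    r = [Fraction(c) for c in p]
    s = [Fraction(c) for c in s]
    q = [Fraction(0)] * max(len(p) - len(s) + 1, 1)
    while len(r) >= len(s):
        coeff = r[-1] / s[-1]        # ratio of the leading terms
        d = len(r) - len(s)
        q[d] = coeff
        for i, si in enumerate(s):   # subtract coeff * x^d * s(x)
            r[i + d] -= coeff * si
        r.pop()                      # the leading term is now zero
    return q, r

# Divide p(x) = 6x^3 + x^2 - 17x + 8 by s(x) = 2x^2 + 3x - 1
quotient, remainder = poly_divmod([8, -17, 1, 6], [-1, 3, 2])
print(quotient, remainder)   # quotient 3x - 4, remainder -2x + 4
```

Each pass through the loop performs one step of the example: the first pass subtracts \(3x\cdot s(x)\) and the second subtracts \(-4\cdot s(x)\text{.}\)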
One important implication of the Division Algorithm is the following fact that we will use quite a bit. Remember that a root \(\lambda\) of a polynomial is a value for which \(p(\lambda)=0\text{.}\)

Proof.

The Division Algorithm tells us that
\begin{equation*} p(x)=(x-\lambda)q(x) + r(x)\text{.} \end{equation*}
In the notation of the Division Algorithm, \(s(x)=x-\lambda\) and \(\deg(r) \lt \deg(s) = 1\text{.}\) Therefore \(\deg(r)=0\text{,}\) which means that \(r(x)\) is a constant \(r\text{.}\)
Now we know that \(0=p(\lambda)=(\lambda-\lambda)q(\lambda)+r = r\text{,}\) which says that \(r=0\text{.}\) Therefore, we have
\begin{equation*} p(x)=(x-\lambda)q(x)\text{.} \end{equation*}
This proposition shows that there is a relationship between the factors of a polynomial \(p(x)\) and its roots. Polynomials with complex coefficients are special because the Fundamental Theorem of Algebra tells us that every such polynomial has a root. The proof of this theorem is outside the scope of our course, but it is usually included in a course in complex analysis.
While this important theorem only holds for polynomials with complex coefficients, we can deduce something about polynomials with real coefficients. For instance, if \(p(x)\) is a polynomial with real coefficients and \(\lambda\) is a complex, non-real root, then so is its complex conjugate \(\overline{\lambda}\text{.}\) This follows because
\begin{equation*} 0=\overline{p(\lambda)} = p(\overline{\lambda})\text{.} \end{equation*}
We have \((x-\lambda)(x-\overline{\lambda}) = x^2+bx + c\) where \(b = -(\lambda+\overline{\lambda})\) and \(c=\lambda\overline{\lambda}\) are both real. Since this quadratic polynomial does not have real roots, we know from the quadratic formula that \(b^2\lt 4c\text{.}\) Therefore, every polynomial with real coefficients can be factored into a product of linear factors and irreducible quadratic factors \(x^2+bx+c\) with \(b^2\lt 4c\text{.}\)
As mentioned in the introduction to this section, our interest in polynomials stems from the insights we gain when we evaluate a polynomial on an operator \(T\text{.}\) For instance, if \(p(x)=x^2+4x-5\text{,}\) then \(p(T) = T^2 + 4T - 5I\) is a new operator. Notice that we consider the constant term to be multiplied by the identity transformation \(I\text{.}\)

Example 1.4.8.

Consider the operator \(T:\real^2\to\real^2\) by \(T(\xvec) = A\xvec\) where \(A=\begin{bmatrix} 2 \amp -1 \\ 2 \amp 3 \end{bmatrix} \text{.}\) If \(p(x)=x^2+4x-5\text{,}\) then
\begin{equation*} p(T)\xvec = (A^2 + 4A -5I)\xvec = \begin{bmatrix} 5 \amp -9 \\ 18 \amp 14 \\ \end{bmatrix} \xvec\text{.} \end{equation*}
This means that \(p(T):\real^2\to\real^2\) is the operator represented by the matrix \(\begin{bmatrix} 5 \amp -9 \\ 18 \amp 14 \\ \end{bmatrix} \text{.}\)
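We can check this computation with a short NumPy sketch; `poly_of_matrix` is a hypothetical helper that evaluates a polynomial on a matrix by accumulating powers, with the constant term multiplying the identity.

```python
import numpy as np

def poly_of_matrix(coeffs, A):
    """Evaluate p(A) = a_0 I + a_1 A + a_2 A^2 + ... for coefficients
    listed constant term first."""
    result = np.zeros_like(A)
    power = np.eye(A.shape[0], dtype=A.dtype)   # A^0 = I
    for c in coeffs:
        result = result + c * power
        power = power @ A
    return result

# p(x) = x^2 + 4x - 5 evaluated on the matrix of this example
A = np.array([[2, -1], [2, 3]])
print(poly_of_matrix([-5, 4, 1], A))   # the matrix [[5, -9], [18, 14]]
```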

Subsection 1.4.2 Invariant subspaces

As we begin to study operators, the concept of an invariant subspace will be helpful. In particular, finding invariant subspaces will frequently help us reduce our focus from an entire vector space to a smaller subspace so that we can more easily draw conclusions.

Definition 1.4.9.

If \(W\) is a vector subspace of \(V\) and \(T:V\to V\) an operator, we say that \(W\) is invariant under \(T\) if \(T(\wvec)\in W\) for all \(\wvec\in W\text{.}\) We frequently say that \(W\) is \(T\)-invariant.

Example 1.4.10.

If \(T\) is an operator and \(T\vvec = \lambda\vvec\text{,}\) we say that \(\vvec\) is an eigenvector of \(T\) with associated eigenvalue \(\lambda\text{.}\) This is the same concept that we explored with matrices.
The eigenspace associated to the eigenvalue \(\lambda\text{,}\) \(W=E_\lambda\text{,}\) which consists of all the eigenvectors of \(T\) associated to \(\lambda\) together with the zero vector, is a \(T\)-invariant subspace. In particular, if \(\vvec\) is in \(E_\lambda\text{,}\) then \(T\vvec = \lambda\vvec\text{,}\) which is also in \(E_\lambda\text{.}\)

Example 1.4.11.

Consider the operator \(T:\real^3\to\real^3\text{,}\) which rotates vectors by \(90^\circ\) around the \(z\)-axis. The \(z\)-axis forms an invariant subspace since it is equal to \(E_1\text{.}\)
Moreover, the set of vectors of the form \(\threevec xy0\) forms a two-dimensional \(T\)-invariant subspace since any vector in that plane is rotated to another vector in the same plane.

Example 1.4.12.

Suppose that \(V=\pbb_8\text{,}\) the vector space of polynomials of degree 8 or less, and that \(T:V\to V\) by \(T(p) = p'\text{.}\) Then \(W=\pbb_5\) is a \(T\)-invariant subspace because the derivative of any polynomial of degree five or less is a polynomial of degree four or less, which is again in \(\pbb_5\text{.}\)

Example 1.4.13.

Suppose that \(T:V\to V\) is an operator. Then \(\nul(T)\) is a \(T\)-invariant subspace. Indeed, if \(\vvec\) is in \(\nul(T)\text{,}\) then \(T\vvec=0\text{,}\) which is also in \(\nul(T)\text{.}\)
Furthermore, \(\range(T)\) is also \(T\)-invariant. If \(\wvec\in\range(T)\text{,}\) then \(T\wvec\) is in \(\range(T)\) by the definition of \(\range(T)\text{.}\)

Example 1.4.14.

Any operator \(T:V\to V\) will have two invariant subspaces, namely \(\{0\}\) and \(V\) itself.
Remember that the order in which we multiply polynomials is not important. In particular, \(p(T)q(T)=q(T)p(T)\text{.}\) This leads to the following important proposition.

Proof.

Consider the polynomial \(q(x)=x\) so that \((pq)(x)=xp(x)=p(x)x\text{.}\) This means that \(p(T)T=Tp(T)\text{.}\)
Suppose that \(\vvec\) is in \(\nul(p(T))\text{,}\) which means that \(p(T)\vvec = \zerovec\text{.}\) We need to explain why \(T\vvec\) is also in \(\nul(p(T))\text{,}\) which leads us to
\begin{equation*} p(T)T\vvec = Tp(T)\vvec = T\zerovec = \zerovec\text{.} \end{equation*}
Similarly, if \(\vvec\) is in \(\range(p(T))\text{,}\) then there is a vector \(\uvec\) so that \(\vvec=p(T)\uvec\text{.}\) Then
\begin{equation*} T\vvec = Tp(T)\uvec = p(T)(T\uvec)\text{,} \end{equation*}
which shows that \(T\vvec\) is also in \(\range(p(T))\text{.}\) Therefore, \(\range(p(T))\) is an invariant subspace of \(T\text{.}\)

Example 1.4.16.

Consider the operator \(T:\real^4\to\real^4\) by \(T\xvec = A\xvec\) with \(A=\begin{bmatrix} -10 \amp 16 \amp 5 \amp -9 \\ -6 \amp 10 \amp 1 \amp -5 \\ -2 \amp 4 \amp 1 \amp -5 \\ 2 \amp -4 \amp -3 \amp 3 \\ \end{bmatrix} \text{.}\) Consider also the polynomial \(p(x) = x^2+4x+4\) so that the operator \(p(T)\) is represented by the matrix
\begin{equation*} B = A^2 + 4A+4I = \begin{bmatrix} -60 \amp 120 \amp 18 \amp -78 \\ -36 \amp 72 \amp 0 \amp -36 \\ -24 \amp 48 \amp 18 \amp -42 \\ 24 \amp -48 \amp -18 \amp 42 \end{bmatrix}\text{.} \end{equation*}
Notice that the reduced row echelon form of \(B\) is
\begin{equation*} B\sim\begin{bmatrix} 1 \amp -2 \amp 0 \amp 1 \\ 0 \amp 0 \amp 1 \amp -1 \\ 0 \amp 0 \amp 0 \amp 0 \\ 0 \amp 0 \amp 0 \amp 0 \\ \end{bmatrix}\text{.} \end{equation*}
First considering \(\range(p(T))=\col(B)\text{,}\) we find a basis consisting of the first and third columns of \(B\text{:}\)
\begin{equation*} \wvec_1 = \fourvec{-60}{-36}{-24}{24},\hspace{12pt} \wvec_2 = \fourvec{18}{0}{18}{-18}\text{.} \end{equation*}
We can check that
\begin{align*} T\wvec_1 \amp = 4\wvec_1 - 4\wvec_2\\ T\wvec_2 \amp = 4\wvec_2\text{.} \end{align*}
This shows that \(\range(p(T))\) is \(T\)-invariant since \(T\wvec_1\) and \(T\wvec_2\) lie in \(\range(p(T))\text{.}\)
We also have a basis for \(\nul(p(T))=\nul(B)\) consisting of vectors
\begin{equation*} \uvec_1 = \fourvec{2}{1}{0}{0},\hspace{12pt} \uvec_2 = \fourvec{-1}{0}{1}{1}\text{.} \end{equation*}
We can check that
\begin{align*} T\uvec_1 \amp = -2\uvec_1\\ T\uvec_2 \amp = 2\uvec_1-2\uvec_2\text{.} \end{align*}
Once again, this shows that \(\nul(p(T))\) is \(T\)-invariant.
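These invariance computations can be verified numerically. The following NumPy sketch rebuilds the matrix representing \(p(T)\) and checks each of the relations above.

```python
import numpy as np

A = np.array([[-10, 16, 5, -9],
              [-6, 10, 1, -5],
              [-2, 4, 1, -5],
              [2, -4, -3, 3]])

# B represents p(T) for p(x) = x^2 + 4x + 4
B = A @ A + 4 * A + 4 * np.eye(4, dtype=int)

w1, w2 = B[:, 0], B[:, 2]        # first and third columns span range(p(T))
u1 = np.array([2, 1, 0, 0])      # basis of nul(p(T)) read off from the rref
u2 = np.array([-1, 0, 1, 1])

assert (B @ u1 == 0).all() and (B @ u2 == 0).all()   # u1, u2 lie in nul(p(T))
assert (A @ w1 == 4 * w1 - 4 * w2).all()             # T w1 stays in range(p(T))
assert (A @ w2 == 4 * w2).all()
assert (A @ u1 == -2 * u1).all()                     # T u1 stays in nul(p(T))
assert (A @ u2 == 2 * u1 - 2 * u2).all()
print("all invariance checks pass")
```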

Subsection 1.4.3 The minimal polynomial of a vector

We are approaching a result that will be crucial for our upcoming explorations. First, we need to make the following definition clear.

Definition 1.4.17.

A monic polynomial is a polynomial where the coefficient of the highest degree term is \(1\text{.}\)
For instance, \(x^5 - x^4 + 3x - 2\) is a monic polynomial, but \(3x^3+2x-1\) is not.

Proof.

Consider the sequence of vectors formed by repeatedly applying \(T\) to \(\vvec\text{:}\)
\begin{equation*} \vvec, T\vvec, T^2\vvec, T^3\vvec,\ldots\text{.} \end{equation*}
Because \(V\) is finite dimensional, this set of vectors will eventually become linearly dependent. Suppose that \(m\) is the smallest integer for which \(T^m\vvec\) is a linear combination of the earlier vectors in the list. This means that
\begin{align*} T^m\vvec \amp = -a_0\vvec -a_1T\vvec - \ldots - a_{m-1}T^{m-1}\vvec\\ 0 \amp = (a_0I + a_1T + \ldots + a_{m-1}T^{m-1} + T^m)\vvec\text{.} \end{align*}
Therefore, \(p(x) = a_0 + a_1x + \ldots + a_{m-1}x^{m-1}+x^m\) is a monic polynomial for which \(p(T)\vvec= 0\text{.}\)
Since the vectors \(\vvec,T\vvec,\ldots,T^{m-1}\vvec\) are linearly independent, there can be no polynomial \(s\) whose degree is smaller than \(m\) such that \(s(T)\vvec=0\text{.}\)
Moreover, if there is another monic polynomial \(r\) of degree \(m\) such that \(r(T)\vvec = 0\text{,}\) then \(p-r\) is a polynomial whose degree is smaller than \(m\) and that satisfies \((p-r)(T)\vvec = 0\text{,}\) but the argument of the previous paragraph shows that this cannot happen. Therefore, \(p=r\text{,}\) which shows that \(p\) is the unique monic polynomial of smallest degree satisfying \(p(T)\vvec=0\text{.}\)
We must have \(\deg(p_\vvec)\leq \dim V\) since if \(\dim V = n\text{,}\) then the set of vectors \(\vvec,\ldots,T^n\vvec\) has \(n+1\) vectors and hence must be linearly dependent.

Definition 1.4.19.

The polynomial \(p_\vvec\) is sometimes called the annihilating polynomial of \(\vvec\text{,}\) since \(p_\vvec(T)\vvec = 0\text{,}\) or the minimal polynomial of \(T\) applied to \(\vvec\text{.}\)

Example 1.4.20.

We will revisit Example 1.4.16 and the operator \(T:\real^4\to\real^4\) defined by \(T\xvec=A\xvec\) where \(A=\begin{bmatrix} -10 \amp 16 \amp 5 \amp -9 \\ -6 \amp 10 \amp 1 \amp -5 \\ -2 \amp 4 \amp 1 \amp -5 \\ 2 \amp -4 \amp -3 \amp 3 \\ \end{bmatrix} \text{.}\) Suppose that \(\vvec=\fourvec1111\) and consider the vectors
\begin{equation*} \vvec, T\vvec, \ldots, T^5\vvec\text{.} \end{equation*}
Since this is a set of six vectors in \(\real^4\text{,}\) we are guaranteed that this set is linearly dependent. Putting these six vectors into a matrix and row reducing, we have
\begin{equation*} \begin{bmatrix} \vvec \amp T\vvec \amp T^2\vvec \amp T^3\vvec \amp T^4\vvec \amp T^5\vvec \end{bmatrix} \sim \begin{bmatrix} 1 \amp 0 \amp -4 \amp 16 \amp -48 \amp 128 \\ 0 \amp 1 \amp -4 \amp 12 \amp -32 \amp 80 \\ 0 \amp 0 \amp 0 \amp 0 \amp 0 \amp 0 \\ 0 \amp 0 \amp 0 \amp 0 \amp 0 \amp 0 \end{bmatrix}\text{.} \end{equation*}
This shows, in particular, that \(T^2\vvec = -4\vvec - 4T\vvec\) so that
\begin{equation*} (T^2 + 4T+4I)\vvec = 0\text{.} \end{equation*}
Since \(\vvec\) and \(T\vvec\) are linearly independent, the minimal polynomial \(p_\vvec\) cannot have degree one or less. Therefore,
\begin{equation*} p_\vvec(x) = x^2 + 4x + 4\text{.} \end{equation*}
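The procedure of this example can be sketched in Python; `vector_min_poly` is a hypothetical helper that builds the Krylov vectors, finds the first linear dependence, and solves for the coefficients.

```python
import numpy as np

def vector_min_poly(A, v):
    """Coefficients (constant term first, leading coefficient 1) of the
    minimal polynomial of v: the monic p of least degree with p(A)v = 0."""
    n = A.shape[0]
    krylov = [v]                       # the vectors v, Av, A^2 v, ..., A^n v
    for _ in range(n):
        krylov.append(A @ krylov[-1])
    K = np.column_stack(krylov)
    for m in range(1, n + 1):
        # first m for which A^m v depends on the earlier Krylov vectors
        if np.linalg.matrix_rank(K[:, :m + 1]) == m:
            a, *_ = np.linalg.lstsq(K[:, :m].astype(float),
                                    K[:, m].astype(float), rcond=None)
            return [float(c) for c in -a] + [1.0]

A = np.array([[-10, 16, 5, -9],
              [-6, 10, 1, -5],
              [-2, 4, 1, -5],
              [2, -4, -3, 3]])
v = np.array([1, 1, 1, 1])
print(vector_min_poly(A, v))   # coefficients of x^2 + 4x + 4, up to rounding
```

Solving \(T^m\vvec = a_0\vvec + \ldots + a_{m-1}T^{m-1}\vvec\) and negating the coefficients produces exactly the monic polynomial constructed in the proof above.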

Example 1.4.21.

If \(\vvec\) is an eigenvector of \(T\) with associated eigenvalue \(\lambda\text{,}\) then
\begin{equation*} (T-\lambda I)\vvec = 0 \end{equation*}
so that \(p_\vvec(x) = x-\lambda\text{.}\)
The construction in this proof will be important when we investigate numerical linear algebra later in this course.

Definition 1.4.22. Krylov subspace.

Given an operator \(T\) on a finite dimensional vector space \(V\) and a vector \(\vvec\text{,}\) we define
\begin{equation*} \kcal_m(T,\vvec) = \laspan{\vvec,T\vvec,\ldots,T^{m-1}\vvec}\text{,} \end{equation*}
the \(m^{th}\) Krylov subspace.
We see that
\begin{equation*} \kcal_1\subset\kcal_2\subset\kcal_3\subset\ldots\subset\kcal_m = \kcal_{m+1} = \ldots = \kcal_{m+k} \end{equation*}
so that the set of inclusions eventually stabilizes when \(m\) reaches the degree of the minimal polynomial of \(\vvec\text{.}\) This integer \(m\) is sometimes called the grade of \(\vvec\text{.}\) When the grade of a vector is one, that vector must be an eigenvector of \(T\text{.}\)
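As an illustration with the operator from Example 1.4.20, the following NumPy sketch computes the dimensions of \(\kcal_1,\ldots,\kcal_5\) for \(\vvec=\fourvec1111\) and shows that they stabilize once \(m\) reaches the grade of \(\vvec\text{,}\) which is two.

```python
import numpy as np

A = np.array([[-10, 16, 5, -9],
              [-6, 10, 1, -5],
              [-2, 4, 1, -5],
              [2, -4, -3, 3]])
v = np.array([1, 1, 1, 1])

vecs = [v]                         # Krylov vectors v, Av, A^2 v, ...
for _ in range(4):
    vecs.append(A @ vecs[-1])

# dim K_m is the rank of the matrix whose columns are the first m vectors
dims = [int(np.linalg.matrix_rank(np.column_stack(vecs[:m])))
        for m in range(1, 6)]
print(dims)   # [1, 2, 2, 2, 2]: the inclusions stabilize at the grade m = 2
```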

Subsection 1.4.4 The minimal polynomial of an operator

While we will see that the minimal polynomial of a vector \(p_\vvec\) is useful, we can take this idea a bit further and find a polynomial \(p\) that annihilates every vector in \(V\text{.}\) This will be called the minimal polynomial of the operator \(T\text{.}\) Rather than starting with a proof of the theorem, let's illustrate the idea by continuing our earlier example from Example 1.4.20.

Example 1.4.23. The minimal polynomial of an operator.

Remember the setup from Example 1.4.20. We have the operator \(T:\real^4\to\real^4\) defined by \(T\xvec=A\xvec\) where \(A=\begin{bmatrix} -10 \amp 16 \amp 5 \amp -9 \\ -6 \amp 10 \amp 1 \amp -5 \\ -2 \amp 4 \amp 1 \amp -5 \\ 2 \amp -4 \amp -3 \amp 3 \\ \end{bmatrix} \text{.}\) With the vector \(\vvec=\fourvec1111\text{,}\) we found that \(p_\vvec(x) = x^2 +4x+4=(x+2)^2\text{.}\)
We know that \(p_\vvec(T)\) annihilates \(\vvec\text{,}\) meaning that \(p_\vvec(T)\vvec=0\text{.}\) But \(p_\vvec(T)\) also annihilates \(T\vvec\) since
\begin{equation*} p_\vvec(T)T\vvec = Tp_\vvec(T)\vvec = 0\text{,} \end{equation*}
which means that \(p_\vvec(T)\) annihilates any vector in the two-dimensional Krylov subspace \(\kcal_2(T,\vvec)\text{.}\) We would like to find a polynomial that annihilates every vector in \(V\text{.}\)
To do this, we will consider the subspace \(U=\range(p_\vvec(T))\text{.}\) By Proposition 1.4.15, \(U\) is a \(T\)-invariant subspace of \(\real^4\text{.}\) In fact, we can find a basis for \(U\text{.}\) We see that the matrix representing \(p_\vvec(T)\) is
\begin{equation*} p_\vvec(A) = \begin{bmatrix} -60 \amp 120 \amp 18 \amp -78 \\ -36 \amp 72 \amp 0 \amp -36 \\ -24 \amp 48 \amp 18 \amp -42 \\ 24 \amp -48 \amp -18 \amp 42 \end{bmatrix} \sim \begin{bmatrix} 1 \amp -2 \amp 0 \amp 1 \\ 0 \amp 0 \amp 1 \amp -1 \\ 0 \amp 0 \amp 0 \amp 0 \\ 0 \amp 0 \amp 0 \amp 0 \end{bmatrix}\text{.} \end{equation*}
This shows that a basis for \(\range(p_\vvec(T)) = \col(p_\vvec(A))\) is
\begin{equation*} \uvec_1 = \fourvec{-60}{-36}{-24}{24},\hspace{12pt} \uvec_2 = \fourvec{18}{0}{18}{-18}\text{.} \end{equation*}
Now that we know that \(U\) is a \(T\)-invariant subspace, we find that
\begin{align*} T\uvec_1 \amp = 4\uvec_1 - 4 \uvec_2\\ T\uvec_2 \amp = 4 \uvec_2\text{.} \end{align*}
By \(T|_U\text{,}\) we mean the operator \(T\) restricted to \(U\text{;}\) that is \(T|_U:U\to U\) by \(T|_U(\uvec) = T\uvec\text{.}\) If we call the basis for \(U\) \(\bcal=\{\uvec_1,\uvec_2\}\text{,}\) we have
\begin{equation*} \coords{T|_U}{\bcal} = \begin{bmatrix} 4 \amp 0 \\ -4 \amp 4 \\ \end{bmatrix}\text{.} \end{equation*}
Now we will choose a vector \(\uvec=\twovec11\) (think of this as being randomly chosen) and find the annihilating polynomial \(p_\uvec\) for \(T|_U\text{.}\) We have
\begin{equation*} \begin{bmatrix} \uvec \amp T|_U\uvec \amp T|_U^2\uvec \end{bmatrix} \sim \begin{bmatrix} 1 \amp 0 \amp -16 \\ 0 \amp 1 \amp 8 \\ \end{bmatrix}\text{,} \end{equation*}
which means that
\begin{equation*} (T|_U^2-8T|_U + 16I)\uvec = \zerovec\text{.} \end{equation*}
In other words, \(p_\uvec(x) = x^2 - 8x+16=(x-4)^2\) is the annihilating polynomial of \(\uvec\) for the operator \(T|_U\text{.}\)
In fact, this polynomial annihilates every vector in \(U\) because we find that
\begin{equation*} T|_U^2 - 8T|_U + 16I = 0\text{.} \end{equation*}
Notice what happens if we multiply
\begin{equation*} p(x) = p_\uvec(x)p_\vvec(x) = (x-4)^2(x+2)^2\text{.} \end{equation*}
Suppose that \(\wvec\) is any vector in \(V\text{.}\) Then
\begin{equation*} p(T)\wvec = p_\uvec(T)p_\vvec(T)\wvec\text{.} \end{equation*}
But notice that \(p_\vvec(T)\wvec\) is in \(U=\range(p_\vvec(T))\) and that \(p_\uvec(T)\) annihilates every vector in \(U\text{.}\) This means that \(p(T)\wvec = 0\) so that \(p(T)\) annihilates every vector in \(V\text{.}\)
In fact, this polynomial is what we will call the minimal polynomial of \(T\text{.}\) This example illustrates the proof of the theorem that we are about to give.
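We can confirm numerically that \(p(T) = (T-4I)^2(T+2I)^2\) is the zero operator with a quick NumPy check.

```python
import numpy as np

A = np.array([[-10, 16, 5, -9],
              [-6, 10, 1, -5],
              [-2, 4, 1, -5],
              [2, -4, -3, 3]])
I = np.eye(4, dtype=int)

# p(T) = p_u(T) p_v(T) with p_v(x) = (x + 2)^2 and p_u(x) = (x - 4)^2;
# the factors commute, so the order of multiplication does not matter
P = (A - 4 * I) @ (A - 4 * I) @ (A + 2 * I) @ (A + 2 * I)
print((P == 0).all())   # True: p(T) annihilates every vector
```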

Proof.

Our proof proceeds by induction on the dimension of \(V\text{.}\) To begin, suppose that \(\dim(V) = 1\text{,}\) which means that \(V=\laspan{\vvec}\) for some vector \(\vvec\text{.}\) Then \(T\vvec = \lambda\vvec\) for some \(\lambda\text{,}\) which is possibly \(0\text{.}\) Then \((T-\lambda I)\vvec = \zerovec\text{,}\) which means that \(T-\lambda I = 0\) since \(\vvec\) spans \(V\text{.}\) Therefore, if \(p(x) = x-\lambda\text{,}\) we have \(p(T)=0\text{.}\)
Now suppose that \(\dim(V)=n\) and that the theorem has been verified for all operators on vector spaces of dimension less than \(n\text{.}\) Following our work in ExampleΒ 1.4.23, we choose a nonzero vector \(\vvec\) and find its minimal polynomial \(p_\vvec\text{.}\) This polynomial will have degree \(m\geq 1\text{.}\)
Since \(\nul(p_\vvec(T))\) is \(T\)-invariant and \(\vvec\) is in \(\nul(p_\vvec(T))\text{,}\) it follows that
\begin{equation*} \vvec, T\vvec, T^2\vvec, \ldots, T^{m-1}\vvec \end{equation*}
are all in \(\nul(p_\vvec(T))\text{.}\) Since these vectors are linearly independent, we have
\begin{equation*} \dim\nul(p_\vvec(T)) \geq m \geq 1\text{.} \end{equation*}
Now we focus on \(U=\range(p_\vvec(T))\text{,}\) which we know is \(T\)-invariant by Proposition 1.4.15. Moreover, by the Fundamental Theorem of Linear Maps, we know that
\begin{equation*} \dim(U) = \dim(\range(p_\vvec(T))) = \dim(V) - \dim(\nul(p_\vvec(T))) \leq n - m \lt n\text{.} \end{equation*}
This says that the induction hypothesis applies to the operator \(T|_U\) so that there is a minimal polynomial, which we will denote by \(p_U\) so that \(p_U(T|_U) = 0\text{.}\) Moreover, by the induction hypothesis, this polynomial is monic and \(\deg(p_U) \leq \dim U \leq n - m\text{.}\)
We call \(p = p_Up_\vvec\) and note that
\begin{equation*} \deg(p)=\deg(p_U)+\deg(p_\vvec) \leq n-m + m = n=\dim V\text{.} \end{equation*}
Moreover, if \(\wvec\) is any vector in \(V\text{,}\) it follows that \(p_\vvec(T)\wvec\) is a vector in \(U=\range(p_\vvec(T))\) so that
\begin{equation*} p(T)\wvec = p_U(T)p_\vvec(T)\wvec = 0\text{.} \end{equation*}
That is, \(p(T)\) annihilates every vector in \(V\) so that \(p(T)=0\text{.}\)
This shows that there is a monic polynomial \(p\) such that \(p(T)=0\) on \(V\) and \(\deg(p) \leq \dim V\text{.}\) Therefore, among all such polynomials there is one, possibly different from \(p\text{,}\) having the smallest degree, and this is the minimal polynomial of the operator \(T\) on \(V\text{.}\)
To see that this polynomial is unique, suppose there are two monic polynomials \(s_1\) and \(s_2\) having smallest degree and \(s_1(T)=0\) and \(s_2(T)=0\text{.}\) If we consider \(s_1-s_2\text{,}\) we see that \(\deg(s_1-s_2)\lt \deg(s_1)=\deg(s_2)\) since the highest degree terms of \(s_1\) and \(s_2\) have coefficients \(1\) and therefore cancel. Also,
\begin{equation*} (s_1-s_2)(T) = s_1(T) - s_2(T) = 0\text{.} \end{equation*}
However, this is impossible since \(s_1\) and \(s_2\) had the smallest possible degree among all polynomials that vanish when evaluated on \(T\text{.}\) This means that \(s_1=s_2\text{,}\) which guarantees the uniqueness of the minimal polynomial.

Example 1.4.25.

Returning to our earlier example where \(T:\real^4\to\real^4\) by \(T\xvec = A\xvec\) with \(A=\begin{bmatrix} -10 \amp 16 \amp 5 \amp -9 \\ -6 \amp 10 \amp 1 \amp -5 \\ -2 \amp 4 \amp 1 \amp -5 \\ 2 \amp -4 \amp -3 \amp 3 \\ \end{bmatrix} \text{,}\) suppose we choose \(\vvec=\fourvec1010\text{.}\) We have
\begin{equation*} \begin{bmatrix} \vvec \amp T\vvec \amp T^2\vvec \amp T^3\vvec \amp T^4\vvec \end{bmatrix} \sim \begin{bmatrix} 1 \amp 0 \amp 0 \amp 0 \amp -64 \\ 0 \amp 1 \amp 0 \amp 0 \amp -32 \\ 0 \amp 0 \amp 1 \amp 0 \amp 12 \\ 0 \amp 0 \amp 0 \amp 1 \amp 4 \end{bmatrix}\text{.} \end{equation*}
This says that \(p_\vvec(x) = x^4-4x^3-12x^2+32x+64\text{.}\) One can now check that \(p_\vvec(T) = 0\) so that the minimal polynomial of \(T\) is in fact
\begin{equation*} p(x) = x^4-4x^3-12x^2+32x+64=(x+2)^2(x-4)^2\text{.} \end{equation*}
This is, of course, the same polynomial we found earlier. For most vectors \(\vvec\text{,}\) we will find that the minimal polynomial of the operator \(T\) is \(p_\vvec\text{.}\)

Example 1.4.26.

Consider the \(2\times2\) matrix \(A=\begin{bmatrix} 2 \amp 0 \\ 0 \amp 2 \end{bmatrix}\) and the linear operator \(T\) that it defines. Notice that \(A-2I = 0\) so if \(p(x)=x-2\text{,}\) then \(p(T)=0\text{.}\) The minimal polynomial of \(T\) is therefore \(p(x)=x-2\text{.}\)
More generally, suppose that the minimal polynomial of an operator \(T\) has degree \(1\text{.}\) Since the minimal polynomial is monic, this means that \(p(x)=x-\lambda\text{.}\) Because \(p(T)=T-\lambda I = 0\text{,}\) we see that \(T=\lambda I\text{,}\) a multiple of the identity.

Example 1.4.27.

By contrast, consider the \(2\times2\) matrix \(B=\begin{bmatrix} 2 \amp 1 \\ 0 \amp 2 \end{bmatrix}\) and the linear operator \(S\) that it defines with respect to some basis. The degree of the minimal polynomial must be at least 2 since \(B\) is not a multiple of the identity matrix. Notice, however, that \(B-2I = \begin{bmatrix} 0 \amp 1 \\ 0 \amp 0 \\ \end{bmatrix}\) and \((B-2I)^2 = 0\text{.}\) This says that \((S-2I)^2 = 0\) and so the minimal polynomial of \(S\) is \(q(x)=(x-2)^2\text{.}\)
Both of the matrices in the two previous examples are upper triangular. Remembering that the eigenvalues of an upper triangular matrix are the entries on the diagonal, we see that the roots of the minimal polynomials in both cases are the eigenvalues of the operator. This gives some indication of the importance of the minimal polynomial, as we will see now.
The fact that the minimal polynomial has the smallest degree among all polynomials for which \(p(T)=0\) has some important consequences.

Proof.

Suppose that \(p\) is the minimal polynomial of \(T\text{.}\) We need to explain two things: that any eigenvalue of \(T\) is a root of \(p\) and that any root of \(p\) is an eigenvalue of \(T\text{.}\)
Suppose that \(\lambda\) is an eigenvalue of \(T\text{.}\) This means that there is a nonzero vector \(\vvec\) such that \(T\vvec = \lambda\vvec\) and therefore \(T^j\vvec = \lambda^j\vvec\) for every \(j\text{.}\) This means that
\begin{equation*} 0 = p(T)\vvec = p(\lambda)\vvec\text{,} \end{equation*}
which implies that \(p(\lambda) = 0\text{.}\) Therefore, \(\lambda\) is a root of \(p\text{,}\) the minimal polynomial of \(T\text{.}\)
Conversely, suppose that \(\lambda\) is a root of \(p\text{.}\) By Proposition 1.4.5, this means that
\begin{equation*} p(x) = (x-\lambda)q(x)\text{.} \end{equation*}
This says that
\begin{equation*} 0 = p(T) = (T-\lambda I)q(T)\text{.} \end{equation*}
However, \(q(T)\neq 0\) since \(\deg(q) \lt \deg(p)\text{,}\) which implies there is some vector \(\vvec\) for which \(q(T)\vvec\neq 0\text{.}\) Therefore,
\begin{equation*} 0 = p(T)\vvec = (T-\lambda I)q(T)\vvec\text{,} \end{equation*}
which shows that \(q(T)\vvec\) is an eigenvector of \(T\) with associated eigenvalue \(\lambda\text{.}\)
This is the most significant fact about the minimal polynomial: that its roots are the eigenvalues of the operator. We’ll put this to use in the next section. Before that, however, here are two other useful facts.

Proof.

If \(p\) is the minimal polynomial of \(T\text{,}\) then we can apply the Division Algorithm to write \(s = pq + r\) where \(\deg(r) \lt \deg(p)\text{.}\) Furthermore,
\begin{equation*} 0 = s(T) = p(T)q(T) + r(T) = r(T)\text{,} \end{equation*}
which implies that \(r(T)=0\text{.}\) Since \(p\) has the smallest degree among all polynomials that vanish when evaluated on \(T\) and \(r\) has a smaller degree than \(p\text{,}\) we know that \(r=0\text{.}\) Therefore, \(s=pq\text{.}\)

Proof.

Suppose that \(s\) is the minimal polynomial of \(T|_W\) and \(p\) is the minimal polynomial of \(T\) on \(V\text{.}\) This means that \(p(T)\wvec=0\) for every vector \(\wvec\) in \(W\) and so \(p(T|_W) = 0\text{.}\) We also know that \(s\) is the minimal polynomial of \(T|_W\) so by PropositionΒ 1.4.29, we know that \(p\) is a multiple of \(s\text{.}\)