
Section 1.4 The minimal polynomial

This section will describe some important properties of polynomials and demonstrate how they can help us understand linear transformations.
We will begin by studying familiar polynomials, such as those you saw in high school algebra. For example, \(p(x)=2x^2 - 3x + 7\) is a quadratic polynomial. Eventually, we will consider polynomials evaluated on an operator \(T\text{.}\) For instance, evaluating \(p(x)\) on the operator \(T\) produces a new operator \(p(T) = 2T^2 - 3T + 7I\text{.}\) As a preview of how this may be useful, the polynomial \(p(x)=x-2\) leads to the operator \(p(T) = T-2I\) whose null space is the set of eigenvectors associated to the eigenvalue \(\lambda = 2\text{.}\)

Subsection 1.4.1 Properties of Polynomials

We will be interested in polynomials that depend on a single variable, which will often be \(x\text{.}\) A general polynomial has the form
\begin{equation*} p(x)=a_nx^n + a_{n-1}x^{n-1}+\ldots + a_1x+a_0 \end{equation*}
where \(a_j\) are the coefficients of \(p(x)\) and are elements in either \(\field=\C\) or \(\real\text{.}\)
The degree of a polynomial is the highest power of \(x\) whose coefficient is nonzero. For example, the degree of
\begin{equation*} p(x)=-2x^7 + 4x^6 -x^5 + 4x^3-3x + 1 \end{equation*}
is \(\deg(p) = 7\text{.}\)
We have already seen that the set of polynomials forms a vector space, which means we can multiply polynomials by scalars and we can add them. An additional operation allows us to multiply two polynomials together.

Example 1.4.1.

Suppose that \(p(x)=3x^2-4x+2\) and \(q(x)=8x-5\text{.}\) We multiply polynomials term by term so that
\begin{equation*} (pq)(x) = 24x^3-47x^2+36x-10 \end{equation*}
More generally, denoting the coefficients of \(p\) by \(a_j\) and the coefficients of \(q\) by \(b_j\text{,}\) the coefficients of \(pq\) are
\begin{equation*} c_k=\sum_{j=0}^k a_jb_{k-j}=\sum_{j=0}^k b_ja_{k-j}\text{.} \end{equation*}
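The coefficient formula above translates directly into code. Here is a minimal Python sketch (the function name `poly_mul` is my own) that multiplies two polynomials stored as lists of coefficients, lowest degree first:

```python
def poly_mul(a, b):
    """Multiply polynomials given as coefficient lists, lowest degree first.

    Each product a[j] * b[k] contributes to the coefficient of x^(j+k),
    matching the formula c_k = sum_{j=0}^k a_j b_{k-j}.
    """
    c = [0] * (len(a) + len(b) - 1)
    for j, aj in enumerate(a):
        for k, bk in enumerate(b):
            c[j + k] += aj * bk
    return c

# p(x) = 3x^2 - 4x + 2 and q(x) = 8x - 5 from Example 1.4.1
p = [2, -4, 3]
q = [-5, 8]
print(poly_mul(p, q))  # [-10, 36, -47, 24], i.e. 24x^3 - 47x^2 + 36x - 10
```

Because addition and multiplication of the coefficients commute, swapping `a` and `b` gives the same result, which is the first property noted above.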

Properties of polynomial multiplication.

Two important properties follow from the general expression for the coefficients of the product given in Example 1.4.1. First, the order in which we multiply polynomials is irrelevant; that is,
\begin{equation*} pq = qp\text{.} \end{equation*}
Second, the degree of the product is the sum of the degrees; that is,
\begin{equation*} \deg(pq)=\deg(p)+\deg(q)\text{.} \end{equation*}
Integers satisfy a property often called the Division Algorithm. Suppose that \(a\) and \(b\) are positive integers. Then there are integers \(q\) and \(r\) such that
\begin{equation*} a = bq + r \end{equation*}
where \(0\leq r \leq b-1\text{.}\) We say that \(q\) is the quotient and \(r\) is the remainder. For example, if \(a=17\) and \(b=5\text{,}\) then
\begin{equation*} 17 = 3\cdot5 + 2\text{.} \end{equation*}
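Python's built-in `divmod` realizes the integer Division Algorithm; a quick check of the example above:

```python
# divmod returns the quotient and remainder of integer division
q, r = divmod(17, 5)
print(q, r)  # 3 2

# the defining properties of the Division Algorithm
assert 17 == 5 * q + r
assert 0 <= r <= 5 - 1
```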
Polynomials satisfy a similar property, also called the Division Algorithm, given in the following proposition.

Proposition 1.4.2.

Given polynomials \(p(x)\) and \(s(x)\) with \(s\neq 0\text{,}\) there are polynomials \(q(x)\) and \(r(x)\) such that
\begin{equation*} p(x) = q(x)s(x) + r(x) \end{equation*}
where \(\deg(r) \lt \deg(s)\text{.}\) As with the integers, we call \(q\) the quotient and \(r\) the remainder.

Example 1.4.3.

Rather than proving the Division Algorithm for polynomials, we will illustrate with an example. Suppose that \(p(x)=6x^3 + x^2 - 17x + 8\) and \(s(x)=2x^2 + 3x - 1\text{.}\)
To get started, let’s consider the highest degree terms in both polynomials. For \(p(x)\text{,}\) it is \(6x^3\) while for \(s(x)\text{,}\) it is \(2x^2\text{.}\) We’d like to multiply \(s(x)\) by something to get close to \(p(x)\text{.}\) Since the ratio of the highest degree terms is \(3x\text{,}\) we multiply \(s(x)\) by \(3x\text{,}\) which gives
\begin{equation*} p(x) - (3x)s(x) = -8x^2 - 14x + 8\text{.} \end{equation*}
Now we ask what we should multiply by \(s(x)\) to realize the \(-8x^2\) term. Since the highest degree term of \(s(x)\) is \(2x^2\text{,}\) we multiply by \(-4\text{.}\) This means that
\begin{equation*} p(x) - (3x-4)s(x) = -2x+4\text{.} \end{equation*}
Since the highest degree term of \(s(x)\) is \(2x^2\) while the right hand side has degree \(1\text{,}\) there is nothing we can multiply \(s(x)\) by to realize the right hand side. This means that we have
\begin{equation*} p(x) = (3x-4)s(x) + (-2x+4) \end{equation*}
so the quotient is \(q(x)=3x-4\) and the remainder is \(r(x)=-2x+4\text{.}\)
This algorithm is polynomial long division, which is sometimes taught in high school. We won’t actually be implementing this algorithm, but this example is meant to illustrate why Proposition 1.4.2 is true. Implementing this algorithm for general polynomials gives a proof of the Division Algorithm.
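The long division steps above can be sketched in code. Here is a minimal Python helper (`poly_divmod` is my own name, not a library routine) applied to the polynomials of the example, with coefficients listed highest degree first:

```python
def poly_divmod(p, s):
    """Polynomial long division: return (q, r) with p = q*s + r, deg(r) < deg(s).

    Coefficients are listed highest degree first.
    """
    p = list(p)
    q = []
    while len(p) >= len(s):
        coef = p[0] / s[0]          # ratio of the highest degree terms
        q.append(coef)
        for i, c in enumerate(s):   # subtract coef * s, aligned at the front
            p[i] -= coef * c
        p.pop(0)                    # the leading term cancels exactly
    return q, p

# p(x) = 6x^3 + x^2 - 17x + 8 divided by s(x) = 2x^2 + 3x - 1
q, r = poly_divmod([6, 1, -17, 8], [2, 3, -1])
print(q, r)  # [3.0, -4.0] [-2.0, 4.0], i.e. q(x) = 3x - 4 and r(x) = -2x + 4
```

Each pass through the loop is one step of the example: divide the leading terms, subtract, and repeat until what remains has degree smaller than \(\deg(s)\text{.}\)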
One important implication of the Division Algorithm is the following fact that we will use quite a bit. Remember that a root \(\lambda\) of a polynomial is a value for which \(p(\lambda)=0\text{.}\)

Proposition 1.4.4.

If \(\lambda\) is a root of the polynomial \(p(x)\text{,}\) then \(p(x)=(x-\lambda)q(x)\) for some polynomial \(q(x)\text{.}\)

Proof.

The Division Algorithm tells us that
\begin{equation*} p(x)=(x-\lambda)q(x) + r(x)\text{.} \end{equation*}
In the notation of the Division Algorithm, \(s(x)=x-\lambda\) and \(\deg(r) \lt \deg(s) = 1\text{.}\) Therefore \(\deg(r)=0\text{,}\) which means that \(r(x)\) is a constant \(r\text{.}\)
Now we know that \(0=p(\lambda)=(\lambda-\lambda)q(\lambda)+r = r\text{,}\) which says that \(r=0\text{.}\) Therefore, we have
\begin{equation*} p(x)=(x-\lambda)q(x)\text{.} \end{equation*}
This proposition shows that there is a relationship between the factors of a polynomial \(p(x)\) and its roots. Polynomials with complex coefficients are special because the Fundamental Theorem of Algebra tells us that every such polynomial has a root. The proof of this theorem is outside the scope of our course, but students often see it in a course in complex analysis.
While this important theorem only holds for polynomials with complex coefficients, we can deduce something about polynomials with real coefficients. For instance, if \(p(x)\) is a polynomial with real coefficients and \(\lambda\) is a complex, non-real root, then so is its complex conjugate \(\overline{\lambda}\text{.}\) This follows because
\begin{equation*} 0=\overline{p(\lambda)} = p(\overline{\lambda})\text{.} \end{equation*}
We have \((x-\lambda)(x-\overline{\lambda}) = x^2+bx + c\) where \(b = -(\lambda+\overline{\lambda})\) and \(c=\lambda\overline{\lambda}\) are real numbers. Since this quadratic polynomial does not have real roots, we know from the quadratic formula that \(b^2\lt 4c\text{.}\) Therefore, every polynomial with real coefficients can be factored into linear factors and quadratic factors \(x^2+bx+c\) with \(b^2\lt 4c\text{,}\) all having real coefficients.
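The conjugate root property is easy to check numerically. A small sketch with the hypothetical polynomial \(p(x)=x^2-2x+5\) (my choice for illustration), whose roots are \(1\pm 2i\text{:}\)

```python
def p(x):
    # a polynomial with real coefficients
    return x**2 - 2*x + 5

lam = 1 + 2j                       # one complex, non-real root
print(p(lam))                      # 0j
print(p(lam.conjugate()))          # 0j: the conjugate is also a root

# the corresponding real quadratic factor x^2 + bx + c
b = -(lam + lam.conjugate()).real  # -2.0
c = (lam * lam.conjugate()).real   # 5.0
assert b * b < 4 * c               # no real roots, as the text predicts
```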

Subsection 1.4.2 The minimal polynomial of an operator

As mentioned in the introduction to this section, our interest in polynomials stems from the insights we gain when we evaluate a polynomial on an operator \(T\text{.}\) For instance, if \(p(x)=x^2+4x-5\text{,}\) then \(p(T) = T^2 + 4T - 5I\) is a new operator. Notice that we consider the constant term to be multiplied by the identity transformation \(I\text{.}\)
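For matrices, evaluating a polynomial on an operator is ordinary matrix arithmetic. A minimal numpy sketch, evaluating the \(p(x)=x^2+4x-5\) above on an arbitrarily chosen matrix \(A\) (the matrix is my own choice, purely for illustration):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 2.0]])
I = np.eye(2)

# p(T) = T^2 + 4T - 5I: the constant term multiplies the identity
pA = A @ A + 4 * A - 5 * I
print(pA)  # [[7. 8.]
           #  [0. 7.]]
```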

Definition 1.4.7.

If \(W\) is a vector subspace of \(V\) and \(T:V\to V\) an operator, we say that \(W\) is invariant under \(T\) if \(T(\wvec)\in W\) for all \(\wvec\in W\text{.}\)
Remember that the order in which we multiply polynomials is not important. In particular, \(p(T)q(T)=q(T)p(T)\text{.}\) This leads to the following important proposition.

Proposition 1.4.8.

If \(p(x)\) is a polynomial and \(T\) an operator on \(V\text{,}\) then \(\nul(p(T))\) and \(\range(p(T))\) are invariant under \(T\text{.}\)

Proof.

Suppose that \(\vvec\) is in \(\nul(p(T))\text{,}\) which means that \(p(T)\vvec = \zerovec\text{.}\) We need to explain why \(T\vvec\) is also in \(\nul(p(T))\text{,}\) which follows because
\begin{equation*} p(T)T\vvec = Tp(T)\vvec = T\zerovec = \zerovec\text{.} \end{equation*}
Similarly, if \(\vvec\) is in \(\range(p(T))\text{,}\) then there is a vector \(\uvec\) so that \(\vvec=p(T)\uvec\text{.}\) Then
\begin{equation*} T\vvec = Tp(T)\uvec = p(T)(T\uvec)\text{,} \end{equation*}
which shows that \(T\vvec\) is also in \(\range(p(T))\text{.}\) Therefore, \(\range(p(T))\) is an invariant subspace of \(T\text{.}\)
We now come to a crucial result for our upcoming explorations. First, we need the following definition.

Definition 1.4.9.

A monic polynomial is a polynomial where the coefficient of the highest degree term is \(1\text{.}\)
For instance, \(x^5 - x^4 + 3x - 2\) is a monic polynomial, but \(3x^3+2x-1\) is not.

Theorem 1.4.10.

Suppose that \(V\) is a finite-dimensional vector space and that \(T\) is an operator on \(V\text{.}\) Then there is a unique monic polynomial \(p\text{,}\) called the minimal polynomial of \(T\text{,}\) having smallest degree among all polynomials for which \(p(T)=0\text{.}\) Moreover, \(\deg(p)\leq\dim(V)\text{.}\)

Proof.

Our proof proceeds by induction on the dimension of \(V\text{.}\) To begin, suppose that \(\dim(V) = 1\text{,}\) which means that \(V=\laspan{\vvec}\) for some vector \(\vvec\text{.}\) Then \(T\vvec = \lambda\vvec\) for some \(\lambda\text{,}\) which is possibly \(0\text{.}\) Then \((T-\lambda I)\vvec = \zerovec\text{,}\) which means that \(T-\lambda I = 0\) since \(\vvec\) spans \(V\text{.}\) Therefore, if \(p(x) = x-\lambda\text{,}\) we have \(p(T)=0\text{.}\)
We now imagine that \(\dim(V)=n\) and that the theorem has been verified for all operators on vector spaces of dimension less than \(n\text{.}\) We choose a vector \(\vvec\) and consider the powers \(T^k\vvec\text{;}\) that is, consider the vectors
\begin{equation*} \vvec,T\vvec,T^2\vvec,\ldots,T^n\vvec\text{.} \end{equation*}
Since there are \(n+1\) vectors in this set and \(\dim(V)=n\text{,}\) we know this is a linearly dependent set.
Choose \(m\) to be the smallest integer such that \(T^m\vvec\) is a linear combination of \(\vvec,T\vvec,\ldots,T^{m-1}\vvec\text{.}\) This means two things. First, the vectors \(\vvec,T\vvec,\ldots,T^{m-1}\vvec\) are linearly independent. Second, there are constants \(a_0,a_1,\ldots,a_{m-1}\) such that
\begin{equation*} a_0\vvec + a_1T\vvec + \ldots + a_{m-1}T^{m-1}\vvec + T^m\vvec = \zerovec\text{.} \end{equation*}
If we define the degree \(m\) monic polynomial
\begin{equation*} p(x)=a_0 + a_1x + \ldots + a_{m-1}x^{m-1}+x^m\text{,} \end{equation*}
then \(p(T)\vvec = \zerovec\text{.}\) That is, \(\vvec\) is in \(\nul(p(T))\text{.}\)
Since \(\nul(p(T))\) is invariant under \(T\) and \(\vvec\) is in \(\nul(p(T))\text{,}\) we know that
\begin{equation*} \vvec,T\vvec,\ldots,T^{m-1}\vvec \end{equation*}
are all in \(\nul(p(T))\text{.}\) These vectors are linearly independent so we know that \(\dim(\nul(p(T)))\geq m\text{.}\) Therefore,
\begin{equation*} \dim(\range(p(T))) = \dim(V) - \dim(\nul(p(T))) \leq n - m\text{.} \end{equation*}
For convenience, we will denote the vector space \(W=\range(p(T))\text{.}\) Since \(W\) is invariant under \(T\text{,}\) the restriction \(T|_W\) is an operator on \(W\text{,}\) whose dimension is less than \(\dim(V)\text{.}\) By the induction hypothesis, there is a monic polynomial \(q(x)\) such that \(q(T|_W)=0\) and \(\deg(q) \leq \dim(W) \leq n - m\text{.}\)
Now consider the polynomial \(pq\) whose degree is
\begin{equation*} \deg(qp) = \deg(q)+\deg(p) \leq (n - m) + m = n = \dim(V)\text{.} \end{equation*}
Moreover, both \(p\) and \(q\) are monic so \(pq\) is also monic. Finally, for any vector \(\vvec\) in \(V\text{,}\) we have
\begin{equation*} (qp)(T)\vvec = q(T)p(T)\vvec = q(T)(p(T)\vvec) = \zerovec \end{equation*}
where the last equality holds because \(p(T)\vvec\) is in \(W=\range(p(T))\) and \(q(T)\uvec=\zerovec\) for any vector \(\uvec\) in \(W\text{.}\) Since \((qp)(T)\vvec=\zerovec\) for every vector \(\vvec\text{,}\) this means that \((qp)(T)=0\text{.}\)
This shows that there is a monic polynomial \(s\) such that \(s(T)=0\) on \(V\text{.}\) Among all such monic polynomials, there is one, possibly different from \(s\text{,}\) having the smallest degree, and this is the minimal polynomial of the operator \(T\) on \(V\text{.}\)
To see that this polynomial is unique, suppose there are two monic polynomials \(s_1\) and \(s_2\) having smallest degree and \(s_1(T)=0\) and \(s_2(T)=0\text{.}\) If we consider \(s_1-s_2\text{,}\) we see that \(\deg(s_1-s_2)\lt \deg(s_1)=\deg(s_2)\) since the highest degree terms of \(s_1\) and \(s_2\) have coefficients \(1\) and therefore cancel. Also,
\begin{equation*} (s_1-s_2)(T) = s_1(T) - s_2(T) = 0\text{.} \end{equation*}
However, if \(s_1\neq s_2\text{,}\) then \(s_1-s_2\) is a nonzero polynomial of smaller degree that vanishes when evaluated on \(T\text{,}\) and dividing it by its highest degree coefficient would produce a monic polynomial of smaller degree that also vanishes on \(T\text{.}\) This is impossible since \(s_1\) and \(s_2\) have the smallest possible degree among all such polynomials. This means that \(s_1=s_2\text{,}\) which guarantees the uniqueness of the minimal polynomial.
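The existence argument suggests a brute-force computation: look for the smallest \(m\) so that \(T^m\) is a linear combination of \(I, T, \ldots, T^{m-1}\text{.}\) A numpy sketch along these lines (the function and its tolerance are my own, not a standard library routine):

```python
import numpy as np

def minimal_poly_coeffs(A, tol=1e-9):
    """Return [a_0, ..., a_{m-1}] of smallest m with
    a_0 I + a_1 A + ... + a_{m-1} A^{m-1} + A^m = 0."""
    n = A.shape[0]
    powers = [np.eye(n)]
    for _ in range(n):
        powers.append(powers[-1] @ A)
    for m in range(1, n + 1):
        # try to write -A^m as a combination of the lower powers
        M = np.column_stack([P.ravel() for P in powers[:m]])
        a, *_ = np.linalg.lstsq(M, -powers[m].ravel(), rcond=None)
        if np.linalg.norm(M @ a + powers[m].ravel()) < tol:
            return a
    return None

B = np.array([[2.0, 1.0],
              [0.0, 2.0]])
print(minimal_poly_coeffs(B))  # approximately [ 4. -4.], i.e. x^2 - 4x + 4
```

Searching from \(m=1\) upward guarantees the polynomial found has the smallest possible degree, and the theorem's uniqueness claim says this is the only monic polynomial of that degree killing \(A\text{.}\)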

Example 1.4.11.

Consider the \(2\times2\) matrix \(A=\begin{bmatrix} 2 \amp 0 \\ 0 \amp 2 \end{bmatrix}\) and the linear operator \(T\) that it defines. Notice that \(A-2I = 0\) so if \(p(x)=x-2\text{,}\) then \(p(T)=0\text{.}\) The minimal polynomial of \(T\) is therefore \(p(x)=x-2\text{.}\)
More generally, suppose that the minimal polynomial of an operator \(T\) has degree \(1\text{.}\) Since the minimal polynomial is monic, this means that \(p(x)=x-\lambda\text{.}\) Because \(p(T)=T-\lambda I = 0\text{,}\) we see that \(T=\lambda I\text{,}\) a multiple of the identity.

Example 1.4.12.

By contrast, consider the \(2\times2\) matrix \(B=\begin{bmatrix} 2 & 1 \\ 0 & 2 \end{bmatrix}\) and the linear operator \(S\) that it defines with respect to some basis. The degree of the minimal polynomial must be at least 2 since \(B\) is not a multiple of the identity matrix. Notice, however, that \(B-2I = \begin{bmatrix} 0 & 1 \\ 0 & 0 \\ \end{bmatrix}\) and \((B-2I)^2 = 0\text{.}\) This says that \((S-2I)^2 = 0\) and so the minimal polynomial of \(S\) is \(q(x)=(x-2)^2\text{.}\)
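A quick numeric check of this example:

```python
import numpy as np

B = np.array([[2.0, 1.0],
              [0.0, 2.0]])
N = B - 2 * np.eye(2)

print(N)       # [[0. 1.]
               #  [0. 0.]]  nonzero, so x - 2 is not the minimal polynomial
print(N @ N)   # [[0. 0.]
               #  [0. 0.]]  (B - 2I)^2 = 0, so the minimal polynomial is (x-2)^2
```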
Both of the matrices in the two previous examples are upper triangular. Remembering that the eigenvalues of an upper triangular matrix are the entries on the diagonal, we see that the roots of the minimal polynomials in both cases are the eigenvalues of the operator. This gives some indication of the importance of the minimal polynomial, as we will see now.
The fact that the minimal polynomial has the smallest degree among all polynomials for which \(p(T)=0\) has some important consequences.

Theorem 1.4.13.

The roots of the minimal polynomial of an operator \(T\) are precisely the eigenvalues of \(T\text{.}\)

Proof.

Suppose that \(p\) is the minimal polynomial of \(T\text{.}\) We need to explain two things: that any eigenvalue of \(T\) is a root of \(p\) and that any root of \(p\) is an eigenvalue of \(T\text{.}\)
Suppose that \(\lambda\) is an eigenvalue of \(T\text{.}\) This means that there is a nonzero vector \(\vvec\) such that \(T\vvec = \lambda\vvec\) and therefore \(T^j\vvec = \lambda^j\vvec\) for every \(j\text{.}\) This means that
\begin{equation*} 0 = p(T)\vvec = p(\lambda)\vvec\text{,} \end{equation*}
which implies that \(p(\lambda) = 0\text{.}\) Therefore, \(\lambda\) is a root of \(p\text{,}\) the minimal polynomial of \(T\text{.}\)
Conversely, suppose that \(\lambda\) is a root of \(p\text{.}\) By Proposition 1.4.4, this means that
\begin{equation*} p(x) = (x-\lambda)q(x)\text{.} \end{equation*}
This says that
\begin{equation*} 0 = p(T) = (T-\lambda I)q(T) \end{equation*}
However, since \(\deg(q) \lt \deg(p)\) and \(p\) is the minimal polynomial, we know that \(q(T)\neq 0\text{,}\) so there is some vector \(\vvec\) for which \(q(T)\vvec\neq \zerovec\text{.}\) Therefore,
\begin{equation*} 0 = p(T)\vvec = (T-\lambda I)q(T)\vvec\text{,} \end{equation*}
which shows that \(q(T)\vvec\) is an eigenvector of \(T\) with associated eigenvalue \(\lambda\text{.}\)
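We can check this fact numerically for the matrix \(B\) of Example 1.4.12, whose minimal polynomial is \((x-2)^2\text{:}\)

```python
import numpy as np

B = np.array([[2.0, 1.0],
              [0.0, 2.0]])

# minimal polynomial of B: (x - 2)^2 = x^2 - 4x + 4
p = np.poly1d([1, -4, 4])

for lam in np.linalg.eigvals(B):
    print(lam, p(lam))   # each eigenvalue is 2, and p evaluates to approximately 0
```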
This is the most significant fact about the minimal polynomial: that its roots are the eigenvalues of the operator. We’ll put this to use in the next section. Before that, however, here are two other useful facts.

Proposition 1.4.14.

If \(p\) is the minimal polynomial of an operator \(T\) and \(s\) is any polynomial for which \(s(T)=0\text{,}\) then \(s\) is a polynomial multiple of \(p\text{;}\) that is, \(s=pq\) for some polynomial \(q\text{.}\)

Proof.

If \(p\) is the minimal polynomial of \(T\text{,}\) then we can apply the Division Algorithm to write \(s = pq + r\) where \(\deg(r) \lt \deg(p)\text{.}\) Furthermore,
\begin{equation*} 0 = s(T) = p(T)q(T) + r(T) = r(T)\text{,} \end{equation*}
which implies that \(r(T)=0\text{.}\) Since \(p\) has the smallest degree among all polynomials that vanish when evaluated on \(T\) and \(r\) has a smaller degree than \(p\text{,}\) we know that \(r=0\text{.}\) Therefore, \(s=pq\text{.}\)

Proposition 1.4.15.

If \(W\) is a subspace of \(V\) that is invariant under the operator \(T\text{,}\) then the minimal polynomial of \(T\) is a polynomial multiple of the minimal polynomial of \(T|_W\text{.}\)

Proof.

Suppose that \(s\) is the minimal polynomial of \(T|_W\) and \(p\) is the minimal polynomial of \(T\) on \(V\text{.}\) Because \(p(T)\vvec=\zerovec\) for every vector \(\vvec\) in \(V\text{,}\) we have \(p(T)\wvec=\zerovec\) for every \(\wvec\) in \(W\text{,}\) and so \(p(T|_W) = 0\text{.}\) Since \(s\) is the minimal polynomial of \(T|_W\text{,}\) Proposition 1.4.14 tells us that \(p\) is a multiple of \(s\text{.}\)