Section 2.3 Nilpotent operators
The previous section gave some conditions for a matrix to be diagonalizable. We will now explore what happens when a matrix is not diagonalizable.
If an operator \(T\) on a vector space \(V\) is diagonalizable, then the basis \(\bcal\) for which \(\coords{T}{\bcal}\) is diagonal consists of eigenvectors of \(T\text{.}\) Remember that eigenvectors of an operator \(T\) are found through the equation \((T-\lambda I)\vvec=\zerovec\text{.}\)
If we want to explore what happens when \(T\) is not diagonalizable, we will need a more general notion of eigenvectors. To that end, we refer to vectors satisfying
\begin{equation*}
(T-\lambda I)^k\vvec=\zerovec
\end{equation*}
for some power \(k\) as generalized eigenvectors. To get started, however, we will first consider a related class of operators.
Subsection 2.3.1 Null spaces of powers
Suppose that \(T\) is an operator on \(V\text{.}\) If \(\vvec\) is a vector for which \(T^k\vvec=\zerovec\text{,}\) then it also happens that
\begin{equation*}
T^{k+1}\vvec=T(T^k\vvec)=\zerovec\text{.}
\end{equation*}
This means that \(\nul(T^k)\subset\nul(T^{k+1})\text{,}\) and we therefore have
\begin{equation*}
\{0\}\subset\nul(T)\subset\nul(T^2)\subset\nul(T^3)\subset\ldots\text{.}
\end{equation*}
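As an illustrative numerical check (not part of the text), we can watch this chain of null spaces grow using the identity \(\dim\nul(T^k) = \dim V - \rank(T^k)\text{.}\) The matrix below is a hypothetical example chosen for demonstration; numpy is assumed to be available.

```python
import numpy as np

# Hypothetical example: a 4 x 4 matrix whose powers have growing null spaces.
A = np.array([[0, 1, 0, 0],
              [0, 0, 1, 0],
              [0, 0, 0, 0],
              [0, 0, 0, 0]], dtype=float)

n = A.shape[0]
dims = []
P = np.eye(n)
for k in range(1, n + 2):
    P = P @ A                                  # P = A^k
    dims.append(n - np.linalg.matrix_rank(P))  # dim nul(A^k)

print(dims)  # the dimensions are nondecreasing and eventually stabilize
```

Running this prints `[2, 3, 4, 4, 4]`: the null spaces grow and then stop growing, exactly as the inclusions above predict.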
The next propositions say that this process stabilizes so that the inclusions eventually become equalities. First we show that once we reach an equality, all the following inclusions are equalities as well.
Proposition 2.3.1.
If \(\nul(T^n) = \nul(T^{n+1})\) for some power \(n\text{,}\) then \(\nul(T^k) = \nul(T^n)\) for every \(k\geq n\text{.}\)
Proof.
Suppose that \(\nul(T^n) = \nul(T^{n+1})\) for some \(n\) and that \(\vvec\) is a vector in \(\nul(T^{n+2})\text{.}\) It follows that
\begin{equation*}
T^{n+2}\vvec = T^{n+1}(T\vvec) = 0\text{,}
\end{equation*}
which means that \(T\vvec\) is in \(\nul(T^{n+1})\text{.}\) Because \(\nul(T^{n+1}) = \nul(T^n)\text{,}\) it follows that \(T\vvec\) is in \(\nul(T^n)\text{,}\) which says that
\begin{equation*}
T^{n}(T\vvec) = T^{n+1}\vvec = 0\text{.}
\end{equation*}
This says that \(\vvec\) is in \(\nul(T^{n+1})\text{,}\) so \(\nul(T^{n+2})=\nul(T^{n+1})\text{.}\) Continuing in this way, we see that
\begin{equation*}
\nul(T^n)=\nul(T^{n+1})=\nul(T^{n+2}) = \nul(T^{n+3}) =
\ldots\text{.}
\end{equation*}
The next result says that this process will always stabilize by the time we reach the dimension of \(V\text{.}\)
Proposition 2.3.2.
For any operator \(T\) on a vector space \(V\) of dimension \(n\text{,}\)
\begin{equation*}
\nul(T^n)=\nul(T^{n+1})\text{.}
\end{equation*}
Proof.
If \(\nul(T)=\{\zerovec\}\text{,}\) then \(T\) is invertible as is every power of \(T\text{.}\) Therefore \(\nul(T^k)=\{\zerovec\}\) for every power, including \(n\) and \(n+1\text{.}\)
Now suppose that \(\nul(T)\) has positive dimension so that \(\dim\nul(T) \geq 1\text{,}\) and suppose that the chain of null spaces is strictly increasing through the power \(m\text{:}\)
\begin{equation*}
\nul(T)\subsetneq\nul(T^2)\subsetneq\ldots\subsetneq\nul(T^m)\text{.}
\end{equation*}
Each proper inclusion increases the dimension by at least one, so
\begin{equation*}
n=\dim(V)\geq\dim\nul(T^m)\geq m\text{.}
\end{equation*}
This shows that the null spaces cannot keep growing past the power \(n\text{,}\) and once two consecutive null spaces agree, all later ones agree as well by the previous proposition. Therefore \(\nul(T^n)=\nul(T^{n+1})\text{.}\)
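As a quick numerical sketch (not part of the text), this bound holds even for operators that are not nilpotent: the null spaces of powers stop growing by the time the exponent reaches \(n=\dim V\text{.}\) The matrix below is a hypothetical example; numpy is assumed.

```python
import numpy as np

# Hypothetical 3 x 3 operator with a nontrivial null space that is not nilpotent.
B = np.array([[2, 0, 0],
              [0, 0, 1],
              [0, 0, 0]], dtype=float)

n = B.shape[0]
# dim nul(B^k) = n - rank(B^k) for k = 1, ..., n+1
dims = [n - np.linalg.matrix_rank(np.linalg.matrix_power(B, k))
        for k in range(1, n + 2)]
print(dims)  # the chain grows, then nul(B^n) = nul(B^(n+1))
```

Here the output is `[1, 2, 2, 2]`: the null space grows once and then stabilizes well before the exponent exceeds \(n=3\text{.}\)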
Subsection 2.3.2 Nilpotent operators
We will now focus on a particular type of operator known as nilpotent.
Definition 2.3.3. Nilpotent operator.
An operator \(T\) on a vector space \(V\) is called nilpotent if \(T^m=0\) for some positive integer \(m\text{.}\)
Notice that the operator \(T=0\) is nilpotent, but there are certainly nonzero nilpotent operators.
Example 2.3.4.
Consider the matrix \(A = \begin{bmatrix}
0 \amp 1 \\
0 \amp 0 \\
\end{bmatrix}\) and notice that \(A^2=0\text{.}\) An operator \(T\) whose associated matrix is \(A\) with respect to some basis is nilpotent since \(T^2=0\text{.}\)
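A direct check of this example, with numpy assumed: \(A\) is nonzero, yet \(A^2\) is the zero matrix, so any operator represented by \(A\) is nilpotent.

```python
import numpy as np

# The matrix from the example: nonzero, but its square vanishes.
A = np.array([[0, 1],
              [0, 0]])
print(A @ A)  # the 2 x 2 zero matrix
```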
Suppose \(T\) is nilpotent and that \(m\) is the smallest power for which \(T^m=0\text{.}\) If \(p(x)=x^m\text{,}\) then \(p(T)=0\text{,}\) and because no smaller power of \(T\) is zero, \(p\) is the minimal polynomial of \(T\text{.}\) Since the degree of the minimal polynomial is at most \(\dim V\text{,}\) we have \(m\leq \dim V\text{:}\) the smallest power of \(T\) that is zero is no more than the dimension of \(V\text{.}\)
We could view
\begin{equation*}
p(x) = x^m = (x-0)^m\text{,}
\end{equation*}
which says that there is a basis for which the matrix associated to \(T\) is upper triangular
\begin{equation*}
\begin{bmatrix}
0 \amp * \amp * \amp \ldots \amp * \\
0 \amp 0 \amp * \amp \ldots \amp * \\
\vdots \amp \vdots \amp \vdots \amp \ddots \amp * \\
0 \amp 0 \amp 0 \amp 0 \amp 0 \\
\end{bmatrix}\text{.}
\end{equation*}
In fact, we will see that there is a basis so that the matrix associated to a nilpotent operator has an especially nice form.
Definition 2.3.5.
A nilpotent block matrix is a square matrix having the form
\begin{equation*}
\begin{bmatrix}
0 \amp 1 \amp 0 \amp \ldots \amp 0 \\
0 \amp 0 \amp 1 \amp \ldots \amp 0 \\
\vdots \amp \vdots \amp \vdots \amp \ddots \amp 1 \\
0 \amp 0 \amp 0 \amp 0 \amp 0 \\
\end{bmatrix}\text{.}
\end{equation*}
That is, all the entries are zero except for the entries directly above the diagonal, which are 1.
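A nilpotent block is easy to build and test numerically; the sketch below (numpy assumed, block size \(m=4\) chosen arbitrarily) uses the fact that `np.eye` with offset `k=1` places ones directly above the diagonal.

```python
import numpy as np

# An m x m nilpotent block: ones on the superdiagonal, zeros elsewhere.
m = 4
N = np.eye(m, k=1)

# Each multiplication by N moves the ones one diagonal higher,
# so N^(m-1) is still nonzero while N^m vanishes.
print(np.linalg.matrix_power(N, m - 1))
print(np.linalg.matrix_power(N, m))
```

The first printed matrix has a single 1 in its upper-right corner; the second is the zero matrix.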
Example 2.3.6.
The following matrix consists of three nilpotent blocks on the diagonal, a \(3\times3\) block, a \(2\times2\) block, and a \(1\times1\) block.
\begin{equation*}
\begin{bmatrix}
0 \amp 1 \amp 0 \amp \gz \amp \gz \amp \gz \\
0 \amp 0 \amp 1 \amp \gz \amp \gz \amp \gz \\
0 \amp 0 \amp 0 \amp \gz \amp \gz \amp \gz \\
\gz \amp \gz \amp \gz \amp 0 \amp 1 \amp \gz \\
\gz \amp \gz \amp \gz \amp 0 \amp 0 \amp \gz \\
\gz \amp \gz \amp \gz \amp \gz \amp \gz \amp 0 \\
\end{bmatrix}\text{.}
\end{equation*}
Let's look a little more closely at how the linear transformation acts on basis vectors, which we'll denote as \(\basis{\vvec}{6}\text{.}\) We have
\begin{align*}
T\vvec_1 \amp = 0 \\
T\vvec_2 \amp = \vvec_1 \\
T\vvec_3 \amp = \vvec_2 \\
T\vvec_4 \amp = 0 \\
T\vvec_5 \amp = \vvec_4 \\
T\vvec_6 \amp = 0 \text{.}
\end{align*}
This shows that \(T^3=0\text{,}\) but \(T^2\neq
0\) so we have null spaces
\begin{equation*}
\{0\}\subset\nul(T)\subset\nul(T^2)\subset\nul(T^3) = V\text{.}
\end{equation*}
We can describe these null spaces explicitly:
\begin{align*}
\nul(T) \amp = \laspan{\vvec_1,\vvec_4, \vvec_6}\\
\nul(T^2) \amp =\nul(T)\oplus \laspan{\vvec_2,\vvec_5}\\
\nul(T^3) \amp =\nul(T^2)\oplus \laspan{\vvec_3}\text{.}
\end{align*}
Notice that the \(3\times3\) block is formed by a vector \(\vvec_3\) that is in \(\nul(T^3)\) but not \(\nul(T^2)\text{.}\) Once we have identified \(\vvec_3\text{,}\) we obtain new basis vectors as \(\vvec_2=T\vvec_3\) and \(\vvec_1=T\vvec_2=T^2\vvec_3\text{.}\)
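The claims in this example can be verified numerically. The sketch below (numpy assumed) reconstructs the \(6\times6\) matrix with blocks of sizes 3, 2, and 1 and computes the null space dimensions via \(\dim\nul(T^k)=6-\rank(T^k)\text{.}\)

```python
import numpy as np

# The 6 x 6 matrix from the example: nilpotent blocks of sizes 3, 2, 1.
T = np.zeros((6, 6))
T[0, 1] = T[1, 2] = 1    # the 3 x 3 block
T[3, 4] = 1              # the 2 x 2 block; the 1 x 1 block is zero

# dim nul(T^k) = 6 - rank(T^k) for k = 1, 2, 3
dims = [6 - np.linalg.matrix_rank(np.linalg.matrix_power(T, k))
        for k in (1, 2, 3)]
print(dims)
```

This prints `[3, 5, 6]`, matching the bases listed above, and one can check in the same way that \(T^2\neq 0\) while \(T^3=0\text{.}\)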
In fact, for every nilpotent operator there is a basis in which the associated matrix consists of a set of nilpotent blocks on the diagonal, a fact we will state and prove in the next proposition.
First, notice that if \(T\) is nilpotent, then its minimal polynomial is \(p(x)=x^m\) for some \(m\text{.}\) In particular, this means that \(T^m=0\) but \(T^{m-1}\neq 0\text{.}\) As we saw in Subsection 2.3.1, we have the inclusion of null spaces:
\begin{equation}
\{0\}\subsetneq\nul(T)\subsetneq\nul(T^2) \subsetneq \ldots
\subsetneq \nul(T^m) = V\tag{2.3.1}
\end{equation}
where each inclusion of one null space into the next is proper.
If \(\vvec\) is in \(\nul(T^{j+1})\text{,}\) then
\begin{equation*}
0 = T^{j+1}\vvec = T^j(T\vvec)\text{,}
\end{equation*}
which means that \(T\vvec\) is in \(\nul(T^j)\text{.}\) In other words, applying \(T\) pushes a vector to the left in the inclusions of null spaces in (2.3.1).
We are now ready to prove our structure theorem for nilpotent operators.
Proposition 2.3.7.
If \(T\) is a nilpotent operator on \(V\text{,}\) then there is a basis for \(V\) such that the matrix associated to \(T\) has the form
\begin{equation*}
\begin{bmatrix}
A_1 \amp 0 \amp \ldots \amp 0 \\
0 \amp A_2 \amp \ldots \amp 0 \\
0 \amp \vdots \amp \ddots \amp 0 \\
0 \amp 0 \amp 0 \amp A_k \\
\end{bmatrix}
\end{equation*}
where each \(A_j\) is a nilpotent block.
Proof.
Our proof proceeds by induction on the dimension of the vector space \(V\text{,}\) which we will denote by \(\dim V =
n\text{.}\)
To verify the base case, suppose that \(\dim V = n =
1\text{.}\) As we have seen, if \(\vvec\) is a vector in \(V\text{,}\) then \(T\vvec = \lambda\vvec\) for some scalar \(\lambda\text{.}\) However, if \(T\) is nilpotent, then \(\lambda = 0\) and so \(T=0\text{.}\) In any basis, the matrix representing \(T\) is \([0]\text{,}\) a \(1\times1\) nilpotent block.
Now suppose that the result is true for any nilpotent operator on a vector space of dimension less than \(n\text{.}\) Suppose also that the minimal polynomial of \(T\) is \(p(x)=x^m\text{.}\) This means that \(T^m=0\) but \(T^{m-1}\neq 0\) so
\begin{equation*}
\nul(T^{m-1})\subsetneq\nul(T^m)\text{.}
\end{equation*}
We will choose a vector \(\vvec_m\) in \(\nul(T^m)\) that is not in \(\nul(T^{m-1})\) and define
\begin{align*}
\vvec_{m-1}\amp=T\vvec_m\\
\vvec_{m-2}\amp=T\vvec_{m-1}=T^2\vvec_m\\
\vdots\amp=\vdots\\
\vvec_{2}\amp=T\vvec_3 = T^{m-2}\vvec_m\\
\vvec_{1}\amp=T\vvec_2=T^{m-1}\vvec_m\text{.}
\end{align*}
Notice that \(T\vvec_1 = T^m\vvec_m=0\) so that \(\vvec_1\) is in \(\nul(T)\text{.}\) More generally, \(\vvec_j\) is in \(\nul(T^j)\text{.}\)
We will use \(U\) to denote the subspace spanned by \(\basis{\vvec}{m}\text{.}\) Notice that a vector \(\uvec\) in \(U\) may be written as
\begin{equation*}
\uvec=c_1\vvec_1 + c_2\vvec_2 + \ldots + c_m\vvec_m
\end{equation*}
and therefore
\begin{equation*}
T\uvec=c_2\vvec_1 + c_3\vvec_2 + \ldots + c_m\vvec_{m-1}\text{.}
\end{equation*}
This shows that \(U\) is a \(T\)-invariant subspace of \(V\text{.}\)
Using Corollary 1.2.42, suppose now that \(\phi:V\to\field\) is a linear functional so that
\begin{equation*}
\phi(\vvec_1) = 1, \phi(\vvec_2) = 0,\ldots,\phi(\vvec_m) =
0\text{.}
\end{equation*}
We then define \(S:V\to \field^m\) by
\begin{equation*}
S(\vvec) = \left[\begin{array}{c}\phi(\vvec)\\ \phi(T\vvec) \\
\vdots \\ \phi(T^{m-1}\vvec) \\
\end{array}\right]\text{.}
\end{equation*}
Once again, if \(\uvec\) is a vector in \(U\text{,}\) then \(\uvec=c_1\vvec_1 + c_2\vvec_2 + \ldots + c_m\vvec_m\) so that \(S(\uvec) = \cfourvec{c_1}{c_2}{\vdots}{c_m}\text{.}\) This shows three things.
-
\(S\) is surjective.
-
\(S(\vvec_j)=\evec_j\text{,}\) the standard basis vector in \(\field^m\text{,}\) which means that \(\basis{\vvec}{m}\) is a linearly independent set and therefore a basis for \(U\text{.}\) Moreover, in this basis, the matrix representing \(T|_U\) is a nilpotent \(m\times m\) block.
-
\(S(\uvec) = 0\) implies that \(\uvec=0\text{,}\) which means that \(U\cap\nul(S) = \{0\}\text{.}\)
Now consider \(W=\nul(S)\text{.}\) Since \(S\) is surjective, we have \(\dim W = n-m \lt n\) by the Fundamental Theorem of Linear Maps 1.2.13. Moreover, \(U\cap W = U\cap\nul(S) = \{0\}\) and \(\dim U + \dim W = m + (n-m) = n\text{,}\) so we have \(U\oplus W = V\text{.}\)
Finally, we claim that \(W\) is a \(T\)-invariant subspace. Notice that \(\wvec\) is in \(W=\nul(S)\) if and only if \(\phi(T^j\wvec) = 0\) for \(j=0,1,\ldots,m-1\text{.}\) If this is the case, then \(\phi(T^j(T\wvec)) = \phi(T^{j+1}\wvec) = 0\) for these \(j\) since \(T^m=0\text{,}\) which shows that \(T\wvec\) is in \(\nul(S)\) and hence that \(W\) is \(T\)-invariant.
Because \(\dim W\lt n\) and \(T|_W\) is nilpotent, the inductive hypothesis applies to show that there is a basis for \(W\) so that the matrix representing \(T|_W\) consists of nilpotent blocks. We can combine this basis with \(\basis{\vvec}{m}\) to finish the proof of the theorem.
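The chain construction at the heart of this proof can be illustrated numerically. The sketch below (numpy assumed) uses a hypothetical nilpotent operator with minimal polynomial \(x^3\) on a 4-dimensional space: choosing \(\vvec_3\) in \(\nul(T^3)\) but not \(\nul(T^2)\) and applying \(T\) repeatedly produces a basis for \(U\) in which \(T|_U\) is a nilpotent block.

```python
import numpy as np

# Hypothetical nilpotent T on a 4-dimensional space with T^3 = 0, T^2 != 0.
T = np.zeros((4, 4))
T[0, 1] = T[1, 2] = 1

# Choose v3 in nul(T^3) but not nul(T^2), then build the chain.
v3 = np.array([0.0, 0.0, 1.0, 0.0])
v2 = T @ v3
v1 = T @ v2

# Express T restricted to U = span(v1, v2, v3) in the basis (v1, v2, v3).
U = np.column_stack([v1, v2, v3])
block = np.linalg.pinv(U) @ T @ U
print(np.round(block))  # a 3 x 3 nilpotent block
```

The printed matrix has ones on the superdiagonal and zeros elsewhere, which is exactly the \(m\times m\) block produced by the vectors \(\basis{\vvec}{m}\) in the proof.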