Section 2.3 Nilpotent operators
Eigenvectors of an operator \(T\) are found through the equation \((T-\lambda I)\vvec=\zerovec\text{.}\) Now that we have developed some conditions under which operators are diagonalizable, we would like to understand what happens when operators are not diagonalizable. To this end, we will generalize the eigenvector condition by considering
\begin{equation*}
(T-\lambda I)^k\vvec=\zerovec\text{.}
\end{equation*}
To get started, however, we will first consider a related class of operators.
Subsection 2.3.1 Null spaces of powers
Suppose that \(T\) is an operator. If \(T^k\vvec=\zerovec\text{,}\) then it also happens that
\begin{equation*}
T^{k+1}\vvec=T(T^k\vvec)=\zerovec\text{.}
\end{equation*}
This means that \(\nul(T^k)\subset\nul(T^{k+1})\text{,}\) and we therefore have
\begin{equation*}
\nul(T)\subset\nul(T^2)\subset\nul(T^3)\subset\ldots\text{.}
\end{equation*}
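This nesting of null spaces is easy to check numerically. Below is a minimal sketch with NumPy, using a sample \(3\times3\) strictly upper triangular matrix of our own choosing to stand in for \(T\text{;}\) the dimension of \(\nul(T^k)\) is computed as \(n\) minus the rank of \(T^k\text{.}\)

```python
import numpy as np

# A sample strictly upper triangular matrix standing in for T
T = np.array([[0., 1., 1.],
              [0., 0., 1.],
              [0., 0., 0.]])
n = T.shape[0]

# dim nul(T^k) = n - rank(T^k); the nullities form a nondecreasing sequence
nullities = [n - np.linalg.matrix_rank(np.linalg.matrix_power(T, k))
             for k in range(1, 4)]
```

For this matrix the nullities grow by one at each step until the chain fills all of \(V\text{.}\)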
The next propositions say that this process stabilizes so that the inclusions eventually become equalities. First we show that once we reach an equality, all of the following inclusions are equalities as well.
Proposition 2.3.1.
If \(\nul(T^n) = \nul(T^{n+1})\) for some \(n\text{,}\) then \(\nul(T^k) = \nul(T^n)\) for every \(k\geq n\text{.}\)
Proof.
Suppose that \(\nul(T^n) = \nul(T^{n+1})\) for some \(n\) and that \(\vvec\) is a vector in \(\nul(T^{n+2})\text{.}\) It follows that
\begin{equation*}
T^{n+2}\vvec = T^{n+1}(T\vvec) = 0\text{,}
\end{equation*}
which means that \(T\vvec\) is in \(\nul(T^{n+1})\text{.}\) Because \(\nul(T^{n+1}) = \nul(T^n)\text{,}\) it follows that \(T\vvec\) is in \(\nul(T^n)\text{,}\) which says that
\begin{equation*}
T^{n}(T\vvec) = T^{n+1}\vvec = 0\text{.}
\end{equation*}
This says that \(\vvec\) is in \(\nul(T^{n+1})\text{,}\) so \(\nul(T^{n+2})\subset\nul(T^{n+1})\) and hence \(\nul(T^{n+2})=\nul(T^{n+1})\text{.}\) Continuing in this way, we see that
\begin{equation*}
\nul(T^n)=\nul(T^{n+1})=\nul(T^{n+2}) = \nul(T^{n+3}) =
\ldots\text{.}
\end{equation*}
The next result says that this process will always stabilize by the time we reach the dimension of \(V\text{.}\)
Proposition 2.3.2.
For any operator \(T\) on a vector space \(V\) of dimension \(n\text{,}\)
\begin{equation*}
\nul(T^n)=\nul(T^{n+1})\text{.}
\end{equation*}
Proof.
If \(\nul(T)=\{\zerovec\}\text{,}\) then \(T\) is invertible, as is every power of \(T\text{.}\) Therefore \(\nul(T^k)=\{\zerovec\}\) for every \(k\text{,}\) including \(k=n\) and \(k=n+1\text{.}\)
Now suppose that \(\nul(T)\) has positive dimension and that the null spaces grow strictly through the first \(m\) powers:
\begin{equation*}
\nul(T)\subsetneq\nul(T^2)\subsetneq\ldots\subsetneq\nul(T^m)\text{.}
\end{equation*}
Since \(\dim\nul(T)\geq 1\) and each proper inclusion increases the dimension by at least one, we have
\begin{equation*}
n=\dim(V)\geq\dim\nul(T^m)\geq m\text{.}
\end{equation*}
This shows that the null spaces cannot grow strictly beyond the \(n^{th}\) power, so we have \(\nul(T^n)=\nul(T^{n+1})\text{.}\)
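As an illustrative check of this proposition (not part of the proof), one can verify with NumPy that \(\nul(T^n)\) and \(\nul(T^{n+1})\) agree for a sample operator of our own choosing on a \(3\)-dimensional space. Since one null space always contains the other, comparing dimensions suffices.

```python
import numpy as np

# A sample operator mixing a nilpotent part with an invertible part
T = np.array([[0., 1., 0.],
              [0., 0., 0.],
              [0., 0., 2.]])
n = T.shape[0]

def nullity(A):
    """dim nul(A) = number of columns minus rank."""
    return A.shape[1] - np.linalg.matrix_rank(A)

# nul(T^n) and nul(T^{n+1}) have equal dimension; since one contains
# the other, the null spaces are equal
same = nullity(np.linalg.matrix_power(T, n)) == nullity(np.linalg.matrix_power(T, n + 1))
```

For this matrix the chain in fact stabilizes already at the second power, earlier than the proposition's guarantee.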
Subsection 2.3.2 Nilpotent operators
We will now focus on a particular type of operator known as nilpotent.
Definition 2.3.3. Nilpotent operator.
An operator \(T\) is nilpotent if \(T^m=0\) for some power \(m\text{.}\)
Notice that the operator \(T=0\) is nilpotent, but we will often consider nonzero nilpotent operators.
Example 2.3.4.
Consider the matrix \(A = \begin{bmatrix}
0 \amp 1 \\
0 \amp 0 \\
\end{bmatrix}\) and notice that \(A^2=0\text{.}\) An operator \(T\) whose associated matrix is \(A\) with respect to some basis is nilpotent since \(T^2=0\text{.}\)
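This computation is quick to confirm directly; a short NumPy check:

```python
import numpy as np

A = np.array([[0., 1.],
              [0., 0.]])
# A is nonzero, but its square is the zero matrix
A_squared = A @ A
```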
Suppose \(T\) is nilpotent and that \(m\) is the smallest power for which \(T^m=0\text{.}\) This means that \(p(T)=0\) where \(p(x)=x^m\text{,}\) and the minimality of \(m\) implies that \(p\) is the minimal polynomial of \(T\text{.}\) We could view
\begin{equation*}
p(x) = x^m = (x-0)^m\text{,}
\end{equation*}
which says that there is a basis for which the matrix associated to \(T\) is upper triangular
\begin{equation*}
\begin{bmatrix}
0 \amp * \amp * \amp \ldots \amp * \\
0 \amp 0 \amp * \amp \ldots \amp * \\
\vdots \amp \vdots \amp \vdots \amp \ddots \amp * \\
0 \amp 0 \amp 0 \amp 0 \amp 0 \\
\end{bmatrix}\text{.}
\end{equation*}
In fact, we will see that there is a basis so that the matrix associated to a nilpotent operator has an especially nice form.
Definition 2.3.5.
A nilpotent block matrix is a square matrix having the form
\begin{equation*}
\begin{bmatrix}
0 \amp 1 \amp 0 \amp \ldots \amp 0 \\
0 \amp 0 \amp 1 \amp \ldots \amp 0 \\
\vdots \amp \vdots \amp \vdots \amp \ddots \amp 1 \\
0 \amp 0 \amp 0 \amp 0 \amp 0 \\
\end{bmatrix}\text{.}
\end{equation*}
That is, all the entries are zero except for the entries directly above the diagonal, which are 1.
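An \(m\times m\) nilpotent block can be built in NumPy with `np.eye(m, k=1)`, which places 1s directly above the diagonal. A sketch verifying that the \(m^{th}\) power of such a block vanishes while the \((m-1)^{st}\) power does not:

```python
import numpy as np

m = 4
# 1s on the superdiagonal, 0s elsewhere: an m x m nilpotent block
N = np.eye(m, k=1)

N_m_minus_1 = np.linalg.matrix_power(N, m - 1)  # a single 1 in the upper-right corner
N_m = np.linalg.matrix_power(N, m)              # the zero matrix
```

This shows that the minimal polynomial of an \(m\times m\) nilpotent block is \(x^m\text{.}\)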
Example 2.3.6.
The following matrix consists of three nilpotent blocks on the diagonal, a \(3\times3\) block, a \(2\times2\) block, and a \(1\times1\) block.
\begin{equation*}
\begin{bmatrix}
0 \amp 1 \amp 0 \amp 0 \amp 0 \amp 0 \\
0 \amp 0 \amp 1 \amp 0 \amp 0 \amp 0 \\
0 \amp 0 \amp 0 \amp 0 \amp 0 \amp 0 \\
0 \amp 0 \amp 0 \amp 0 \amp 1 \amp 0 \\
0 \amp 0 \amp 0 \amp 0 \amp 0 \amp 0 \\
0 \amp 0 \amp 0 \amp 0 \amp 0 \amp 0 \\
\end{bmatrix}\text{.}
\end{equation*}
Let’s look a little more closely at how the linear transformation acts on basis vectors, which we’ll denote as \(\basis{\vvec}{6}\text{.}\) We have
\begin{align*}
T\vvec_1 \amp = 0 \\
T\vvec_2 \amp = \vvec_1 \\
T\vvec_3 \amp = \vvec_2 \\
T\vvec_4 \amp = 0 \\
T\vvec_5 \amp = \vvec_4 \\
T\vvec_6 \amp = 0 \text{.}
\end{align*}
This transformation satisfies \(T^3=0\text{,}\) but \(T^2\neq
0\) so we have null spaces
\begin{equation*}
\{0\}\subset\nul(T)\subset\nul(T^2)\subset\nul(T^3) = V\text{.}
\end{equation*}
Within these null spaces, we have bases
\begin{align*}
\nul(T) \amp = \laspan{\vvec_1,\vvec_4, \vvec_6}\\
\nul(T^2) \amp =\nul(T)\oplus \laspan{\vvec_2,\vvec_5}\\
\nul(T^3) \amp =\nul(T^2)\oplus \laspan{\vvec_3}\text{.}
\end{align*}
For each \(j\text{,}\) we therefore have
\begin{equation*}
\nul(T^{j+1}) = \nul(T^j) \oplus W_j\text{,}
\end{equation*}
where \(W_j\) is the span of the basis vectors that first appear in \(\nul(T^{j+1})\text{;}\) for instance, \(W_1=\laspan{\vvec_2,\vvec_5}\) and \(W_2=\laspan{\vvec_3}\text{.}\)
Notice that the \(3\times3\) block is formed by a vector \(\vvec_3\) that is in \(\nul(T^3)\) but not \(\nul(T^2)\text{.}\) Once we have identified \(\vvec_3\text{,}\) we obtain new basis vectors as \(\vvec_2=T\vvec_3\) and \(\vvec_1=T\vvec_2=T^2\vvec_3\text{.}\)
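The null space dimensions in this example can be confirmed numerically. A sketch building the \(6\times6\) matrix above and tracing the chain \(\vvec_3\to\vvec_2\to\vvec_1\to 0\text{:}\)

```python
import numpy as np

# The 6x6 matrix from the example: blocks of size 3, 2, and 1
T = np.zeros((6, 6))
T[0, 1] = T[1, 2] = T[3, 4] = 1.

def nullity(A):
    return A.shape[1] - np.linalg.matrix_rank(A)

# dim nul(T), dim nul(T^2), dim nul(T^3)
dims = [nullity(np.linalg.matrix_power(T, k)) for k in (1, 2, 3)]

# T sends v3 -> v2 -> v1 -> 0, tracing out the 3x3 block
v3 = np.eye(6)[2]   # the standard basis vector e3
v2 = T @ v3         # equals e2
v1 = T @ v2         # equals e1
```

The computed dimensions \(3\text{,}\) \(5\text{,}\) and \(6\) match the chain of null spaces described above.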
In fact, every nilpotent operator has a basis whose associated matrix consists of a set of nilpotent blocks on the diagonal, which we will state and prove in the next proposition.
First, notice that if \(T\) is nilpotent, then its minimal polynomial is \(p(x)=x^m\) for some \(m\text{.}\) In particular, this means that \(T^m=0\) but \(T^{m-1}\neq 0\text{.}\) As we saw in Subsection 2.3.1, we have the inclusion of null spaces:
\begin{equation}
\{0\}\subsetneq\nul(T)\subsetneq\nul(T^2) \subsetneq \ldots
\subsetneq \nul(T^m) = V\tag{2.3.1}
\end{equation}
where each inclusion is proper. Indeed, if \(\nul(T^j)=\nul(T^{j+1})\) for some \(j\lt m\text{,}\) the chain would stabilize at that point so that \(\nul(T^j)=\nul(T^m)=V\text{,}\) which would mean \(T^j=0\text{,}\) contradicting the minimality of \(m\text{.}\)
If \(\vvec\) is in \(\nul(T^{j+1})\text{,}\) then
\begin{equation*}
0 = T^{j+1}\vvec = T^j(T\vvec)\text{,}
\end{equation*}
which means that \(T\vvec\) is in \(\nul(T^j)\text{.}\) In other words, applying \(T\) pushes a vector to the left in the inclusions of null spaces in (2.3.1).
We are now ready to prove our structure theorem for nilpotent operators.
Proposition 2.3.7.
If \(T\) is a nilpotent operator on \(V\text{,}\) then there is a basis for \(V\) such that the matrix associated to \(T\) has the form
\begin{equation*}
\begin{bmatrix}
A_1 \amp 0 \amp \ldots \amp 0 \\
0 \amp A_2 \amp \ldots \amp 0 \\
0 \amp \vdots \amp \ddots \amp 0 \\
0 \amp 0 \amp 0 \amp A_k \\
\end{bmatrix}
\end{equation*}
where each \(A_j\) is a nilpotent block.
Proof.
Our proof proceeds by induction on the dimension of the vector space \(V\text{,}\) which we will denote by \(\dim V =
n\text{.}\)
To verify the base case, suppose that \(\dim V = n =
1\text{.}\) As we have seen, if \(\vvec\) is a vector in \(V\text{,}\) then \(T\vvec = \lambda\vvec\) for some scalar \(\lambda\text{.}\) However, if \(T\) is nilpotent, then \(\lambda = 0\) and so \(T=0\text{.}\) In any basis, the matrix representing \(T\) is \([0]\text{,}\) a \(1\times1\) nilpotent block.
Now suppose that the result is true for any nilpotent operator on a vector space of dimension less than \(n\text{.}\) Suppose also that the minimal polynomial of \(T\) is \(p(x)=x^m\text{.}\) This means that \(T^m=0\) but \(T^{m-1}\neq 0\) so
\begin{equation*}
\nul(T^{m-1})\subsetneq\nul(T^m)\text{.}
\end{equation*}
We will choose a vector \(\vvec_m\) in \(\nul(T^m)\) that is not in \(\nul(T^{m-1})\) and define
\begin{align*}
\vvec_{m-1}\amp=T\vvec_m\\
\vvec_{m-2}\amp=T\vvec_{m-1}=T^2\vvec_m\\
\vdots\amp=\vdots\\
\vvec_{2}\amp=T\vvec_3 = T^{m-2}\vvec_m\\
\vvec_{1}\amp=T\vvec_2=T^{m-1}\vvec_m\text{.}
\end{align*}
Notice that \(T\vvec_1 = T^m\vvec_m=0\) so that \(\vvec_1\) is in \(\nul(T)\text{.}\) More generally, \(\vvec_j\) is in \(\nul(T^j)\text{.}\)
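A concrete instance of this chain, using a single \(3\times3\) nilpotent block as a stand-in for \(T\) (so \(m=3\)) and a starting vector of our own choosing: beginning from a vector in \(\nul(T^3)\) that lies outside \(\nul(T^2)\text{,}\) repeated application of \(T\) produces a linearly independent chain.

```python
import numpy as np

T = np.eye(3, k=1)            # a single 3x3 nilpotent block; T^3 = 0, T^2 != 0
v3 = np.array([1., 1., 1.])   # in nul(T^3) = V but not in nul(T^2), since T^2 v3 != 0

v2 = T @ v3                   # lies in nul(T^2)
v1 = T @ v2                   # lies in nul(T)

# The chain v1, v2, v3 is linearly independent; here it spans all of V
chain = np.column_stack([v1, v2, v3])
independent = np.linalg.matrix_rank(chain) == 3
```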
We will use \(U\) to denote the subspace spanned by \(\basis{\vvec}{m}\text{.}\) Notice that a vector \(\uvec\) in \(U\) may be written as
\begin{equation*}
\uvec=c_1\vvec_1 + c_2\vvec_2 + \ldots + c_m\vvec_m
\end{equation*}
and therefore
\begin{equation*}
T\uvec=c_2\vvec_1 + c_3\vvec_2 + \ldots + c_m\vvec_{m-1}\text{.}
\end{equation*}
This shows that \(U\) is a \(T\)-invariant subspace of \(V\text{.}\)
Suppose now that \(\phi:V\to\field\) is a linear functional so that
\begin{equation*}
\phi(\vvec_1) = 1, \phi(\vvec_2) = 0,\ldots,\phi(\vvec_m) =
0\text{.}
\end{equation*}
We then define \(S:V\to \field^m\) by
\begin{equation*}
S(\vvec) = \left[\begin{array}{c}\phi(\vvec)\\ \phi(T\vvec) \\
\vdots \\ \phi(T^{m-1}\vvec) \\
\end{array}\right]\text{.}
\end{equation*}
Once again, if \(\uvec\) is a vector in \(U\text{,}\) then \(\uvec=c_1\vvec_1 + c_2\vvec_2 + \ldots + c_m\vvec_m\) so that \(S(\uvec) = \fourvec{c_1}{c_2}{\vdots}{c_m}\text{.}\) This shows three things.
- \(S\) is surjective.
- \(S(\vvec_j)=\evec_j\text{,}\) the standard basis vector in \(\field^m\text{,}\) which means that \(\basis{\vvec}{m}\) is a linearly independent set and therefore a basis for \(U\text{.}\) Moreover, in this basis, the matrix representing \(T|_U\) is a nilpotent \(m\times m\) block.
- \(S(\uvec) = 0\) implies that \(\uvec=0\text{,}\) which means that \(U\cap\nul(S) = \{0\}\text{.}\)
Now consider \(W=\nul(S)\text{.}\) Since \(S\) is surjective, we have \(\dim W = n-m \lt n\) by the Fundamental Theorem of Linear Maps 1.2.13. Moreover, \(U\cap W = U\cap
\nul(S) = \{0\}\) and \(\dim U + \dim W = m + (n-m) = n\text{,}\) so we have \(U\oplus W = V\text{.}\)
Finally, we claim that \(W\) is a \(T\)-invariant subspace. Notice that \(\wvec\) is in \(W=\nul(S)\) if and only if \(\phi(T^j\wvec) = 0\) for all \(j\text{.}\) If this is the case, then \(\phi(T^j(T\wvec)) =
\phi(T^{j+1}\wvec) = 0\) for all \(j\text{,}\) which shows that \(T\wvec\) is also in \(W\) and hence that \(W\) is \(T\)-invariant.
Because \(\dim W\lt n\) and \(T|_W\) is nilpotent, the inductive hypothesis applies to show that there is a basis for \(W\) in which the matrix representing \(T|_W\) consists of nilpotent blocks. Combining this basis with \(\basis{\vvec}{m}\) gives a basis for \(V=U\oplus W\) in which the matrix representing \(T\) has the required block diagonal form, which finishes the proof.
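The construction in this proof can be traced on a small example of our own choosing. A sketch, assuming \(T\) is the \(3\times3\) nilpotent matrix below with minimal polynomial \(x^2\) (so \(m=2\)): we take \(\vvec_2=\evec_2\text{,}\) set \(\vvec_1=T\vvec_2\text{,}\) let \(\phi\) be the coordinate functional reading off the first entry, and compute \(S\) and \(W=\nul(S)\text{.}\)

```python
import numpy as np

# T has blocks of size 2 and 1; minimal polynomial x^2, so m = 2
T = np.array([[0., 1., 0.],
              [0., 0., 0.],
              [0., 0., 0.]])

v2 = np.array([0., 1., 0.])   # in nul(T^2) = V but not in nul(T)
v1 = T @ v2                   # equals e1, the end of the chain

# phi reads off the first coordinate, so phi(v1) = 1 and phi(v2) = 0
phi = np.array([1., 0., 0.])

# S(v) = (phi(v), phi(Tv)), written as a 2x3 matrix with rows phi and phi∘T
S = np.vstack([phi, phi @ T])

rank_S = np.linalg.matrix_rank(S)   # S is surjective onto F^2
w = np.array([0., 0., 1.])          # spans W = nul(S)
```

Here \(U=\laspan{\vvec_1,\vvec_2}\) carries a \(2\times2\) nilpotent block, \(T\wvec=0\) so \(T|_W\) is a \(1\times1\) block, and \(V=U\oplus W\) realizes the block diagonal form of the proposition.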