Inner products

Section 1.3 Inner products

In our earlier studies, we introduced the dot product to introduce a richer geometric perspective on some key ideas. In particular, we could use the dot product to detect when vectors are orthogonal, and this led to many simplifications. For instance, the inverse of a matrix whose columns form an orthonormal basis of \(\real^n\) is just the transpose of that matrix.

As we expand our study to consider more general vector spaces, we need to introduce a concept like the dot product for vector spaces. This leads us to the concept of an inner product.

Subsection 1.3.1 Inner products

On the vector space \(\real^n\text{,}\) we introduced the dot product between two vectors:

\begin{equation*} \vvec\cdot\wvec = v_1w_1+v_2w_2 + \ldots+v_nw_n\text{.} \end{equation*}

One important property is that

\begin{equation*} \vvec\cdot\vvec = v_1^2 + v_2^2 + \ldots + v_n^2 = \len{\vvec}^2\text{.} \end{equation*}

More generally, we had the following properties:

Positivity.
\(\vvec\cdot\vvec \geq 0\) with \(\vvec\cdot\vvec=0\) if and only if \(\vvec=\zerovec\text{.}\)
Symmetry.
\(\vvec\cdot\wvec = \wvec\cdot\vvec\text{.}\)
Linearity.
\((c_1\vvec_1+c_2\vvec_2)\cdot\wvec = c_1\vvec_1\cdot\wvec + c_2\vvec_2\cdot\wvec\text{.}\)

Things are a little different when we are using complex numbers. If \(z\) is a general complex number, then \(z^2\) is not guaranteed to be real, much less nonnegative. To preserve the positivity condition above, remember that the complex conjugate is defined by

\begin{equation*} \conj{a+bi} = a-bi \end{equation*}

so that if \(z=a+bi\text{,}\) we have

\begin{equation*} z\conj{z}=a^2+b^2=\len{z}^2\geq 0\text{.} \end{equation*}

We leads us to define the dot product on \(\complex^n\) so that

\begin{equation*} \vvec\cdot\vvec = |v_1|^2 + |v_2|^2 + \ldots + |v_n|^2 = \len{\vvec}^2 \geq 0\text{,} \end{equation*}

which means

\begin{equation*} \vvec\cdot\wvec = v_1\conj{w_1}+v_2\conj{w_2} + \ldots+v_n\conj{w_n}\text{.} \end{equation*}

With this definition, the three properties above still hold except that the symmetry condition is modified to \(\vvec\cdot\wvec = \conj{\wvec\cdot\vvec}\text{.}\)

Definition 1.3.1.

If \(V\) is a vector space, we call \(\inner{}{}\) an inner product provided that

Positivity.
\(\inner{\vvec}{\vvec}\geq 0\) and \(\inner{\vvec}{\vvec} = 0\) if and only if \(\vvec=0\text{.}\)
Conjugate symmetry.
\(\conj{\inner{\vvec}{\wvec}} = \inner{\wvec}{\vvec}\text{.}\)
Linearity.
\(\inner{c_1\vvec_1+c_2\vvec_2}{\wvec} = c_1\inner{\vvec_1}{\wvec} + c_2\inner{\vvec_2}{\wvec}\text{.}\)

Example 1.3.2.

If \(V=\complex^n\text{,}\) then \(\inner{\vvec}{\wvec} = \vvec\cdot\wvec\) is an inner product.

In fact, this is true if \(V=\real^n\) as well. If \(x\) is real, then \(\conj{x} = x\) so the conjugate symmetry condition is the same as the symmetry condition above.

Example 1.3.3.

If \(\poly\) is the vector space of all polynomials over \(\field\text{,}\) then

\begin{equation*} \inner{p}{q}=\int_{-1}^1 p(x)\conj{q(x)}~dx \end{equation*}

is an inner product.

This may seem strange when you first see it, but it is just an extension of the usual dot product in some sense. For instance, think of a three-dimensional vector as a function from the set \(\{-1,0,1\}\) into \(\real\text{.}\) The dot product between two vectors is then

\begin{equation*} \vvec\cdot\wvec = \vvec(-1)\conj{\wvec(-1)} + \vvec(0)\conj{\wvec(0)} + \vvec(1)\conj{\wvec(1)}\text{,} \end{equation*}

so that we multiply the the value of \(v\) and \(\conj{\wvec}\) at each point and add. If we interpret the integral as an infinite sum, this is what the inner product defined above is doing.

Example 1.3.4.

Suppose \(V=\field^{m,n}\text{,}\) the vector space of \(m\times n\) matrices. If \(A\) is such a matrix, we define \(A^*\) to be its conjugate transpose. That is, \(A^*=\conj{A^T}\text{.}\) Then

\begin{equation*} \inner{A}{B} = \tr(AB^*) \end{equation*}

is an inner product, where \(\tr\) represents the trace of a matrix, the sum of its diagonal entries.

It’s relatively straightforward to show that

\begin{equation*} \inner{A}{A} = \sum_{i,j}\len{A_{i,j}}^2\text{.} \end{equation*}

It may be useful to note the following consequence of the conjugate symmetry and linearity properties:

\begin{equation*} \inner{\vvec}{c_1\wvec_1+c_2\wvec_2} =\conj{c_1}\inner{\vvec}{\wvec_1} + \conj{c_2}\inner{\vvec}{\wvec_2}\text{.} \end{equation*}

Definition 1.3.5.

We typically refer to a vector space \(V\) with an inner product as an inner product space.

Definition 1.3.6.

The length or norm of a vector \(\vvec\) in an inner product space is

\begin{equation*} \len{\vvec} = \sqrt{\inner{\vvec}{\vvec}}\text{.} \end{equation*}

With this definition, it follows that \(\len{s\vvec} = \len{s}\len{\vvec}\text{.}\)

Definition 1.3.7.

If \(V\) and \(W\) are inner product spaces and \(T:V\to W\) is a vector space isomorphism such that

\begin{equation*} \inner{T(\vvec_1)}{T(\vvec_2)} = \inner{\vvec_1}{\vvec_2} \end{equation*}

for all vectors \(\vvec_1\) and \(\vvec_2\text{,}\) we say that \(T\) is an isometry of vector spaces.

Subsection 1.3.2 Orthogonality

Since an inner product is the same concept as the dot product extended to vector spaces, we have access to many similar concepts, such as orthogonality.

Definition 1.3.8.

Two vectors \(\vvec\) and \(\wvec\) in an inner product space are orthogonal if \(\inner{\vvec}{\wvec} = 0\text{.}\)

Example 1.3.9.

If \(V=\poly\text{,}\) the set of all polynomials, with the inner product given in Example 1.3.3, then \(p(x)=x-x^3\) is orthogonal to \(q(x)=x^2+7x^8\text{.}\) This follows because each term in \(p(x)\conj{q(x)}\) is an odd power of \(x\) whose integral on the interval \([-1,1]\) will be zero by symmetry.

More generally, any polynomial whose terms are all of odd degree is orthogonal to any polynomials whose terms are all of even degree.

Proposition 1.3.10. Pythagorean theorem.

If \(\vvec\) and \(\wvec\) are two orthogonal vectors in an inner product space, then

\begin{equation*} \len{\vvec+\wvec}^2 = \len{\vvec}^2+\len{\wvec}^2\text{.} \end{equation*}

Proof.

The follows from the linearity of the inner product:

\begin{align*} \len{\vvec+\wvec}^2 \amp = \inner{\vvec+\wvec}{\vvec+\wvec}\\ \amp = \inner{\vvec}{\vvec}+\inner{\vvec}{\wvec}+\inner{\wvec}{\vvec} + \inner{\wvec}{\wvec}\\ \amp = \len{\vvec}^2 + \len{\wvec}^2 \end{align*}

Definition 1.3.11.

In an inner product space, we say that \(\basis{\vvec}{m}\) is an orthogonal set if each vector is nonzero and each pair of vectors is orthogonal to one another.

Proposition 1.3.12.

In an inner product space, an orthogonal set is linearly independent.

Proof.

Suppose that \(\basis{\vvec}{m}\) is an orthogonal set and that

\begin{equation*} a_1\vvec_1 + a_2\vvec_2 + \ldots + a_m\vvec_m = 0\text{.} \end{equation*}

If we take the inner product with \(\vvec_j\) for any \(j\text{,}\) we have

\begin{align*} \inner {a_1\vvec_1 + a_2\vvec_2 + \ldots + a_m\vvec_m} {\vvec_j} \amp = 0\\ a_1\inner{\vvec_1}{\vvec_j} + a_2 \inner{\vvec_2}{\vvec_j} + \ldots + a_m\inner{\vvec_m}{\vvec_j} \amp = 0\\ a_j\len{\vvec_j}^2 \amp = 0 \end{align*}

which says that \(a_j=0\text{.}\)

From this, we conclude that an orthogonal set forms a basis for a subspace of the inner product space.

Proposition 1.3.13. Projection formula.

Suppose that \(\basis{\wvec}{m}\) is an orthogonal set in an inner product space \(V\) and that \(\bvec\) is a vector in \(V\text{.}\) The closest vector in \(W\) to \(\bvec\) is called the orthogonal projection of \(\bvec\) onto \(W\) and is given by

\begin{equation*} \bhat= \frac{\inner{\bvec}{\wvec_1}}{\inner{\wvec_1}{\wvec_1}}\wvec_1 + \frac{\inner{\bvec}{\wvec_2}}{\inner{\wvec_2}{\wvec_2}}\wvec_2 + \ldots + \frac{\inner{\bvec}{\wvec_m}}{\inner{\wvec_m}{\wvec_m}}\wvec_m\text{.} \end{equation*}

Proof.

This is the same expression as the Projection Formula

understandinglinearalgebra.org/sec-orthogonal-bases.html#prop-proj-formula

that we frequently used in our previous classes and its found by the same argument.

We first find the vector \(\bhat\) so that \(\bvec-\bhat\) is orthogonal to \(W\) and then explain why it is the closest vector.

Notice that, by linearity, if a vector \(\uvec\) is orthogonal to each \(\wvec_j\text{,}\) then it is orthogonal to every vector in \(W\text{.}\) This is because any vector in \(W\) is a linear combination of \(\basis{\wvec}{m}\) so that

\begin{equation*} \wvec = c_1\wvec_1 + c_2\wvec_2+\ldots+c_n\wvec_m \end{equation*}

and therefore

\begin{equation*} \inner{\wvec}{\uvec} = c_1\inner{\wvec_1}{\uvec} + \ldots + c_n\inner{\wvec_m}{\uvec} = 0\text{.} \end{equation*}

We require that \(\bvec-\bhat\) be orthogonal to \(W\) so that

\begin{equation*} \inner{\bvec-\bhat}{\wvec_j} = 0 \end{equation*}

\begin{equation*} \inner{\bvec}{\wvec_j} = \inner{\bhat}{\wvec_j} \end{equation*}

for all \(j\text{.}\) Since \(\bhat\) is in \(W\text{,}\) it can be expressed as a linear combination of \(\wvec_j\text{:}\)

\begin{equation*} \bhat=c_1\wvec_j+\ldots+c_m\wvec_m \end{equation*}

so that we have

\begin{equation*} \inner{\bvec}{\wvec_j}=\inner{c_1\wvec_1}{\wvec_j}+\ldots+ \inner{c_m\wvec_m}{\wvec_j}\text{,} \end{equation*}

which gives the expression for \(\bhat\) in the statement of the proposition.

Now suppose that \(\wvec\) is any other vector in \(W\text{.}\) Then \(\bhat - \wvec\) is in \(W\) and hence orthogonal to \(\bvec-\bhat\text{.}\) Therefore,

\begin{equation*} \len{\bvec-\wvec}^2 = \len{(\bvec-\bhat)+(\bhat-\wvec)}^2 = \len{\bvec-\bhat}^2 + \len{\bhat-\wvec}^2 \end{equation*}

by the Pythagoren theorem and hence

\begin{equation*} \len{\bvec-\wvec}^2 \geq \len{\bvec-\bhat}^2\text{,} \end{equation*}

which shows that \(\bhat\) is closer to \(\bvec\) than any other vector in \(W\text{.}\)

The Projection Formula was key to a wide range of useful concepts. In particular, we can apply the Gram-Schmidt algorithm as we did earlier.

Definition 1.3.14.

A set of vectors is called orthonormal if each pair of vectors is orthogonal and each vector has unit length.

Proposition 1.3.15.

If \(W\) is a finite dimensional subspace of an inner product space \(V\text{,}\) then there is an orthonormal basis for \(W\text{.}\)

Proof.

We choose any basis \(\basis{\vvec}{m}\) for \(W\) and then define

\begin{align*} \wvec_1\amp=\vvec_1\\ \wvec_2\amp = \vvec_2 - \frac{\inner{\vvec_2}{\wvec_1}}{\inner{\wvec_1}{\wvec_1}}\\ \wvec_3\amp = \vvec_3 - \frac{\inner{\vvec_3}{\wvec_1}}{\inner{\wvec_1}{\vvec_1}} -\frac{\inner{\vvec_3}{\wvec_2}}{\inner{\wvec_2}{\vvec_2}} \end{align*}

and so on. This produces an orthogonal basis for \(W\) since, at every step, \(\laspan{\basis{\wvec}{j}} = \laspan{\basis{\vvec}{j}}\text{.}\)

Finally, we define \(\uvec_j = \frac{\wvec_j}{\len{\wvec_j}}\) to obtain an orthonormal basis for \(W\text{.}\)

Notice that a vector space \(V\) is a subspace of itself so the previous proposition implies that every finite dimensional subspace has an orthonormal basis.

Also, remember that any linearly independent set in \(V\) can be extended to a basis of \(V\) by Proposition 1.1.31. If we begin with an orthonormal set of vectors in \(V\text{,}\) we can extend it to a basis of \(V\text{,}\) and apply the Gram-Schmidt algorithm to the added basis vectors to obtain an orthonormal basis of \(V\text{.}\) In other words,

Proposition 1.3.16.

Any orthonormal set in \(V\) can be extended to an orthonormal basis for \(V\text{.}\)

Subsection 1.3.3 The adjoint of a linear transformation

We suppose now that \(V\) and \(W\) are inner product spaces over a field \(\field\text{.}\) If \(T:V\to W\) is a linear transformation, we can define its adjoint \(T^*\) through the following relationship

\begin{equation*} \inner{T\vvec}{\wvec}=\inner{\vvec}{T^*\wvec} \end{equation*}

for every \(\vvec\) in \(V\) and \(\wvec\) in \(W\text{.}\) We can also write this expression as

\begin{equation*} \inner{T^*\wvec}{\vvec} = \inner{\wvec}{T\vvec} \end{equation*}

by applying the conjugate symmetry condition twice. The first thing to establish is that \(T^*\) is itself an linear transformation.

We will first prove a useful result in the simple case that \(T:V\to \field\text{.}\)

Proposition 1.3.17. Riesz represenation theorem.

Suppose that \(V\) is an inner product space and \(\phi:V\to \field\) is a linear transformation. Then there is a unique vector \(\uvec\) such that

\begin{equation*} \phi(\vvec) = \inner{\vvec}{\uvec}\text{.} \end{equation*}

Proof.

If \(\phi = 0\text{,}\) then we can take \(\uvec=0\) as well.

So suppose that \(\phi\neq 0\text{,}\) which means that there is a vector \(\vvec\) such that \(\phi(\vvec) \neq 0\text{.}\) Therefore, \(\phi\) is onto and \(\range(\phi)=\field\text{.}\)

If \(\dim V = n\text{,}\) we know that

\begin{equation*} \dim \nul(\phi) = \dim V - \dim \range(\phi) = n-1\text{.} \end{equation*}

Choose an orthonormal basis \(\basis{\vvec}{n-1}\) for \(\nul(\phi)\text{.}\) We know by Proposition 1.3.16 that we can add a vector \(\wvec\) to obtain an orthonormal basis. Let \(\uvec=\phi(\wvec)\wvec\text{.}\)

If \(\vvec\) is a vector in \(V\text{,}\) then

\begin{equation*} \vvec=c_1\vvec_1+\ldots+c_{n-1}\vvec_{n-1} + c_n\wvec\text{.} \end{equation*}

Then

\begin{equation*} \phi(\vvec) = c_n\phi(\wvec) = \inner{\vvec}{\uvec}\text{.} \end{equation*}

To see that \(\uvec\) is unique, suppose that there are two such vectors \(\uvec_1\) and \(\uvec_1\) such that

\begin{equation*} \phi(\vvec) = \inner{\vvec}{\uvec_1}= \inner{\vvec}{\uvec_2} \end{equation*}

for every vector \(\vvec\text{.}\) In particular, we have \(\inner{\vvec}{\uvec_1-\uvec_2} = 0\) for every \(\vvec\) including \(\vvec=\uvec_1-\uvec_2\text{.}\) Therefore,

\begin{equation*} \inner{\uvec_1-\uvec_2}{\uvec_1-\uvec_2} = 0\text{,} \end{equation*}

which implies that \(\uvec_1=\uvec_2\text{.}\)

Definition 1.3.18.

If \(V\) and \(W\) are inner product spaces and \(T:V\to W\) a linear transformation, the adjoint of \(T\) is defined by \(T^*:W\to V\) by

\begin{equation*} \inner{T\vvec}{\wvec} = \inner{\vvec}{T^*(\wvec)} \end{equation*}

or equivalently

\begin{equation*} \inner{T^*\wvec}{\vvec} = \inner{\wvec}{T(\vvec)}\text{.} \end{equation*}

There are a number of things implied by this definition so we need to check that they are satisfied. The following proposition will take care of this for us.

Proposition 1.3.19.

The adjoint \(T^*:W\to V\) is a linear transformation.

Proof.

We first need to establish that \(T^*(\wvec)\) is a vector in \(V\) for every \(\wvec\) in \(V\text{.}\) For a fixed \(\wvec\) in \(W\text{,}\) define the linear transformation \(\phi:V\to \field\) by

\begin{equation*} \phi(\vvec) = \inner{T(\vvec)}{\wvec}\text{.} \end{equation*}

By Proposition 1.3.17, we know there is a vector \(\uvec\) in \(V\) such that \(\phi(\vvec) = \inner{\vvec}{\uvec}\) so we define \(T^*(\wvec) = \uvec\text{,}\) which gives

\begin{equation*} \phi(\vvec) = \inner{T(\vvec)}{\wvec} = \inner{\vvec}{\uvec} = \inner{\vvec}{T^*(\wvec)}\text{.} \end{equation*}

We have now defined a function \(T^*:W\to V\) such that \(\inner{T(\vvec)}{\wvec} = \inner{\vvec}{T^*(\wvec)}\) for all \(\vvec\) and \(\wvec\text{.}\) We just need to show that \(T^*\) is a linear transformation.

We need to show that \(T^*\) satisfies the two linearity properties. Suppose that \(\wvec_1\) and \(\wvec_2\) are vectors in \(V\text{.}\) Then

\begin{align*} \inner{\vvec}{T^*(\wvec_1+\wvec_2)} \amp = \inner{T\vvec}{\wvec_1+\wvec_2} = \inner{T\vvec}{\wvec_1} + \inner{T\vvec}{\wvec_2}\\ \amp =\inner{\vvec}{T^*(\wvec_1) + T^*(\wvec_2)} \end{align*}

Since this holds for any vector \(\vvec\text{,}\) we have

\begin{equation*} T^*(\wvec_1+\wvec_2) = T^*(\wvec_1) + T^*(\wvec_2)\text{.} \end{equation*}

In the same way, we see that \(T^*(s\wvec) = sT^*(\wvec)\text{,}\) which verifies that \(T^*\) is an operator on \(T\text{.}\)

We now relate the matrices associated to \(T\) and \(T^*\) with respect to an orthonormal basis of \(V\text{.}\) As before, we use \(\uvec_1,\ldots,\uvec_n\) to denote an orthonormal basis of \(V\text{.}\)

Proposition 1.3.20.

Suppose that \(V\) and \(W\) are inner product spaces with orthonormal bases \(\bcal\) and \(\ccal\text{,}\) respectively. If \(T:V\to W\) is a linear transformation, \(A=\coords{T}{\ccal,\bcal}\text{,}\) and \(B=\coords{T^*}{\bcal,\ccal}\text{,}\) then

\begin{equation*} B= A^* = \conjugate{A}^T\text{,} \end{equation*}

the conjugate transpose of \(A\text{.}\)

Proof.

If \(\bcal=\{\basis{\vvec}{n}\}\) and \(\ccal=\{\basis{\wvec}{m}\}\text{,}\) then

\begin{align*} T(\vvec_j) \amp = A_{1,j}\wvec_1 + \ldots + A_{m,j}\wvec_m\\ T^*(\wvec_i) \amp = B_{1,i}\vvec_1 + \ldots + B_{n,i}\vvec_n \end{align*}

which says that

\begin{equation*} A_{i,j} = \inner{T(\vvec_j)}{\wvec_i} = \inner{\vvec_j}{T^*(\wvec_i)} = \conj{\inner{T^*(\wvec_i)}{\vvec_j}} = \conj{B_{j,i}}\text{.} \end{equation*}

Real adjoints.

If the underlying field \(\field=\real\text{,}\) then the matrix associated to the adjoint \(T^*\) is just the transpose of the matrix associated to \(T\text{.}\) In other words, \(B=A^T\) in the notation of Proposition 1.3.20.

Prev Top Next