Linear Transformations - via Matrices and More
Mon, Feb 03, 2025
In Calculus, we develop a slew of tools to study a particular class of functions - namely real functions. These are functions \(f:\mathbb R\to \mathbb R\) that satisfy certain properties, like continuity, differentiability, and/or integrability.
In Linear Algebra, the most important types of functions we study are linear transformations, i.e. functions mapping one vector space to another that satisfy certain axioms.
Today, we’ll jump into the study of linear transformations. We’ll begin with their abstract definition, then focus on transformations defined by matrix multiplication.
\[ \newcommand{\matrixentry}[2]{\left\lbrack#1\right\rbrack_{#2}} \]
Definition
A linear transformation \(T\) is a function mapping a vector space \(U\) to a vector space \(V\) that satisfies \[T(\alpha \vec{u} + \beta\vec{v}) = \alpha T(\vec{u}) + \beta T(\vec{v})\] for all \(\vec{u},\vec{v}\in U\) and \(\alpha,\beta\in\mathbb R\).
Consider the one-dimensional vector space \(\mathbb R^1\). (Or just \(\mathbb R\), I suppose.) Let \(a\) be a real number and define \(T\) by \[T(x) = ax.\]
Then \(T\) is a linear transformation \(T:\mathbb R \to \mathbb R\).
To prove this, simply note that \[\begin{aligned} T(\alpha x + \beta y) &= a(\alpha x + \beta y) \\ &= \alpha \, ax + \beta \, ay = \alpha T(x) + \beta T(y). \end{aligned}\]
Note that \(f(x) = ax+b\) is not a linear transformation when \(b\neq 0\); for instance, a linear transformation must satisfy \(T(0) = 0\), while \(f(0) = b\).
Let \(V\) denote the vector space of all polynomials and let \(T\) denote the differentiation operator. Then \(T\) is a linear transformation mapping \(V\to V\).
To prove this, simply note that \[\frac{d}{dx}(\alpha p(x) + \beta q(x)) = \alpha p'(x) + \beta q'(x).\]
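As a quick sanity check (not a proof), we can ask SymPy to differentiate a linear combination of two specific polynomials; the polynomials and symbolic scalars below are arbitrary choices for illustration.

```python
import sympy as sp

x, alpha, beta = sp.symbols('x alpha beta')
p = 3*x**2 - x + 2   # an arbitrary polynomial
q = x**3 + 5*x       # another arbitrary polynomial

lhs = sp.diff(alpha*p + beta*q, x)               # T(alpha*p + beta*q)
rhs = alpha*sp.diff(p, x) + beta*sp.diff(q, x)   # alpha*T(p) + beta*T(q)

print(sp.simplify(lhs - rhs))  # prints 0, as linearity predicts
```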
Question: What if \(V\) consists of the larger class of functions that are just assumed to be differentiable?
Let \(M\) be an \(m\times n\) matrix and for \(\vec{u}\in\mathbb R^n\), define \(T(\vec{u})\) by \[T(\vec{u}) = M\vec{u}.\] Then \(T\) defines a linear transformation mapping \(\mathbb R^n\to\mathbb R^m\).
Well, I guess we better define matrix multiplication!
Our first main objective today is to define the algebraic operations on matrices, i.e. matrix addition, scalar multiplication, and matrix multiplication.
This will all be done componentwise, so we’ll first specify some useful notation for operations that reference components.
The notation \([A]_{ij}\) will refer to the entry in row \(i\) and column \(j\), and \(\mathbf{A}_i\) will refer to the \(i^{\text{th}}\) column. For example, if \[ B=\begin{bmatrix} -1&2&5&3\\ 1&0&-6&1\\ -4&2&2&-2 \end{bmatrix}, \] then \([B]_{32} = 2\) and \[ \mathbf{B}_3 = \begin{bmatrix} 5\\-6\\2 \end{bmatrix}. \]
Matrix addition is defined in the simplest possible componentwise manner. That is, if \(A\) and \(B\) are \(m\times n\) matrices, then \(A+B\) is the matrix satisfying \[ [A+B]_{ij} = [A]_{ij} + [B]_{ij} \] for all \(i,j\) satisfying \(1\leq i \leq m\) and \(1\leq j \leq n\). For example, \[ \begin{bmatrix} 2&-3&4\\ 1&0&-7 \end{bmatrix} + \begin{bmatrix} 6&2&-4\\ 3&5&2 \end{bmatrix} = \begin{bmatrix} 8&-1&0\\ 4&5&-5 \end{bmatrix}. \]
Scalar multiplication is also defined in the simplest possible componentwise manner. If \(A\) is a matrix and \(\alpha \in \mathbb R\), then \(\alpha A\) is the matrix satisfying \[ [\alpha A]_{ij} = \alpha [A]_{ij}. \] For example, \[ 2\begin{bmatrix} 2&-3&4\\ 1&0&-7 \end{bmatrix} = \begin{bmatrix} 4&-6&8\\ 2&0&-14 \end{bmatrix}. \]
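Here’s a small Python sketch (assuming NumPy) verifying the two examples above; `np.array` objects add and scale componentwise, just like our definitions.

```python
import numpy as np

A = np.array([[2, -3,  4],
              [1,  0, -7]])
B = np.array([[6,  2, -4],
              [3,  5,  2]])

print(A + B)   # [[ 8 -1  0], [ 4  5 -5]] -- componentwise sum
print(2 * A)   # [[ 4 -6  8], [ 2  0 -14]] -- scalar multiple
```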
It’s worth pointing out that the collection of all \(m\times n\) matrices with these operations satisfies all the properties necessary to form a vector space over the real numbers. Given matrices \(A\), \(B\), and \(C\), for example, and scalars \(\alpha\) and \(\beta\), we have statements like \[ A + B = B + A, \quad (A+B)+C = A+(B+C), \quad \text{and} \quad (\alpha+\beta)A = \alpha A + \beta A. \]
It’s not hard to prove the vector space properties and doing so provides a nice illustration of the power of the component notation. Here’s a proof of the distributive law of scalar multiplication over matrix addition, for example.
Claim: Given \(m\times n\) matrices \(A\) and \(B\) and a scalar \(\alpha\), \[\alpha (A+B) = \alpha A + \alpha B.\]
Proof:
\[ \begin{align*} [\alpha(A+B)]_{ij} &= \alpha [A+B]_{ij} && \text{Def scalar multiplication} \\ &= \alpha ([A]_{ij} + [B]_{ij}) && \text{Def matrix addition} \\ &= \alpha [A]_{ij} + \alpha [B]_{ij} && \text{Real dist} \\ &= [\alpha A]_{ij} + [\alpha B]_{ij} && \text{Def scalar multiplication} \\ &= [\alpha A + \alpha B]_{ij} && \text{Def matrix addition} \end{align*} \]
We now prepare to define matrix multiplication. The definition is again componentwise but it’s more complicated than your first guess might be.
Ultimately, matrix multiplication is used to describe the general linear transformation mapping \(\mathbb R^n \to \mathbb R^m\) and it is that objective that drives the definition.
Suppose that \(A\) is an \(m\times n\) matrix and that \(B\) is an \(n\times p\) matrix. The matrix product \(AB\) is then an \(m\times p\) matrix whose entries are \[ \matrixentry{AB}{ij} = \sum_{k=1}^{n}\matrixentry{A}{ik}\matrixentry{B}{kj}\text{.} \] In words, the entry in row \(i\) and column \(j\) of \(AB\) is obtained by multiplying the \(i^{\text{th}}\) row of \(A\) with the \(j^{\text{th}}\) column of \(B\) componentwise and adding the results.
You might recognize that operation as the dot product of the \(i^{\text{th}}\) row of \(A\) with the \(j^{\text{th}}\) column of \(B\). We’ll get to the dot product soon enough!
It’s not so hard once you do a few! Here’s an example:
\[ \left[\begin{matrix}-1 & 1 & -1\\-1 & -3 & -3\end{matrix}\right] \left[\begin{matrix}1 & 0 & -2 & 3\\-1 & -1 & 1 & 1\\-1 & 0 & 3 & 3\end{matrix}\right] = \left[\begin{matrix}-1 & -1 & 0 & -5\\5 & 3 & -10 & -15\end{matrix}\right] \]
You can see how it’s crucial that the number of columns of \(A\) equals the number of rows of \(B\).
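NumPy’s `@` operator implements this row-times-column definition (and will complain if the inner dimensions don’t match); here’s the example above as a quick check:

```python
import numpy as np

A = np.array([[-1,  1, -1],
              [-1, -3, -3]])
B = np.array([[ 1,  0, -2,  3],
              [-1, -1,  1,  1],
              [-1,  0,  3,  3]])

print(A @ B)
# [[ -1  -1   0  -5]
#  [  5   3 -10 -15]]
```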
An important special case is the product of a matrix with a column vector. For example, \[ \begin{bmatrix} 2 & -1 \\ -3 & 2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 2x_1 - x_2 \\ -3x_1 + 2x_2 \end{bmatrix}. \]
Thus, we can represent a linear system compactly as a single matrix equation \(A\vec{x} = \vec{b}\).
If \(A\) is \(m\times n\) and \(B\) is \(n\times 1\), then we might think of \(B\) as a column vector in \(\mathbb R^n\) and we might even denote it with a \(\vec{v}\).
Note then that \(A\vec{v}\) is an \(m\times 1\) column vector; thus, the function \[\vec{v} \mapsto A\vec{v}\] maps \(\mathbb R^n \to \mathbb R^m\).
As we’ll show, this function is, in fact, a linear transformation. Furthermore, any linear transformation from \(\mathbb R^n\) to \(\mathbb R^m\) can be represented in this fashion.
Here’s a fun example:
\[ \left[\begin{matrix}2 & 5 & 4 & 2\\4 & -5 & 1 & -3\\0 & -4 & -4 & 1\end{matrix}\right] \left[\begin{matrix}0\\0\\1\\0\end{matrix}\right] = \left[\begin{matrix}4\\1\\-4\end{matrix}\right] \]
This illustrates the fact that the product of a matrix with one of the standard coordinate basis vectors extracts a column from the matrix.
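The same computation in Python, just to see the column pop out:

```python
import numpy as np

A = np.array([[2,  5,  4,  2],
              [4, -5,  1, -3],
              [0, -4, -4,  1]])
e3 = np.array([0, 0, 1, 0])   # the third standard basis vector

print(A @ e3)   # [ 4  1 -4] -- the third column of A
```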
Our next goal will be to establish three major algebraic properties of matrix multiplication that are essential to understanding how matrix multiplication defines a linear transformation. Namely,
The Distributive Law: If \(A\) is \(m\times n\) and \(B\) and \(C\) are both \(n\times p\), then
\[A(B+C) = AB + AC.\]
Compatibility of matrix and scalar multiplication: If \(A\) is \(m\times n\), \(B\) is \(n\times p\), and \(\alpha\in\mathbb R\), then \[A(\alpha B) = \alpha (AB).\]
The Associative Law: If \(A\) is \(m\times n\) and \(B\) is \(n\times p\), and \(C\) is \(p\times s\), then
\[(AB)C = A(BC).\]
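These laws are proved in general below; as a quick numerical spot check (a sketch, with dimensions and entries chosen at random), we can compare both sides of each identity:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, p, s = 2, 3, 4, 2
A = rng.integers(-5, 6, size=(m, n))
B = rng.integers(-5, 6, size=(n, p))
C = rng.integers(-5, 6, size=(n, p))   # same shape as B, for the distributive law
D = rng.integers(-5, 6, size=(p, s))   # plays the role of C in the associative law
alpha = 3

print(np.array_equal(A @ (B + C), A @ B + A @ C))        # distributive law
print(np.array_equal(A @ (alpha * B), alpha * (A @ B)))  # compatibility
print(np.array_equal((A @ B) @ D, A @ (B @ D)))          # associative law
```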
Again, a major objective is to study functions mapping \(\mathbb R^n \to \mathbb R^m\) defined by \(T(\vec{u}) = A\vec{u}\), where \(A\) is an \(m\times n\) matrix. In particular, we would like to show that \(T\) is a linear transformation.
Well, if we specialize the distributive law to the case that \(B\) and \(C\) are \(n\times 1\) column vectors, say \(\vec{u}\) and \(\vec{v}\), then it becomes \[A(\vec{u} + \vec{v}) = A\vec{u} + A\vec{v}.\]
The compatibility property becomes \[A(\alpha \vec{u}) = \alpha (A\vec{u}).\]
These are exactly what we need to show that matrix multiplication defines a linear transformation.
If we specialize the associative law so that \(A\) is \(m\times n\) and \(B\) is \(n\times p\), and \(C=\vec{u}\) is \(p\times 1\), then it becomes \[(AB)\vec{u} = A(B\vec{u}).\]
Now suppose that \(T_1\) and \(T_2\) are the linear transformations defined by \[T_1(\vec{u}) = A\vec{u} \text{ and } T_2(\vec{u}) = B\vec{u}.\]
Then, the composition satisfies \[(T_1\circ T_2)(\vec{u}) = T_1(T_2(\vec{u})) = A(B\vec{u}).\]
The associative law tells us that this is exactly \((AB)\vec{u}\). Put another way, the associative law implies that the product \(AB\) is the matrix that represents the linear transformation \(T_1\circ T_2\).
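Here’s a small numerical illustration of that last point (a sketch with randomly chosen integer matrices): composing the two functions gives the same result as multiplying by the single matrix \(AB\).

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.integers(-3, 4, size=(2, 3))   # T1 maps R^3 -> R^2
B = rng.integers(-3, 4, size=(3, 4))   # T2 maps R^4 -> R^3
u = rng.integers(-3, 4, size=4)

def T1(x): return A @ x
def T2(x): return B @ x

print(T1(T2(u)))      # (T1 o T2)(u)
print((A @ B) @ u)    # the same vector, computed via the product AB
```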
In this column of slides, we record the proofs of the three laws of matrix multiplication stated above.
Again, these laws taken together show that matrix multiplication of an \(m\times n\) matrix with an \(n\times 1\) column vector defines a linear transformation.
We’ll show that \(A(\alpha B) = \alpha (AB)\) by showing that \[[A(\alpha B)]_{ij} = [\alpha (AB)]_{ij}.\] In words, we show that the entries are all equal. We do this like so:
\[\begin{align*} [A(\alpha B)]_{ij} &= \sum_{k=1}^n [A]_{ik}[\alpha B]_{kj} && \text{Def Matrix Mult} \\ &= \sum_{k=1}^n [A]_{ik}\alpha [B]_{kj} && \text{Def Scalar Mult} \\ &= \alpha \sum_{k=1}^n [A]_{ik}[B]_{kj} && \text{Real Dist Property} \\ &= \alpha [AB]_{ij} && \text{Def Matrix Mult} \\ &= [\alpha (AB)]_{ij} && \text{Def Scalar Mult} \end{align*}\]
\[\begin{align*} \matrixentry{A(B+C)}{ij}&=\sum_{k=1}^{n}\matrixentry{A}{ik}\matrixentry{B+C}{kj}&& \text{Def Matrix Mult} \\ &=\sum_{k=1}^{n}\matrixentry{A}{ik}(\matrixentry{B}{kj}+\matrixentry{C}{kj})&& \text{Def Matrix Addition} \\ &=\sum_{k=1}^{n}\left(\matrixentry{A}{ik}\matrixentry{B}{kj}+\matrixentry{A}{ik}\matrixentry{C}{kj}\right)&& \text{Real Dist Property} \\ &=\sum_{k=1}^{n}\matrixentry{A}{ik}\matrixentry{B}{kj}+\sum_{k=1}^{n}\matrixentry{A}{ik}\matrixentry{C}{kj}&& \text{Real Comm Property} \\ &=\matrixentry{AB}{ij}+\matrixentry{AC}{ij}&& \text{Def Matrix Mult} \\ &=\matrixentry{AB+AC}{ij}&& \text{Def Matrix Addition} \\ \end{align*}\]
\[\begin{align*} \matrixentry{A(BC)}{ij}&=\sum_{k=1}^{n}\matrixentry{A}{ik}\matrixentry{BC}{kj} =\sum_{k=1}^{n}\matrixentry{A}{ik}\left(\sum_{\ell=1}^{p}\matrixentry{B}{k\ell}\matrixentry{C}{\ell j}\right) && \text{Mat Mul }\times 2 \\ &=\sum_{k=1}^{n}\sum_{\ell=1}^{p}\matrixentry{A}{ik}\matrixentry{B}{k\ell}\matrixentry{C}{\ell j} =\sum_{\ell=1}^{p}\sum_{k=1}^{n}\matrixentry{A}{ik}\matrixentry{B}{k\ell}\matrixentry{C}{\ell j} && \text{Dist & Comm} \\ &=\sum_{\ell=1}^{p}\matrixentry{C}{\ell j}\left(\sum_{k=1}^{n}\matrixentry{A}{ik}\matrixentry{B}{k\ell}\right) = \sum_{\ell=1}^{p}\matrixentry{C}{\ell j}\matrixentry{AB}{i\ell} && \text{Dist & Mat Mul} \\ &=\sum_{\ell=1}^{p}\matrixentry{AB}{i\ell}\matrixentry{C}{\ell j} = \matrixentry{(AB)C}{ij}&& \text{Comm & Mat Mul} \end{align*}\]
We go through this tedium for a couple of reasons: first, these componentwise arguments are exactly what we need to connect matrix multiplication to linear transformations; second, not every familiar rule of algebra carries over to matrices, so each property really does need to be verified.
To illustrate the second point, note that matrix multiplication is not commutative. In fact, if \(A\) is \(m\times n\) and \(B\) is \(n\times p\), then \(AB\) and \(BA\) are both defined only when \(m=p\).
Even when \(A\) and \(B\) are both \(2\times2\) matrices, we need not have \(AB=BA\). I invite you to search for examples!
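If you’d like to let the computer do the searching, here’s one sketch of a random search; it typically finds a non-commuting pair immediately.

```python
import numpy as np

rng = np.random.default_rng(2)
while True:
    A = rng.integers(-2, 3, size=(2, 2))
    B = rng.integers(-2, 3, size=(2, 2))
    if not np.array_equal(A @ B, B @ A):   # found a pair with AB != BA
        print("A  =", A.tolist(), "  B  =", B.tolist())
        print("AB =", (A @ B).tolist(), "  BA =", (B @ A).tolist())
        break
```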
We now turn the representation question around. That is, given a linear transformation \(T:\mathbb R^n \to \mathbb R^m\), we wonder if we can find a matrix \(A\) such that \[ T(\vec{v}) = A\vec{v}. \] In this context, we think of \(T\) as an oracle that will tell us the value of \(T(\vec{x})\) for any \(\vec{x}\in \mathbb R^n\).
The oracle, though, is a black box; we have no other information on \(T\).
Recall this example illustrating that the product of a matrix with one of the standard coordinate basis vectors extracts a column from the matrix. This turns out to be key, so let’s generalize it.
Let \(\vec{e}_j = \langle e_1,e_2,e_3,\cdots,e_n \rangle\) be the vector whose \(i^{\text{th}}\) component is defined by \[ e_i = \begin{cases}1 & i = j \\ 0 & i\neq j\end{cases}. \] Now, if \(A\) is an \(m\times n\) matrix, note that \[ A\vec{e}_j = \mathbf{A}_j. \] That is, \(A\vec{e}_j\) is exactly the \(j^{\text{th}}\) column of \(A\).
Now, if we want \(A\) to model \(T\), we’ll choose the \(j^{\text{th}}\) column of \(A\) to be \(T(\vec{e}_j)\).
If, for example,
\[\begin{align*} T(\langle 1,0 \rangle) &= \langle 1,2,3 \rangle \text{ and } \\ T(\langle 0,1 \rangle) &= \langle 1,-1,1 \rangle, \end{align*}\]
then choose \(A\) to be
\[\begin{bmatrix} 1 & 1 \\ 2 & -1 \\ 3 & 1 \end{bmatrix}.\]
At this point, we’ve got \(A\) chosen so that \[ A\vec{e}_j = T(\vec{e}_j) \] for every \(j\). If \(\vec{v}\) is any vector in \(\mathbb R^n\), then we can write \[\vec{v} = \alpha_1 \vec{e}_1 + \alpha_2 \vec{e}_2 + \cdots + \alpha_n \vec{e}_n.\]
Thus,
\[\begin{align*} A\vec{v} &= A(\alpha_1 \vec{e}_1 + \alpha_2 \vec{e}_2 + \cdots + \alpha_n \vec{e}_n) \\ &= \alpha_1 A\vec{e}_1 + \alpha_2 A\vec{e}_2 + \cdots + \alpha_n A\vec{e}_n \\ &= \alpha_1 T(\vec{e}_1) + \alpha_2 T(\vec{e}_2) + \cdots + \alpha_n T(\vec{e}_n) \\ &= T(\alpha_1 \vec{e}_1 + \alpha_2 \vec{e}_2 + \cdots + \alpha_n \vec{e}_n) = T(\vec{v}) \end{align*}\]
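This recipe translates directly into code. The sketch below treats a Python function `T` as the oracle (the particular `T` here is a stand-in I’ve written to match the example above) and recovers its matrix one column at a time.

```python
import numpy as np

def T(v):
    # A stand-in oracle matching the example: T(<1,0>) = <1,2,3> and T(<0,1>) = <1,-1,1>.
    x1, x2 = v
    return np.array([x1 + x2, 2*x1 - x2, 3*x1 + x2])

n = 2
columns = [T(e) for e in np.eye(n, dtype=int)]  # apply T to each standard basis vector
A = np.column_stack(columns)
print(A)
# [[ 1  1]
#  [ 2 -1]
#  [ 3  1]]

v = np.array([3, -2])
print(np.array_equal(A @ v, T(v)))  # True: A represents T
```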
We’re now going to introduce a couple of subspaces associated with a linear transformation \(T:U\to V\). These are the null-space and the range of \(T\).
Throughout this column of slides, \(U\) and \(V\) are vector spaces and \(T\) is a linear transformation mapping \(U\to V\).
The null-space of \(T\) is the set of all vectors mapping to the zero element of \(V\). We often write the null-space as \({\cal N}(T)\). Symbolically, \[{\cal N}(T) = \{\vec{u}\in U: T(\vec{u}) = \vec{0}\}.\]
Since \({\cal N}(T)\subset U\), operations with its elements automatically satisfy the vector space properties. Thus, to show that \({\cal N}(T)\) is a subspace, we need only show that it’s algebraically closed under linear combinations.
To do this, suppose that \(\vec{u},\vec{v}\in {\cal N}(T)\). Then \[T(\alpha \vec{u} + \beta\vec{v}) = \alpha T(\vec{u}) + \beta T(\vec{v}) = \alpha \vec{0} + \beta \vec{0} = \vec{0},\] so \(\alpha \vec{u} + \beta\vec{v} \in {\cal N}(T)\).
The range is exactly the functional range that you learn of in calculus. That is, \[\text{range}(T) = \left\{\vec{v}\in V: \text{there is a } \vec{u}\in U\text{ with } T(\vec{u}) = \vec{v}\right\}.\]
It’s not hard to show that the range is a subspace of \(V\). To do so, suppose that \(\vec{y}_1\) and \(\vec{y}_2\) are in the range of \(T\); then there are \(\vec{x}_1,\vec{x}_2\in U\) such that \[T(\vec{x}_1) = \vec{y}_1 \text{ and } T(\vec{x}_2) = \vec{y}_2.\]
Thus,
\[\begin{align*} \alpha \vec{y}_1 + \beta \vec{y}_2 &= \alpha T(\vec{x}_1) + \beta T(\vec{x}_2) \\ &= T(\alpha \vec{x}_1 + \beta \vec{x}_2), \end{align*}\] so \(\alpha \vec{y}_1 + \beta \vec{y}_2\) is in the range of \(T\) as well.
Suppose that \(U\) and \(V\) both represent the vector space of polynomials of degree at most 2. Suppose, also, that \(D:U\to V\) represents differentiation, which we know to be a linear transformation.
Then, the range of \(D\) is exactly the subspace of \(V\) consisting of polynomials of degree at most 1.
The null-space of \(D\) is the subspace of \(U\) that consists of all constants.
We now specialize the previous column of slides to the Euclidean case. Thus, in this column of slides, we have a linear transformation \(T:\mathbb R^n \to \mathbb R^m\).
The null-space and range are still defined in the same way. Now, though, there is an \(m\times n\) matrix \(A\) such that \[T(\vec{x}) = A\vec{x}.\] As it turns out, null-space and range can be expressed in terms of \(A\) and computed efficiently using the reduced row echelon form of \(A\).
Expressed in terms of \(A\), the null-space is the set of all \(\vec{x}\in\mathbb R^n\) such that \[A\vec{x}=\vec{0}.\] Of course, we’ve spent some time solving exactly this kind of system, so those techniques help us find the null-space of \(T\).
In particular, we can easily find the null-space, once \(A\) is in reduced row echelon form. It turns out that the reduced row echelon form helps us find the range efficiently as well.
The column space of the \(m\times n\) matrix \(A\) is defined to be the linear span of the columns of \(A\). Thus, it’s immediately seen to be a subspace of \(\mathbb R^m\).
Recall, though, that the system \(A\vec{x} = \vec{y}\) has a solution precisely when the vector \(\vec{y}\) can be written as a linear combination of the columns of \(A\); the entries of \(\vec{x}\) are exactly the coefficients in that combination.
Thus, the column space of \(A\) and the range of the linear transformation \(T\) are identical.
Finally, let’s take a look at an example. Suppose the matrix \(M\) on the left has the reduced row echelon form on the right.
\[\begin{align*} M&=\begin{bmatrix} 2 & 1 & 7 & -7\\ -3 & 4 & -5 & -6\\ 1 & 1 & 4 & -5 \end{bmatrix} &R&=\begin{bmatrix} 1 & 0 & 3 & -2\\ 0 & 1 & 1 & -3\\ 0 & 0 & 0 & 0 \end{bmatrix}\text{.} \end{align*}\]
We’re going to find bases for both the null-space and the column space of \(M\).
To find the null-space, we read the solutions of \(M\vec{x}=\vec{0}\) right off of the reduced row echelon form \(R\). We note that \(x_3\) and \(x_4\) are free and \[x_1 = 2x_4 - 3x_3 \text{ and } x_2 = 3x_4 - x_3.\] Thus, the general vector in the null-space can be written \[ \langle 2x_4 - 3x_3, 3x_4 - x_3, x_3, x_4 \rangle = \langle -3,-1,1,0 \rangle x_3 + \langle 2,3,0,1 \rangle x_4. \] Put another way, a basis for the null space consists of the vectors \[ \langle -3,-1,1,0 \rangle \text{ and } \langle 2,3,0,1 \rangle. \]
Let’s take a look again at our matrices \(M\) and \(R\): \[\begin{align*} M&=\begin{bmatrix} 2 & 1 & 7 & -7\\ -3 & 4 & -5 & -6\\ 1 & 1 & 4 & -5 \end{bmatrix} &R&=\begin{bmatrix} 1 & 0 & 3 & -2\\ 0 & 1 & 1 & -3\\ 0 & 0 & 0 & 0 \end{bmatrix}\text{.} \end{align*}\] The indices of the columns for which \(R\) has a leading 1 are called the pivots. In this case, the pivots are 1 and 2 (or 0 and 1, in Python). It turns out that the columns of \(M\) with those indices form a basis for the column space, and therefore the range. That is, \[ \langle 2,-3,1 \rangle \text{ and } \langle 1,4,1 \rangle \] form a basis for the range.
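Both of these basis computations (null-space and column space) are built into SymPy. Here’s a sketch, assuming the standard `Matrix` methods; `nullspace` may scale its basis vectors differently than a hand computation would.

```python
import sympy as sp

M = sp.Matrix([[ 2, 1,  7, -7],
               [-3, 4, -5, -6],
               [ 1, 1,  4, -5]])

R, pivots = M.rref()
print(R)                 # the reduced row echelon form shown above
print(pivots)            # (0, 1) -- the pivot columns, in Python indexing
print(M.nullspace())     # basis for the null-space: <-3,-1,1,0> and <2,3,0,1>
print(M.columnspace())   # basis for the column space: the pivot columns of M
```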
We can solve \(M\vec{x} = \vec{b}\) by putting the augmented matrix \([M|\vec{b}]\) into reduced row echelon form \([R|\vec{b}']\). If we set the free parameters there equal to zero, we obtain a solution \(\vec{x}\) whose only non-zero entries occur in the pivot positions. This \(\vec{x}\) gives us the coefficients needed to express \(\vec{b}\) as a linear combination of the pivot columns of \(M\).
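Here’s a sketch of that procedure in SymPy, with \(\vec{b}\) chosen, purely for illustration, to be the sum of the first two columns of \(M\):

```python
import sympy as sp

M = sp.Matrix([[ 2, 1,  7, -7],
               [-3, 4, -5, -6],
               [ 1, 1,  4, -5]])
b = sp.Matrix([3, 1, 2])        # column 1 of M plus column 2 of M

aug = M.row_join(b)             # the augmented matrix [M | b]
R_aug, pivots = aug.rref()
print(R_aug)                    # last column is b'; with x3 = x4 = 0, we read x1 = 1, x2 = 1
print(pivots)                   # (0, 1) -- the pivot columns, in Python indexing

# So b = 1*(column 1 of M) + 1*(column 2 of M):
print(M[:, 0] + M[:, 1] == b)   # True
```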