Linear Transformations - via Matrices and More
Mon, Feb 03, 2025
In Calculus, we develop a slew of tools to study a particular class of functions - namely real functions. These are functions \(f:\mathbb R\to \mathbb R\) that satisfy certain properties, like continuity, differentiability, and/or integrability.
In Linear Algebra, the most important types of functions we study are linear transformations, i.e. functions mapping one vector space to another that satisfy certain axioms.
Today, we’ll jump into the study of linear transformations. We’ll begin with their abstract definition, then focus on transformations defined by matrix multiplication.
\[ \newcommand{\matrixentry}[2]{\left\lbrack#1\right\rbrack_{#2}} \]
Definition
A linear transformation \(T\) is a function mapping a vector space \(U\) to a vector space \(V\) that satisfies \[T(\alpha \vec{u} + \beta\vec{v}) = \alpha T(\vec{u}) + \beta T(\vec{v})\] for all \(\vec{u},\vec{v}\in U\) and \(\alpha,\beta\in\mathbb R\).
Consider the one-dimensional vector space \(\mathbb R^1\). (Or just \(\mathbb R\), I suppose.) Let \(a\) be a real number and define \(T\) by \[T(x) = ax.\]
Then \(T\) is a linear transformation \(T:\mathbb R \to \mathbb R\).
To prove this, simply note that \[\begin{aligned} T(\alpha x + \beta y) &= a(\alpha x + \beta y) \\ &= \alpha \, ax + \beta \, ay = \alpha T(x) + \beta T(y). \end{aligned}\]
Note that \(f(x) = ax+b\) is not a linear transformation when \(b\neq 0\); for instance, a linear transformation must satisfy \(T(0) = 0\), while \(f(0) = b\).
Let \(V\) denote the vector space of all polynomials and let \(T\) denote the differentiation operator. Then \(T\) is a linear transformation mapping \(V\to V\).
To prove this, simply note that \[\frac{d}{dx}(\alpha p(x) + \beta q(x)) = \alpha p'(x) + \beta q'(x).\]
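As a quick sanity check (not a proof), we can ask SymPy to differentiate a linear combination of two specific polynomials; the polynomials and symbolic scalars below are arbitrary choices for illustration.

```python
import sympy as sp

x, alpha, beta = sp.symbols('x alpha beta')
p = 3*x**2 - x + 2   # an arbitrary polynomial
q = x**3 + 5*x       # another arbitrary polynomial

lhs = sp.diff(alpha*p + beta*q, x)               # T(alpha*p + beta*q)
rhs = alpha*sp.diff(p, x) + beta*sp.diff(q, x)   # alpha*T(p) + beta*T(q)

print(sp.simplify(lhs - rhs))  # prints 0, as linearity predicts
```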
Question: What if \(V\) consists of the larger class of functions that are just assumed to be differentiable?
Let \(M\) be an \(m\times n\) matrix and for \(\vec{u}\in\mathbb R^n\), define \(T(\vec{u})\) by \[T(\vec{u}) = M\vec{u}.\] Then \(T\) defines a linear transformation mapping \(\mathbb R^n\to\mathbb R^m\).
Well, I guess we better define matrix multiplication!
Our first main objective today is to define the algebraic operations on matrices, i.e. matrix addition, scalar multiplication, and matrix multiplication.
This will all be done componentwise, so we’ll first specify some useful notation for operations that reference components.
The notation \([A]_{ij}\) will refer to the entry in row \(i\) and column \(j\), and \(\mathbf{A}_i\) will refer to the \(i^{\text{th}}\) column. For example, if \[ B=\begin{bmatrix} -1&2&5&3\\ 1&0&-6&1\\ -4&2&2&-2 \end{bmatrix}, \] then \([B]_{32} = 2\) and \[ \mathbf{B}_3 = \begin{bmatrix} 5\\-6\\2 \end{bmatrix}. \]
Matrix addition is defined in the simplest possible componentwise manner. That is, if \(A\) and \(B\) are \(m\times n\) matrices, then \(A+B\) is the matrix satisfying \[ [A+B]_{ij} = [A]_{ij} + [B]_{ij} \] for all \(i,j\) satisfying \(1\leq i \leq m\) and \(1\leq j \leq n\). For example, \[ \begin{bmatrix} 2&-3&4\\ 1&0&-7 \end{bmatrix} + \begin{bmatrix} 6&2&-4\\ 3&5&2 \end{bmatrix} = \begin{bmatrix} 8&-1&0\\ 4&5&-5 \end{bmatrix}. \]
Scalar multiplication is also defined in the simplest possible componentwise manner. If \(A\) is a matrix and \(\alpha \in \mathbb R\), then \(\alpha A\) is the matrix satisfying \[ [\alpha A]_{ij} = \alpha [A]_{ij}. \] For example, \[ 2\begin{bmatrix} 2&-3&4\\ 1&0&-7 \end{bmatrix} = \begin{bmatrix} 4&-6&8\\ 2&0&-14 \end{bmatrix}. \]
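Here’s a small Python sketch (assuming NumPy) verifying the two examples above; `np.array` objects add and scale componentwise, just like our definitions.

```python
import numpy as np

A = np.array([[2, -3,  4],
              [1,  0, -7]])
B = np.array([[6,  2, -4],
              [3,  5,  2]])

print(A + B)   # [[ 8 -1  0], [ 4  5 -5]] -- componentwise sum
print(2 * A)   # [[ 4 -6  8], [ 2  0 -14]] -- scalar multiple
```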
It’s worth pointing out that the collection of all \(m\times n\) matrices with these operations satisfies all the properties necessary to form a vector space over the real numbers. Given matrices \(A\), \(B\), and \(C\), for example, and scalars \(\alpha\) and \(\beta\), we have statements like \[ A + B = B + A, \quad (A+B)+C = A+(B+C), \quad \text{and} \quad (\alpha+\beta)A = \alpha A + \beta A. \]
It’s not hard to prove the vector space properties and doing so provides a nice illustration of the power of the component notation. Here’s a proof of the distributive law of scalar multiplication over matrix addition, for example.
Claim: Given \(m\times n\) matrices \(A\) and \(B\) and a scalar \(\alpha\), \[\alpha (A+B) = \alpha A + \alpha B.\]
Proof:
\[ \begin{align*} [\alpha(A+B)]_{ij} &= \alpha [A+B]_{ij} && \text{Def scalar multiplication} \\ &= \alpha ([A]_{ij} + [B]_{ij}) && \text{Def matrix addition} \\ &= \alpha [A]_{ij} + \alpha [B]_{ij} && \text{Real dist} \\ &= [\alpha A]_{ij} + [\alpha B]_{ij} && \text{Def scalar multiplication} \\ &= [\alpha A + \alpha B]_{ij} && \text{Def matrix addition} \end{align*} \]
We now prepare to define matrix multiplication. The definition is again componentwise but it’s more complicated than your first guess might be.
Ultimately, matrix multiplication is used to describe the general linear transformation mapping \(\mathbb R^n \to \mathbb R^m\) and it is that objective that drives the definition.
Suppose that \(A\) is an \(m\times n\) matrix and that \(B\) is an \(n\times p\) matrix. The matrix product \(AB\) is then an \(m\times p\) matrix whose entries are \[ \matrixentry{AB}{ij} = \sum_{k=1}^{n}\matrixentry{A}{ik}\matrixentry{B}{kj}\text{.} \] In words, the entry in row \(i\) and column \(j\) of \(AB\) is obtained by multiplying the \(i^{\text{th}}\) row of \(A\) with the \(j^{\text{th}}\) column of \(B\) componentwise and adding the results.
You might recognize that operation as the dot product of the \(i^{\text{th}}\) row of \(A\) with the \(j^{\text{th}}\) column of \(B\). We’ll get to the dot product soon enough!
It’s not so hard once you do a few! Here’s an example:
\[ \left[\begin{matrix}-1 & 1 & -1\\-1 & -3 & -3\end{matrix}\right] \left[\begin{matrix}1 & 0 & -2 & 3\\-1 & -1 & 1 & 1\\-1 & 0 & 3 & 3\end{matrix}\right] = \left[\begin{matrix}-1 & -1 & 0 & -5\\5 & 3 & -10 & -15\end{matrix}\right] \]
You can see how it’s crucial that the number of columns of \(A\) equals the number of rows of \(B\).
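NumPy’s `@` operator implements this row-times-column definition (and will complain if the inner dimensions don’t match); here’s the example above as a quick check:

```python
import numpy as np

A = np.array([[-1,  1, -1],
              [-1, -3, -3]])
B = np.array([[ 1,  0, -2,  3],
              [-1, -1,  1,  1],
              [-1,  0,  3,  3]])

print(A @ B)
# [[ -1  -1   0  -5]
#  [  5   3 -10 -15]]
```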
An important special case is the product of a matrix with a column vector. For example, \[ \begin{bmatrix} 2 & -1 \\ -3 & 2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 2x_1 - x_2 \\ -3x_1 + 2x_2 \end{bmatrix}. \]
Thus, we can represent a linear system compactly as a single matrix equation \(A\vec{x} = \vec{b}\).
If \(A\) is \(m\times n\) and \(B\) is \(n\times 1\), then we might think of \(B\) as a column vector in \(\mathbb R^n\) and we might even denote it with a \(\vec{v}\).
Note then that \(A\vec{v}\) is an \(m\times 1\) column vector; thus, the function \[\vec{v} \mapsto A\vec{v}\] maps \(\mathbb R^n \to \mathbb R^m\).
As we’ll show, this function is, in fact, a linear transformation. Furthermore, any linear transformation from \(\mathbb R^n\) to \(\mathbb R^m\) can be represented in this fashion.
Here’s a fun example:
\[ \left[\begin{matrix}2 & 5 & 4 & 2\\4 & -5 & 1 & -3\\0 & -4 & -4 & 1\end{matrix}\right] \left[\begin{matrix}0\\0\\1\\0\end{matrix}\right] = \left[\begin{matrix}4\\1\\-4\end{matrix}\right] \]
This illustrates the fact that the product of a matrix with one of the standard coordinate basis vectors extracts a column from the matrix.
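The same computation in Python, just to see the column pop out:

```python
import numpy as np

A = np.array([[2,  5,  4,  2],
              [4, -5,  1, -3],
              [0, -4, -4,  1]])
e3 = np.array([0, 0, 1, 0])   # the third standard basis vector

print(A @ e3)   # [ 4  1 -4] -- the third column of A
```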
Our next goal will be to establish three major algebraic properties of matrix multiplication that are essential to understanding how matrix multiplication defines a linear transformation. Namely,
The Distributive Law: If \(A\) is \(m\times n\) and \(B\) and \(C\) are both \(n\times p\), then
\[A(B+C) = AB + AC.\]
Compatibility of matrix and scalar multiplication: If \(A\) is \(m\times n\), \(B\) is \(n\times p\), and \(\alpha\in\mathbb R\), then \[A(\alpha B) = \alpha (AB).\]
The Associative Law: If \(A\) is \(m\times n\) and \(B\) is \(n\times p\), and \(C\) is \(p\times s\), then
\[(AB)C = A(BC).\]
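These laws are proved in general below; as a quick numerical spot check (a sketch, with dimensions and entries chosen at random), we can compare both sides of each identity:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, p, s = 2, 3, 4, 2
A = rng.integers(-5, 6, size=(m, n))
B = rng.integers(-5, 6, size=(n, p))
C = rng.integers(-5, 6, size=(n, p))   # same shape as B, for the distributive law
D = rng.integers(-5, 6, size=(p, s))   # plays the role of C in the associative law
alpha = 3

print(np.array_equal(A @ (B + C), A @ B + A @ C))        # distributive law
print(np.array_equal(A @ (alpha * B), alpha * (A @ B)))  # compatibility
print(np.array_equal((A @ B) @ D, A @ (B @ D)))          # associative law
```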
Again, a major objective is to study functions mapping \(\mathbb R^n \to \mathbb R^m\) defined by \(T(\vec{u}) = A\vec{u}\), where \(A\) is an \(m\times n\) matrix. In particular, we would like to show that \(T\) is a linear transformation.
Well, if we specialize the distributive law to the case that \(B\) and \(C\) are \(n\times 1\) column vectors, say \(\vec{u}\) and \(\vec{v}\), then it becomes \[A(\vec{u} + \vec{v}) = A\vec{u} + A\vec{v}.\]
The compatibility property becomes \[A(\alpha \vec{u}) = \alpha (A\vec{u}).\]
These are exactly what we need to show that matrix multiplication defines a linear transformation.
If we specialize the associative law so that \(A\) is \(m\times n\) and \(B\) is \(n\times p\), and \(C=\vec{u}\) is \(p\times 1\), then it becomes \[(AB)\vec{u} = A(B\vec{u}).\]
Now suppose that \(T_1\) and \(T_2\) are the linear transformations defined by \[T_1(\vec{u}) = A\vec{u} \text{ and } T_2(\vec{u}) = B\vec{u}.\]
Then, the composition satisfies \[(T_1\circ T_2)(\vec{u}) = T_1(T_2(\vec{u})) = A(B\vec{u}).\]
The associative law tells us that this is exactly \((AB)\vec{u}\). Put another way, the associative law implies that the product \(AB\) is the matrix that represents the linear transformation \(T_1\circ T_2\).
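Here’s a small numerical illustration of that last point (a sketch with randomly chosen integer matrices): composing the two functions gives the same result as multiplying by the single matrix \(AB\).

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.integers(-3, 4, size=(2, 3))   # T1 maps R^3 -> R^2
B = rng.integers(-3, 4, size=(3, 4))   # T2 maps R^4 -> R^3
u = rng.integers(-3, 4, size=4)

def T1(x): return A @ x
def T2(x): return B @ x

print(T1(T2(u)))      # (T1 o T2)(u)
print((A @ B) @ u)    # the same vector, computed via the product AB
```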
In this column of slides, we record the proofs of the three laws of matrix multiplication stated above.
Again, these laws taken together show that matrix multiplication of an \(m\times n\) matrix with an \(n\times 1\) column vector defines a linear transformation.
We’ll show that \(A(\alpha B) = \alpha (AB)\) by showing that \[[A(\alpha B)]_{ij} = [\alpha (AB)]_{ij}.\] In words, we show that the entries are all equal. We do this like so:
\[\begin{align*} [A(\alpha B)]_{ij} &= \sum_{k=1}^n [A]_{ik}[\alpha B]_{kj} && \text{Def Matrix Mult} \\ &= \sum_{k=1}^n [A]_{ik}\alpha [B]_{kj} && \text{Def Scalar Mult} \\ &= \alpha \sum_{k=1}^n [A]_{ik}[B]_{kj} && \text{Real Dist Property} \\ &= \alpha [AB]_{ij} && \text{Def Matrix Mult} \\ &= [\alpha (AB)]_{ij} && \text{Def Scalar Mult} \end{align*}\]
\[\begin{align*} \matrixentry{A(B+C)}{ij}&=\sum_{k=1}^{n}\matrixentry{A}{ik}\matrixentry{B+C}{kj}&& \text{Def Matrix Mult} \\ &=\sum_{k=1}^{n}\matrixentry{A}{ik}(\matrixentry{B}{kj}+\matrixentry{C}{kj})&& \text{Def Matrix Addition} \\ &=\sum_{k=1}^{n}\left(\matrixentry{A}{ik}\matrixentry{B}{kj}+\matrixentry{A}{ik}\matrixentry{C}{kj}\right)&& \text{Real Dist Property} \\ &=\sum_{k=1}^{n}\matrixentry{A}{ik}\matrixentry{B}{kj}+\sum_{k=1}^{n}\matrixentry{A}{ik}\matrixentry{C}{kj}&& \text{Real Comm Property} \\ &=\matrixentry{AB}{ij}+\matrixentry{AC}{ij}&& \text{Def Matrix Mult} \\ &=\matrixentry{AB+AC}{ij}&& \text{Def Matrix Addition} \\ \end{align*}\]
\[\begin{align*} \matrixentry{A(BC)}{ij}&=\sum_{k=1}^{n}\matrixentry{A}{ik}\matrixentry{BC}{kj} =\sum_{k=1}^{n}\matrixentry{A}{ik}\left(\sum_{\ell=1}^{p}\matrixentry{B}{k\ell}\matrixentry{C}{\ell j}\right) && \text{Mat Mul }\times 2 \\ &=\sum_{k=1}^{n}\sum_{\ell=1}^{p}\matrixentry{A}{ik}\matrixentry{B}{k\ell}\matrixentry{C}{\ell j} =\sum_{\ell=1}^{p}\sum_{k=1}^{n}\matrixentry{A}{ik}\matrixentry{B}{k\ell}\matrixentry{C}{\ell j} && \text{Dist & Comm} \\ &=\sum_{\ell=1}^{p}\matrixentry{C}{\ell j}\left(\sum_{k=1}^{n}\matrixentry{A}{ik}\matrixentry{B}{k\ell}\right) = \sum_{\ell=1}^{p}\matrixentry{C}{\ell j}\matrixentry{AB}{i\ell} && \text{Dist & Mat Mul} \\ &=\sum_{\ell=1}^{p}\matrixentry{AB}{i\ell}\matrixentry{C}{\ell j} = \matrixentry{(AB)C}{ij}&& \text{Comm & Mat Mul} \end{align*}\]
We go through this tedium for a couple of reasons: first, these componentwise arguments are exactly what we need to connect matrix multiplication to linear transformations; second, not every familiar rule of algebra carries over to matrices, so each property really does need to be verified.
To illustrate the second point, note that matrix multiplication is not commutative. In fact, if \(A\) is \(m\times n\) and \(B\) is \(n\times p\), then \(AB\) and \(BA\) are both defined only when \(m=p\).
Even when \(A\) and \(B\) are both \(2\times2\) matrices, we need not have \(AB=BA\). I invite you to search for examples!
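If you’d like to let the computer do the searching, here’s one sketch of a random search; it typically finds a non-commuting pair immediately.

```python
import numpy as np

rng = np.random.default_rng(2)
while True:
    A = rng.integers(-2, 3, size=(2, 2))
    B = rng.integers(-2, 3, size=(2, 2))
    if not np.array_equal(A @ B, B @ A):   # found a pair with AB != BA
        print("A  =", A.tolist(), "  B  =", B.tolist())
        print("AB =", (A @ B).tolist(), "  BA =", (B @ A).tolist())
        break
```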
We now turn the representation question around. That is, given a linear transformation \(T:\mathbb R^n \to \mathbb R^m\), we wonder if we can find a matrix \(A\) such that \[ T(\vec{v}) = A\vec{v}. \] In this context, we think of \(T\) as an oracle that will tell us the value of \(T(\vec{x})\) for any \(\vec{x}\in \mathbb R^n\).
The oracle, though, is a black box; we have no other information on \(T\).
Recall this example illustrating that the product of a matrix with one of the standard coordinate basis vectors extracts a column from the matrix. This turns out to be key, so let’s generalize it.
Let \(\vec{e}_j = \langle e_1,e_2,e_3,\cdots,e_n \rangle\) be the vector whose \(i^{\text{th}}\) component is defined by \[ e_i = \begin{cases}1 & i = j \\ 0 & i\neq j\end{cases}. \] Now, if \(A\) is an \(m\times n\) matrix, note that \[ A\vec{e}_j = \mathbf{A}_j. \] That is, \(A\vec{e}_j\) is exactly the \(j^{\text{th}}\) column of \(A\).
Now, if we want \(A\) to model \(T\), we’ll choose the \(j^{\text{th}}\) column of \(A\) to be \(T(\vec{e}_j)\).
If, for example,
\[\begin{align*} T(\langle 1,0 \rangle) &= \langle 1,2,3 \rangle \text{ and } \\ T(\langle 0,1 \rangle) &= \langle 1,-1,1 \rangle, \end{align*}\]
then choose \(A\) to be
\[\begin{bmatrix} 1 & 1 \\ 2 & -1 \\ 3 & 1 \end{bmatrix}.\]
At this point, we’ve got \(A\) chosen so that \[ A\vec{e}_j = T(\vec{e}_j) \] for every \(j\). If \(\vec{v}\) is any vector in \(\mathbb R^n\), then we can write \[\vec{v} = \alpha_1 \vec{e}_1 + \alpha_2 \vec{e}_2 + \cdots + \alpha_n \vec{e}_n.\]
Thus,
\[\begin{align*} A\vec{v} &= A(\alpha_1 \vec{e}_1 + \alpha_2 \vec{e}_2 + \cdots + \alpha_n \vec{e}_n) \\ &= \alpha_1 A\vec{e}_1 + \alpha_2 A\vec{e}_2 + \cdots + \alpha_n A\vec{e}_n \\ &= \alpha_1 T(\vec{e}_1) + \alpha_2 T(\vec{e}_2) + \cdots + \alpha_n T(\vec{e}_n) \\ &= T(\alpha_1 \vec{e}_1 + \alpha_2 \vec{e}_2 + \cdots + \alpha_n \vec{e}_n) = T(\vec{v}) \end{align*}\]
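This recipe translates directly into code. The sketch below treats a Python function `T` as the oracle (the particular `T` here is a stand-in I’ve written to match the example above) and recovers its matrix one column at a time.

```python
import numpy as np

def T(v):
    # A stand-in oracle matching the example: T(<1,0>) = <1,2,3> and T(<0,1>) = <1,-1,1>.
    x1, x2 = v
    return np.array([x1 + x2, 2*x1 - x2, 3*x1 + x2])

n = 2
columns = [T(e) for e in np.eye(n, dtype=int)]  # apply T to each standard basis vector
A = np.column_stack(columns)
print(A)
# [[ 1  1]
#  [ 2 -1]
#  [ 3  1]]

v = np.array([3, -2])
print(np.array_equal(A @ v, T(v)))  # True: A represents T
```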
We’re now going to introduce a couple of subspaces associated with a linear transformation \(T:U\to V\). These are the null-space and the range of \(T\).
Throughout this column of slides, \(U\) and \(V\) are vector spaces and \(T\) is a linear transformation mapping \(U\to V\).
The null-space of \(T\) is the set of all vectors mapping to the zero element of \(V\). We often write the null-space as \({\cal N}(T)\). Symbolically, \[{\cal N}(T) = \{\vec{u}\in U: T(\vec{u}) = \vec{0}\}.\]
Since \({\cal N}(T)\subset U\), operations with its elements automatically satisfy the vector space properties. Thus, to show that \({\cal N}(T)\) is a subspace, we need only show that it’s algebraically closed under linear combinations.
To do this, suppose that \(\vec{u},\vec{v}\in {\cal N}(T)\). Then \[T(\alpha \vec{u} + \beta\vec{v}) = \alpha T(\vec{u}) + \beta T(\vec{v}) = \alpha \vec{0} + \beta \vec{0} = \vec{0},\] so \(\alpha \vec{u} + \beta\vec{v} \in {\cal N}(T)\).
The range is exactly the functional range that you learn of in calculus. That is, \[\text{range}(T) = \left\{\vec{v}\in V: \text{there is a } \vec{u}\in U\text{ with } T(\vec{u}) = \vec{v}\right\}.\]
It’s not hard to show that the range is a subspace of \(V\). To do so, suppose that \(\vec{y}_1\) and \(\vec{y}_2\) are in the range of \(T\); then there are \(\vec{x}_1,\vec{x}_2\in U\) such that \[T(\vec{x}_1) = \vec{y}_1 \text{ and } T(\vec{x}_2) = \vec{y}_2.\]
Thus,
\[\begin{align*} \alpha \vec{y}_1 + \beta \vec{y}_2 &= \alpha T(\vec{x}_1) + \beta T(\vec{x}_2) \\ &= T(\alpha \vec{x}_1 + \beta \vec{x}_2), \end{align*}\] so \(\alpha \vec{y}_1 + \beta \vec{y}_2\) is in the range of \(T\) as well.
Suppose that \(U\) and \(V\) both represent the vector space of polynomials of degree at most 2. Suppose, also, that \(D:U\to V\) represents differentiation, which we know to be a linear transformation.
Then, the range of \(D\) is exactly the subspace of \(V\) consisting of polynomials of degree at most 1.
The null-space of \(D\) is the subspace of \(U\) that consists of all constants.
We now specialize the previous column of slides to the Euclidean case. Thus, in this column of slides, we have a linear transformation \(T:\mathbb R^n \to \mathbb R^m\).
The null-space and range are still defined in the same way. Now, though, there is an \(m\times n\) matrix \(A\) such that \[T(\vec{x}) = A\vec{x}.\] As it turns out, null-space and range can be expressed in terms of \(A\) and computed efficiently using the reduced row echelon form of \(A\).
Expressed in terms of \(A\), the null-space is the set of all \(\vec{x}\in\mathbb R^n\) such that \[A\vec{x}=\vec{0}.\] Of course, we’ve spent some time solving exactly this kind of system, so those techniques help us find the null-space of \(T\).
In particular, we can easily find the null-space, once \(A\) is in reduced row echelon form. It turns out that the reduced row echelon form helps us find the range efficiently as well.
The column space of the \(m\times n\) matrix \(A\) is defined to be the linear span of the columns of \(A\). Thus, it’s immediately seen to be a subspace of \(\mathbb R^m\).
Recall, though, that the system \(A\vec{x} = \vec{y}\) has a solution precisely when the vector \(\vec{y}\) can be written as a linear combination of the columns of \(A\); the entries of \(\vec{x}\) are exactly the coefficients in that combination.
Thus, the column space of \(A\) and the range of the linear transformation \(T\) are identical.
Finally, let’s take a look at an example. Suppose the matrix \(M\) on the left has the reduced row echelon form on the right.
\[\begin{align*} M&=\begin{bmatrix} 2 & 1 & 7 & -7\\ -3 & 4 & -5 & -6\\ 1 & 1 & 4 & -5 \end{bmatrix} &R&=\begin{bmatrix} 1 & 0 & 3 & -2\\ 0 & 1 & 1 & -3\\ 0 & 0 & 0 & 0 \end{bmatrix}\text{.} \end{align*}\]
We’re going to find bases for both the null-space and the column space of \(M\).
To find the null-space, we read the solutions of \(M\vec{x}=\vec{0}\) right off of the reduced row echelon form \(R\). We note that \(x_3\) and \(x_4\) are free and \[x_1 = 2x_4 - 3x_3 \text{ and } x_2 = 3x_4 - x_3.\] Thus, the general vector in the null-space can be written \[ \langle 2x_4 - 3x_3, 3x_4 - x_3, x_3, x_4 \rangle = \langle -3,-1,1,0 \rangle x_3 + \langle 2,3,0,1 \rangle x_4. \] Put another way, a basis for the null space consists of the vectors \[ \langle -3,-1,1,0 \rangle \text{ and } \langle 2,3,0,1 \rangle. \]
Let’s take a look again at our matrices \(M\) and \(R\): \[\begin{align*} M&=\begin{bmatrix} 2 & 1 & 7 & -7\\ -3 & 4 & -5 & -6\\ 1 & 1 & 4 & -5 \end{bmatrix} &R&=\begin{bmatrix} 1 & 0 & 3 & -2\\ 0 & 1 & 1 & -3\\ 0 & 0 & 0 & 0 \end{bmatrix}\text{.} \end{align*}\] The indices of the columns for which \(R\) has a leading 1 are called the pivots. In this case, the pivots are 1 and 2 (or 0 and 1, in Python). It turns out that the columns of \(M\) with those indices form a basis for the column space, and therefore the range. That is, \[ \langle 2,-3,1 \rangle \text{ and } \langle 1,4,1 \rangle \] form a basis for the range.
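Both of these basis computations (null-space and column space) are built into SymPy. Here’s a sketch, assuming the standard `Matrix` methods; `nullspace` may scale its basis vectors differently than a hand computation would.

```python
import sympy as sp

M = sp.Matrix([[ 2, 1,  7, -7],
               [-3, 4, -5, -6],
               [ 1, 1,  4, -5]])

R, pivots = M.rref()
print(R)                 # the reduced row echelon form shown above
print(pivots)            # (0, 1) -- the pivot columns, in Python indexing
print(M.nullspace())     # basis for the null-space: <-3,-1,1,0> and <2,3,0,1>
print(M.columnspace())   # basis for the column space: the pivot columns of M
```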
We can solve \(M\vec{x} = \vec{b}\) by putting the augmented matrix \([M|\vec{b}]\) into reduced row echelon form \([R|\vec{b}']\). If we set the free parameters there equal to zero, we obtain a solution \(\vec{x}\) whose only non-zero entries occur in the pivot positions. This \(\vec{x}\) gives us the coefficients needed to express \(\vec{b}\) as a linear combination of the pivot columns of \(M\).
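Here’s a sketch of that procedure in SymPy, with \(\vec{b}\) chosen, purely for illustration, to be the sum of the first two columns of \(M\):

```python
import sympy as sp

M = sp.Matrix([[ 2, 1,  7, -7],
               [-3, 4, -5, -6],
               [ 1, 1,  4, -5]])
b = sp.Matrix([3, 1, 2])        # column 1 of M plus column 2 of M

aug = M.row_join(b)             # the augmented matrix [M | b]
R_aug, pivots = aug.rref()
print(R_aug)                    # last column is b'; with x3 = x4 = 0, we read x1 = 1, x2 = 1
print(pivots)                   # (0, 1) -- the pivot columns, in Python indexing

# So b = 1*(column 1 of M) + 1*(column 2 of M):
print(M[:, 0] + M[:, 1] == b)   # True
```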