Fri, Jan 23, 2026
Last time, we talked about linear systems and how they arise naturally in the optimization of multivariable quadratic functions. We also talked about how algebraic operations on the equations can lead to solutions, and what types of solution sets can and cannot arise. We finished by noting that the coefficients in a linear system can be naturally thought of as a rectangular array of numbers, called a matrix, and that we might be able to solve the system by working with the matrix directly.
Today, we’ll focus on matrices, row operations, and the reduced row echelon form of a matrix. This will help us formalize the techniques we developed last time into concrete algorithms.
\[ \newcommand{\vect}[1]{\mathbf{#1}} \newcommand{\rowopswap}[2]{R_{#1}\leftrightarrow R_{#2}} \newcommand{\rowopmult}[2]{#1R_{#2}} \newcommand{\rowopadd}[3]{#1R_{#2}+R_{#3}} \newcommand\aug{\fboxsep=-\fboxrule\!\!\!\fbox{\strut}\!\!\!} \newcommand{\matrixentry}[2]{\left\lbrack#1\right\rbrack_{#2}} \]
Definition A matrix \(A\in\mathbb{R^{m\times n}}\) is simply an \(m\times n\) array of numbers:
\[ \begin{equation*} A= \begin{bmatrix} a_{11}&a_{12}&a_{13}&\dots&a_{1n}\\ a_{21}&a_{22}&a_{23}&\dots&a_{2n}\\ \vdots&\vdots&\vdots& &\vdots\\ a_{m1}&a_{m2}&a_{m3}&\dots&a_{mn}\\ \end{bmatrix}\text{.} \end{equation*} \]
Given a matrix \(A\), we sometimes write \([A]_{ij}\) to refer to the element in row \(i\) and column \(j\). In the abstract example above, we have \[[A]_{ij} = a_{ij}.\]
Here’s a matrix \(B\) with \(m=3\) rows and \(n=4\) columns. \[ B=\begin{bmatrix} -1&2&5&3\\ 1&0&-6&1\\ -4&2&2&-2 \end{bmatrix} \]
The subscript notation for entry extraction yields, for example, that \[\matrixentry{B}{2,3}=-6 \text{ and } \matrixentry{B}{3,4}=-2.\]
Given \(A\in\mathbb{R^{m\times n}}\), the transpose of \(A\) is the matrix \(A^{\mathsf T}\) defined by \[[A^{\mathsf T}]_{ij} = [A]_{ji}.\] The effect is to swap the rows and columns of \(A\). For example, \[ B=\begin{bmatrix} -1&2&5&3\\ 1&0&-6&1\\ -4&2&2&-2 \end{bmatrix} \implies B^{\mathsf T} = \begin{bmatrix} -1&1&-4 \\ 2&0&2 \\ 5&-6&2 \\ 3&1&-2 \end{bmatrix} \]
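Here's how these ideas look in code; a minimal NumPy sketch (note that NumPy indices start at 0, so the entry \(\matrixentry{B}{2,3}\) is `B[1, 2]`):

```python
import numpy as np

B = np.array([
    [-1,  2,  5,  3],
    [ 1,  0, -6,  1],
    [-4,  2,  2, -2],
])
print(B.shape)   # (3, 4): three rows and four columns
print(B[1, 2])   # the entry in row 2, column 3 -> -6
print(B.T)       # the transpose swaps rows and columns
```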
A column vector \(\vect{u}\) in \(\mathbb{R^m}\) is a vertical array of numbers or, more precisely, an \(m\times 1\) matrix: \[ \vect{u} = \begin{bmatrix}u_1 \\ u_2 \\ \vdots \\ u_m\end{bmatrix}. \] In the interest of saving vertical space, I might sometimes write \[ \vect{u} = \begin{bmatrix}u_1 & u_2 & \cdots & u_m\end{bmatrix}^{\mathsf T}. \] When writing on the board, I typically distinguish vectors from scalars using a vector hat like \(\vec{u}\).
Sometimes, we might write an \(m\times n\) matrix as a row of \(n\) column vectors, each of size \(m\): \[ A = \begin{bmatrix} \mathbf{A}_1 & \mathbf{A}_2 & \cdots & \mathbf{A}_n \end{bmatrix}. \]
Let the matrix \(A \in \mathbb{R^{m\times n}}\) and the vector \(\vect{u} \in \mathbb{R^n}\) be defined by \[ \begin{aligned} A &= \begin{bmatrix} \mathbf{A}_1 & \mathbf{A}_2 & \cdots & \mathbf{A}_n \end{bmatrix} \\ \vect{u} &= \begin{bmatrix} u_1 & u_2 & \cdots & u_n \end{bmatrix}^{\mathsf T}. \end{aligned} \] Then the matrix-vector product \(A\vect{u}\) is defined by \[ A\vect{u} = u_1 \mathbf{A}_1 + u_2 \mathbf{A}_2 + \cdots + u_n \mathbf{A}_n \in \mathbb{R^m}. \] This operation of multiplying column vectors by scalars and adding up the results is often called a linear combination.
\[ \begin{aligned} \begin{bmatrix} 1 & -1 & 1 \\ 2 & 1 & 3 \end{bmatrix} \begin{bmatrix} 2 \\ -1 \\ 2 \end{bmatrix} &= 2\begin{bmatrix}1 \\ 2 \end{bmatrix} + (-1)\begin{bmatrix}-1 \\ 1 \end{bmatrix} + 2\begin{bmatrix}1 \\ 3 \end{bmatrix} \\ &= \begin{bmatrix} 2 + 1 + 2 \\ 4 - 1 + 6 \end{bmatrix} = \begin{bmatrix} 5 \\ 9 \end{bmatrix} \end{aligned} \]
If you’re familiar with the dot product, you might notice that each entry of the result is the dot product of the corresponding row of the matrix with the column vector.
\[ \begin{aligned} \begin{bmatrix} a_{11}&a_{12}&\dots&a_{1n}\\ a_{21}&a_{22}&\dots&a_{2n}\\ \vdots&\vdots&\ddots&\vdots\\ a_{m1}&a_{m2}&\dots&a_{mn}\\ \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots\\x_n \end{bmatrix} &= x_1 \begin{bmatrix} a_{11}\\ a_{21}\\ \vdots\\ a_{m1}\\ \end{bmatrix} + x_2 \begin{bmatrix} a_{12}\\ a_{22}\\ \vdots\\ a_{m2}\\ \end{bmatrix} +\cdots+ x_n \begin{bmatrix} a_{1n}\\ a_{2n}\\ \vdots\\ a_{mn}\\ \end{bmatrix} \\ &= \begin{bmatrix} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n \\ \vdots \\ a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n \end{bmatrix} \end{aligned} \]
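As a quick check of the example above, a minimal NumPy sketch (the `@` operator computes the matrix-vector product):

```python
import numpy as np

A = np.array([[1, -1, 1],
              [2,  1, 3]])
u = np.array([2, -1, 2])
print(A @ u)   # -> [5 9], matching the linear combination computed by hand
```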
The matrix-vector equation \(A\vect{x} = \vect{b}\) given by \[ \begin{equation*} \begin{bmatrix} a_{11}&a_{12}&a_{13}&\dots&a_{1n}\\ a_{21}&a_{22}&a_{23}&\dots&a_{2n}\\ \vdots&\vdots&\vdots&\ddots&\vdots\\ a_{m1}&a_{m2}&a_{m3}&\dots&a_{mn}\\ \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots\\x_n \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \\\vdots \\ b_m \end{bmatrix} \end{equation*} \] is equivalent to the system \[\begin{align*} a_{11}x_1+a_{12}x_2+a_{13}x_3+\dots+a_{1n}x_n&=b_1\\ a_{21}x_1+a_{22}x_2+a_{23}x_3+\dots+a_{2n}x_n&=b_2\\ &\vdots\\ a_{m1}x_1+a_{m2}x_2+a_{m3}x_3+\dots+a_{mn}x_n&=b_m. \end{align*}\]
When working with the system \(A\vect{x}=\vect{b}\), it often makes sense to drop the variable \(\vect{x}\) and work with the so-called augmented matrix \([A|\vect{b}]\). In expanded form:
\[ \left[\begin{matrix} a_{11}&a_{12}&a_{13}&\dots&a_{1n}\\ a_{21}&a_{22}&a_{23}&\dots&a_{2n}\\ \vdots&\vdots&\vdots& &\vdots\\ a_{m1}&a_{m2}&a_{m3}&\dots&a_{mn}\\ \end{matrix}\right| \left.\begin{matrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{matrix}\right] \]
We can then do row operations on the augmented matrix and read off solutions from the result.
We will sometimes drop the separation bar when the intent is understood.
The equation operations on systems translate to analogous row operations for matrices.
The following system can be expressed as the augmented matrix shown just after it. We can then use row operations to place that matrix in upper triangular form.
\[\begin{align*} x_1+2x_2+2x_3&=4\\ x_1+3x_2+3x_3&=5\\ 2x_1+6x_2+5x_3&=6 \end{align*}\]
\[\begin{align*} A=\begin{bmatrix} 1&2&2&4\\ 1&3&3&5\\ 2&6&5&6 \end{bmatrix} \end{align*}\]
\[ \begin{align*} \xrightarrow{\rowopadd{-1}{1}{2}} & \begin{bmatrix} 1&2&2&4\\ 0&1&1&1\\ 2&6&5&6 \end{bmatrix} \xrightarrow{\rowopadd{-2}{1}{3}} \begin{bmatrix} 1&2&2&4\\ 0&1&1&1\\ 0&2&1&-2 \end{bmatrix}\\ \xrightarrow{\rowopadd{-2}{2}{3}} & \begin{bmatrix} 1&2&2&4\\ 0&1&1&1\\ 0&0&-1&-4 \end{bmatrix} \xrightarrow{\rowopmult{-1}{3}} \begin{bmatrix} 1&2&2&4\\ 0&1& 1&1\\ 0&0&1&4 \end{bmatrix}\text{.} \end{align*} \]
We played with this exact system last time and saw that it can be solved via back-substitution. Alternatively, we can continue the row operations, clearing out the entries above each pivot, to reach the so-called reduced row echelon form: \[ \begin{aligned} \begin{bmatrix} 1&2&2&4\\ 0&1& 1&1\\ 0&0&1&4 \end{bmatrix} &\xrightarrow{\rowopadd{-1}{3}{2}} \begin{bmatrix} 1&2&2&4\\ 0&1&0&-3\\ 0&0&1&4 \end{bmatrix} \xrightarrow{\rowopadd{-2}{3}{1}} \begin{bmatrix} 1&2&0&-4\\ 0&1&0&-3\\ 0&0&1&4 \end{bmatrix} \\ &\xrightarrow{\rowopadd{-2}{2}{1}} \begin{bmatrix} 1&0&0&2\\ 0&1&0&-3\\ 0&0&1&4 \end{bmatrix} \end{aligned} \]
From here, it’s totally easy to see that the solution vector can be written \[ \vect{x} = \begin{bmatrix} 2 & -3 & 4 \end{bmatrix}^{\mathsf T}. \]
Any matrix can be placed into a particular canonical form, called the reduced row-echelon form or RREF, that makes it easy to analyze in a number of ways. A matrix is in RREF if it meets all of the following conditions:

- Any row consisting entirely of zeros lies below every row containing a nonzero entry.
- The first nonzero entry in each nonzero row is a 1, called a leading one.
- The leading one in each nonzero row lies strictly to the right of the leading one in the row above it.
- Each leading one is the only nonzero entry in its column.
Any matrix can be placed into RREF by a sequence of row operations and those row operations preserve important properties of the matrix.
The leading ones in an RREF matrix are often called the pivots.
Here’s a matrix in reduced row echelon form:
\[\begin{bmatrix} 1&-3&0&6&0&0&-5&9\\ 0&0&0&0&1&0&3&-7\\ 0&0&0&0&0&1&7&3\\ 0&0&0&0&0&0&0&0\\ 0&0&0&0&0&0&0&0 \end{bmatrix}\]
If this is the coefficient matrix of a homogeneous system, then we could write that system as
\[\begin{aligned} x_1 - 3 x_2 + 0x_3 + 6 x_4 + 0x_5 + 0x_6 - 5x_7 + 9 x_8 &= 0 \\ x_5 + 0x_6 + 3x_7 - 7 x_8 &= 0 \\ x_6 + 7x_7 + 3x_8 &= 0. \end{aligned}\]
Continuing with the previous system,
\[\begin{aligned} x_1 - 3 x_2 + 0x_3 + 6 x_4 + 0x_5 + 0x_6 - 5x_7 + 9 x_8 &= 0 \\ x_5 + 0x_6 + 3x_7 - 7 x_8 &= 0 \\ x_6 + 7x_7 + 3x_8 &= 0, \end{aligned}\]
note that the pivot variables are \(x_1\), \(x_5\), and \(x_6\). The other five variables are free and the pivot variables can be expressed in terms of those free variables:
\[\begin{aligned} x_6 &= -(7x_7 + 3x_8) \\ x_5 &= -(3x_7 - 7x_8) \\ x_1 &= 3x_2 - 6x_4 + 5x_7 - 9x_8. \end{aligned}\]
Once we’ve expressed the pivot variables in terms of the free variables as \[\begin{aligned} x_6 &= -(7x_7 + 3x_8) \\ x_5 &= -(3x_7 - 7x_8) \\ x_1 &= 3x_2 - 6x_4 + 5x_7 - 9x_8, \end{aligned}\]
we can identify the five free variables with parameters \(r\), \(s\), \(t\), \(u\), and \(v\):
| parameter | \(r\) | \(s\) | \(t\) | \(u\) | \(v\) |
|---|---|---|---|---|---|
| free variable | \(x_2\) | \(x_3\) | \(x_4\) | \(x_7\) | \(x_8\) |
That allows us to explicitly write out the five-dimensional solution space:
\[(3r-6t+5u-9v,\, r,\, s,\, t,\, 7v-3u,\, -(7u+3v),\, u,\, v).\]
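As a sanity check, here's a minimal NumPy sketch (with an arbitrary choice of the parameter values, which you can vary) confirming that this parameterization really does solve the homogeneous system:

```python
import numpy as np

# The RREF coefficient matrix from above
A = np.array([
    [1, -3, 0, 6, 0, 0, -5,  9],
    [0,  0, 0, 0, 1, 0,  3, -7],
    [0,  0, 0, 0, 0, 1,  7,  3],
    [0,  0, 0, 0, 0, 0,  0,  0],
    [0,  0, 0, 0, 0, 0,  0,  0],
])

# Any values of the five parameters should work
r, s, t, u, v = 1.0, 2.0, -1.0, 3.0, 0.5
x = np.array([3*r - 6*t + 5*u - 9*v, r, s, t,
              7*v - 3*u, -(7*u + 3*v), u, v])
print(A @ x)   # -> [0. 0. 0. 0. 0.]
```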
Let’s formalize our tools a bit. Starting with a definition:
Definition of row-equivalence: Two matrices are called row-equivalent if one can be obtained from the other by a sequence of row operations.
If \(A\) is row-equivalent to \(B\), I might write \(A \sim B\). Note that if \(A\), \(B\), and \(C\) are matrices, then

- \(A \sim A\),
- \(A \sim B\) implies \(B \sim A\), and
- \(A \sim B\) together with \(B \sim C\) implies \(A \sim C\).

That is, row-equivalence is an equivalence relation.
It turns out that any matrix is row-equivalent to another matrix in RREF.
Theorem: Existence of RREF Suppose \(A\) is a matrix. Then there is a matrix \(B\) so that

- \(A\) and \(B\) are row-equivalent, and
- \(B\) is in reduced row-echelon form.
As a result, we can determine the solution set of any linear system.
Theorem: Uniqueness of RREF: Suppose that \(A\) is a matrix and that \(B\) and \(C\) are matrices that are row-equivalent to \(A\) and in reduced row-echelon form. Then \(B=C\).
Existence/uniqueness theorems are the bomb!
From the applied perspective, the proof of these theorems amounts to the existence of an algorithm. We’ll check that out below but, first, we need to set up the input.
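A minimal setup sketch, assuming NumPy; the matrix is the augmented matrix from our earlier example (stored with floating point entries so rows can be scaled), and the tolerance `tol` is an assumption for deciding when an entry counts as zero:

```python
import numpy as np

# Augmented matrix of the system we solved by hand
A = np.array([
    [1.0, 2.0, 2.0, 4.0],
    [1.0, 3.0, 3.0, 5.0],
    [2.0, 6.0, 5.0, 6.0],
])
m, n = A.shape   # number of rows and columns
tol = 1e-12      # entries smaller than this are treated as zero
```

With that in place, here's the Gauss-Jordan elimination loop itself: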
```python
r = 0                                   # Current pivot row
for j in range(n):                      # Iterate along the columns
    i = r                               # Row counter for this column
    while i < m and abs(A[i, j]) < tol:
        i = i + 1                       # Skip down to the first nonzero term
    if i < m:                           # If this column contains a pivot
        A[[r, i]] = A[[i, r]]           # Swap the rows
        A[r] = A[r] / A[r, j]           # Scale so the pivot is 1
        for k in range(m):              # Zero out the rest of the column
            if k != r:
                A[k] = A[k] - A[k, j] * A[r]
        r = r + 1                       # Increment current row
```
```python
print(A)
```

```
[[ 1.  0.  0.  2.]
 [ 0.  1.  0. -3.]
 [-0. -0.  1.  4.]]
```
There are a few tools that you can use to assist you in Gauss-Jordan computations.
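For instance, SymPy can compute the reduced row echelon form exactly; a minimal sketch using the augmented matrix from the earlier worked example:

```python
from sympy import Matrix

A = Matrix([
    [1, 2, 2, 4],
    [1, 3, 3, 5],
    [2, 6, 5, 6],
])
R, pivot_cols = A.rref()
print(R)           # the RREF we computed above
print(pivot_cols)  # (0, 1, 2): the columns containing pivots
```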
Here’s a good example to try by hand:
\[ A = \begin{bmatrix} 1 & 1 & 3 & 1 \\ 0 & 1 & 1 & 0 \\ 2 & 1 & 5 & 2 \end{bmatrix} \]
Assuming this matrix is the augmented matrix of a system, you should be able to:

- place the matrix in reduced row echelon form,
- write down the corresponding reduced system, and
- describe the solution set of that system.
The reduced row echelon form is \[ \begin{bmatrix} 1 & 0 & 2 & 1 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix} \]
This yields the reduced system \[x_1 + 2x_3 = 1 \text{ and } x_2+x_3 = 0.\]
Thus, the solution set is
\[\{(1-2t, -t, t): t\in \mathbb R\}.\]
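A minimal NumPy sketch (with one arbitrary value of the parameter \(t\)) confirming that these really are solutions of the original system:

```python
import numpy as np

# Coefficient matrix and right-hand side of the original system
A = np.array([[1, 1, 3],
              [0, 1, 1],
              [2, 1, 5]])
b = np.array([1, 0, 2])

t = 1.5                         # any value of the free parameter
x = np.array([1 - 2*t, -t, t])
print(A @ x - b)                # -> [0. 0. 0.]
```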
Systems with as many equations as unknowns, i.e. with an \(n\times n\) coefficient matrix, are often of particular importance. Such a matrix is called square.
Generally, we expect a square system to have a unique solution, and it would be nice to have a way to determine when that happens.
Let’s consider a homogeneous \(n\times n\) system \(A\vect{x}=\vect{0}\). Such a system always has the trivial solution \(\vect{x}=\vect{0}\); if that solution is unique, then there are no free variables, so every column of the reduced row echelon form of \(A\) contains a pivot and there are no zero rows. Thus, the entries \(a_{ij}\) of that reduced row echelon form must satisfy \[ a_{ij} = \begin{cases} 1 & i=j \\ 0 & \text{else}.\end{cases} \] We might write the expanded form as \[ \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix}. \]
A square matrix \(A\) is said to be nonsingular if the equation \(A\vect{x} = \vect{0}\) has only the trivial solution \(\vect{x}=\vect{0}\).
The previous slide gives us a simple way to check whether a matrix is nonsingular: simply compute its RREF. If you get the matrix with ones on the diagonal and zeros everywhere else, then the original matrix is nonsingular.
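Here's a minimal SymPy sketch of that check (the particular matrices are just examples):

```python
from sympy import Matrix, eye

A = Matrix([[1, 2],
            [3, 5]])
print(A.rref()[0] == eye(2))   # True: the RREF is the identity, so A is nonsingular

B = Matrix([[1, 2],
            [2, 4]])
print(B.rref()[0] == eye(2))   # False: B is singular
```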
A square matrix that is not nonsingular is called singular.
The concept of singularity is of fundamental importance. We will state several more characterizations of this concept over the next couple of weeks.