Vector Spaces - \(\mathbb R^n\) and More
Mon, Jan 27, 2025
\[ \newcommand{\vect}[1]{\mathbf{#1}} \newcommand{\colvector}[1]{\begin{bmatrix}#1\end{bmatrix}} \newcommand{\vectorlist}[2]{\vec{#1}_{1},\,\vec{#1}_{2},\,\vec{#1}_{3},\,\ldots,\,\vec{#1}_{#2}} \newcommand{\spn}[1]{\left\langle#1\right\rangle} \newcommand{\setparts}[2]{\left\lbrace#1\,\middle|\,#2\right\rbrace} \newcommand{\lincombo}[3]{#1_{1}\vec{#2}_{1}+#1_{2}\vec{#2}_{2}+\cdots +#1_{#3}\vec{#2}_{#3}} \]
We’ve alluded to the idea of vectors a few times now. It’s time to clarify exactly what these things are.
We’ll work mostly in \(n\)-dimensional Euclidean space - a generalization of the three-dimensional space in which we live. We’ll finish, though, with an abstract description of the vector space concept in general.
The term Euclidean space refers to a generalization of the two- and three-dimensional geometry described by Euclid. The generalization is constructed using a coordinate system.
We often denote this space by \(\mathbb R^n\) and represent it as the collection of lists of real numbers of length \(n\), which we use to determine points in \(n\)-dimensional space.
Typically, these points are delimited by parentheses so that a point might be written as
\[(x_1,x_2,x_3,\ldots,x_n).\]
We now extend that to an algebraic structure. That is, we wish to be able to make sense out of operations like addition, subtraction, and multiplication.
In this context, we’ll often delimit the components of a vector with angled brackets and distinguish it in bold (\(\vect{u}\)) or with a hat (\(\vec{v}\), which is my preference). Thus, I might write:
\[\vec{v} = \langle v_1,v_2,v_3,\ldots,v_n\rangle.\]
Another common way to represent an \(n\)-dimensional vector is as an \(n\times1\) matrix. Thus, we might write
\[\vec{v} = \begin{bmatrix} v_1 \\ v_2 \\ v_3 \\ \vdots \\ v_n \end{bmatrix}.\]
There’s clearly a simple correspondence between the two representations. The column vector notation is particularly convenient in the context of linear algebra when we perform operations on them with matrices.
The angled bracket notation conserves space, though, and we will use both notations as we see fit.
However we choose to denote vectors, we’ll perform many algebraic operations componentwise, by which we mean we apply the operation to the individual terms. In the case of addition, for example, we would write:
\[\vec{u} + \vec{v} = \begin{bmatrix} u_1 \\ u_2 \\ u_3 \\ \vdots \\ u_n \end{bmatrix} + \begin{bmatrix} v_1 \\ v_2 \\ v_3 \\ \vdots \\ v_n \end{bmatrix} = \begin{bmatrix} u_1 + v_1 \\ u_2 + v_2 \\ u_3 + v_3 \\ \vdots \\ u_n + v_n \end{bmatrix}. \]
In the case of scalar multiplication, we would write: \[\alpha\vec{u} = \alpha \langle u_1,u_2,u_3,\ldots,u_n \rangle = \langle \alpha u_1, \alpha u_2, \alpha u_3,\ldots, \alpha u_n \rangle.\]
To be clear, these are the definitions of these operations.
Oh, hey look - we’ve used both notations already!
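Since these operations are defined componentwise, NumPy arrays (which also add and scale componentwise) give a quick way to experiment with them. Here’s a minimal sketch with made-up vectors:

```python
import numpy as np

# two made-up vectors in R^4 and a scalar
u = np.array([2, 4, -3, 1])
v = np.array([6, 3, 0, -2])
alpha = 3

print(u + v)      # componentwise sum: [ 8  7 -3 -1]
print(alpha * u)  # componentwise scalar multiple: [ 6 12 -9  3]
```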
The algebra of vectors in \(\mathbb R^n\) that we’ve defined follows many of the properties of \(\mathbb R\). For example:
Basic algebraic laws to know for the exam!
For any \(\vec{u},\vec{v},\vec{w}\in\mathbb R^n\) and \(\alpha \in\mathbb R\), we have

- \(\vec{u} + \vec{v} = \vec{v} + \vec{u}\) (commutativity of vector addition),
- \((\vec{u} + \vec{v}) + \vec{w} = \vec{u} + (\vec{v} + \vec{w})\) (associativity of vector addition), and
- \(\alpha(\vec{u} + \vec{v}) = \alpha\vec{u} + \alpha\vec{v}\) (distributivity of scalar multiplication over vector addition).
These algebraic properties can all be proved by breaking them down to the component level, applying the corresponding rule at that level, and then reconstituting the vector formulation. This type of proof is called a componentwise proof.
For example, here’s a componentwise proof of the distributive law for scalar multiplication:
\[\begin{aligned} \alpha(\vec{u} + \vec{v}) &= \alpha\left(\langle u_1,u_2,\ldots,u_n \rangle + \langle v_1, v_2,\ldots,v_n \rangle\right) \\ &= \alpha\langle u_1+v_1,u_2+v_2,\ldots,u_n+v_n \rangle \\ &= \langle \alpha(u_1+v_1),\alpha(u_2+v_2),\ldots,\alpha(u_n+v_n) \rangle \\ &= \langle \alpha u_1+\alpha v_1,\alpha u_2+\alpha v_2,\ldots,\alpha u_n+\alpha v_n \rangle \\ &= \langle \alpha u_1, \alpha u_2,\ldots, \alpha u_n \rangle + \langle \alpha v_1, \alpha v_2,\ldots, \alpha v_n \rangle \\ &= \alpha \vec{u} + \alpha \vec{v}. \end{aligned}\]
Note that each of the six equals signs in the computation above can be justified by some specific definition or property of the real number system. Those justifications are:

1. the definitions of \(\vec{u}\) and \(\vec{v}\) in terms of their components,
2. the definition of vector addition,
3. the definition of scalar multiplication,
4. the distributive law for real numbers,
5. the definition of vector addition (read from right to left), and
6. the definition of scalar multiplication (read from right to left).
Note that the law we want to prove for vectors is applied for real numbers in step 4. For this reason, we might say that \(\mathbb R^n\) “inherits” many of the properties of \(\mathbb R\).
Scientific problems are often simplified by breaking them into constituent parts. The study of motion in space, for example, can be broken into motion in the \(x\), \(y\), and \(z\) directions.
In data science, the results of an algorithmic prediction might be the combination of a number of factors. Predicted points scored by a team in an upcoming game, for example, might be the combination of average points scored by that team in the past together with defensive statistics for the opponent.
If our data is encoded in a vector space, this leads us to ask: how can we best combine the components of that data to represent the aspects we need most efficiently?
Given \(m\) vectors \[\vec{v}_1, \vec{v}_2, \vec{v}_3, \ldots, \vec{v}_m\] from \(\mathbb R^n\), and \(m\) scalars \(\alpha_1,\alpha_2,\alpha_3,\ldots,\alpha_m\), the corresponding linear combination is \[\sum_{i=1}^m \alpha_i\vec{v}_i = \alpha_1\vec{v}_1 + \alpha_2\vec{v}_2 + \alpha_3\vec{v}_3 + \cdots + \alpha_m\vec{v}_m.\]
If \(m=n=4\), \(\alpha_1=1,\alpha_2=-4,\alpha_3=2,\alpha_4=-1\), and
\[\vec{v_1} = \colvector{2\\4\\-3\\1}, \: \vec{v_2} = \colvector{6\\3\\0\\-2}, \: \vec{v_3} = \colvector{-5\\2\\1\\1}, \: \vec{v_4} = \colvector{3\\2\\-5\\7},\]
then our linear combination would be
\[ \small \colvector{2\\4\\-3\\1} -4\colvector{6\\3\\0\\-2}+ 2\colvector{-5\\2\\1\\1} -\colvector{3\\2\\-5\\7} =\colvector{2\\4\\-3\\1}+ \colvector{-24\\-12\\0\\8}+ \colvector{-10\\4\\2\\2}+ \colvector{-3\\-2\\5\\-7} =\colvector{-35\\-6\\4\\4} \]
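As a sanity check, we can reproduce this computation numerically. Here’s a small NumPy sketch:

```python
import numpy as np

v1 = np.array([2, 4, -3, 1])
v2 = np.array([6, 3, 0, -2])
v3 = np.array([-5, 2, 1, 1])
v4 = np.array([3, 2, -5, 7])

# the linear combination with scalars 1, -4, 2, -1
print(1*v1 - 4*v2 + 2*v3 - 1*v4)  # [-35  -6   4   4]
```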
Any linear system, like \[ \colvector{-7x_1 -6 x_2 - 12x_3\\ 5x_1 + 5x_2 + 7x_3\\ x_1 +4x_3} = \colvector{-33\\24\\5} \] can be written as a linear combination. For example: \[ x_1\colvector{-7\\5\\1}+ x_2\colvector{-6\\5\\0}+ x_3\colvector{-12\\7\\4} = \colvector{-33\\24\\5}\text{.} \]
The previous observation relating linear systems and linear combinations has a useful interpretation:
The vector \(\vec{x}\) is a solution of the system \[A\vec{x} = \vec{b}\] if and only if the vector \(\vec{b}\) can be expressed as a linear combination of the columns of \(A\) using the components of \(\vec{x}\) as the scalar multiples.
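Here’s a small NumPy sketch of this correspondence, using the system from the previous example: `np.linalg.solve` finds \(\vec{x}\), and we check that the same scalars combine the columns of \(A\) to give \(\vec{b}\).

```python
import numpy as np

# coefficient matrix and right-hand side from the example above
A = np.array([[-7, -6, -12],
              [ 5,  5,   7],
              [ 1,  0,   4]])
b = np.array([-33, 24, 5])

x = np.linalg.solve(A, b)                      # solves A x = b; here x is about [-3, 5, 2]
combo = sum(x[i] * A[:, i] for i in range(3))  # the same scalars applied to the columns of A

print(np.allclose(A @ x, combo))  # True: A x is a linear combination of the columns
print(np.allclose(combo, b))      # True: that combination is exactly b
```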
Definition: Given a set of vectors \(S\) in \(\mathbb R^n\), their span \(\spn{S}\) is the set of all linear combinations of those vectors.
Written symbolically, if \(S=\{\vectorlist{u}{p}\}\), \[\begin{align*} \spn{S}&=\setparts{\lincombo{\alpha}{u}{p}}{\alpha_i\in\mathbb R,\,1\leq i\leq p}\\ &=\setparts{\sum_{i=1}^{p}\alpha_i\vec{u}_i}{\alpha_i\in\mathbb R,\,1\leq i\leq p}\text{.} \end{align*}\]
Note that the span of a set of vectors in \(\mathbb R^n\) is an example of a subspace of \(\mathbb R^n\).
We’ll get to that at the end of these notes as we discuss more general vector spaces.
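To make the span concrete, here’s a minimal SymPy sketch, with made-up vectors \(\vec{u}_1,\vec{u}_2\) and a target \(\vec{w}\), that checks whether \(\vec{w}\) lies in \(\spn{\{\vec{u}_1,\vec{u}_2\}}\). It uses the fact that \(\vec{w}\) is a linear combination of \(\vec{u}_1\) and \(\vec{u}_2\) exactly when adjoining \(\vec{w}\) to the matrix with columns \(\vec{u}_1\) and \(\vec{u}_2\) doesn’t increase the rank.

```python
import sympy as sp

# made-up vectors: is w in the span of {u1, u2}?
u1 = sp.Matrix([1, 1, 0])
u2 = sp.Matrix([0, 1, 1])
w  = sp.Matrix([1, 2, 3])

M   = sp.Matrix.hstack(u1, u2)  # columns are the spanning vectors
aug = sp.Matrix.hstack(M, w)    # adjoin the target vector

print(M.rank() == aug.rank())   # False here, so w is not in the span of {u1, u2}
```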
We say that a set of vectors \(S=\{\vectorlist{u}{n}\}\), is linearly dependent if the equation \[ \lincombo{\alpha}{u}{n}=\vec{0} \] has a non-trivial solution. (That is, not all the \(\alpha_i\)s are zero.)
Otherwise, the set \(S\) is linearly independent.
The sets \(\{\langle 1,0 \rangle, \langle 0,1 \rangle\}\) and \(\{\langle 1,1 \rangle, \langle 1,-1 \rangle\}\), for example, are linearly independent in \(\mathbb R^2\).
The set \(\{\langle 1,0 \rangle, \langle 0,1 \rangle, \langle 1,1 \rangle\}\) is linearly dependent.
Given a collection of vectors in \(\mathbb R^n\), there’s a simple algorithm to determine their linear dependence or independence.
Simply form the matrix whose columns are the given vectors and place it in reduced row echelon form. If every column of the result contains a leading 1 (that is, if every column is a pivot column), then the vectors are linearly independent. Otherwise, the vectors are linearly dependent.
Consider three column vectors that form the matrix \(A\) shown on the left below. The reduced row echelon form \(R\) of \(A\) is shown on the right, as we can check using SymPy. Given the form of \(R\), the columns of \(A\) must be linearly independent.
\[A = \left[\begin{matrix}1 & 2 & 3\\4 & 5 & 6\\7 & 8 & 9\\1 & 0 & 1\end{matrix}\right]\]
\[R = \left[\begin{matrix}1 & 0 & 0\\0 & 1 & 0\\0 & 0 & 1\\0 & 0 & 0\end{matrix}\right]\]
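Here’s the SymPy check mentioned above, a minimal sketch using `Matrix.rref`:

```python
import sympy as sp

A = sp.Matrix([[1, 2, 3],
               [4, 5, 6],
               [7, 8, 9],
               [1, 0, 1]])

R, pivots = A.rref()
print(R)       # the 3x3 identity stacked on a row of zeros
print(pivots)  # (0, 1, 2): every column is a pivot column, so the columns are independent
```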
Now, consider the three vectors that form the columns of the matrix \(B\) shown on the left below. (Note that the middle entry in the final row is different.) The reduced row echelon form \(E\) of \(B\) is shown on the right, as we can again check. Given the form of \(E\), the columns of \(B\) must be linearly dependent.
\[B = \left[\begin{matrix}1 & 2 & 3\\4 & 5 & 6\\7 & 8 & 9\\1 & 1 & 1\end{matrix}\right]\]
\[E = \left[\begin{matrix}1 & 0 & -1\\0 & 1 & 2\\0 & 0 & 0\\0 & 0 & 0\end{matrix}\right]\]
I guess this means that we could express the zero vector as a linear combination of those columns, as you’re asked to do in the last problem on the exam 1 review sheet…
Continuing with
\[B = \left[\begin{matrix}1 & 2 & 3\\4 & 5 & 6\\7 & 8 & 9\\1 & 1 & 1\end{matrix}\right]\]
\[E = \left[\begin{matrix}1 & 0 & -1\\0 & 1 & 2\\0 & 0 & 0\\0 & 0 & 0\end{matrix}\right]\]
We can read the solutions of \(B\vec{x}=\vec{0}\) from \(E\). We find that \(x_3\) is free and that \[ x_1 = x_3 \text{ and } x_2 = -2x_3. \] For example, \(\vec{x} = \langle 1,-2,1 \rangle\) is a solution. It’s easy to check that this linear combination of the columns yields \(\vec{0}\).
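A quick SymPy check of this dependence relation (again just a sketch):

```python
import sympy as sp

B = sp.Matrix([[1, 2, 3],
               [4, 5, 6],
               [7, 8, 9],
               [1, 1, 1]])

print(B.rref()[0])                # matches E above
print(B * sp.Matrix([1, -2, 1]))  # the zero vector, confirming the dependence
print(B.nullspace())              # a basis for the solutions of B x = 0
```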
Recall the definition of the span \(\spn{S}\) of a set of vectors \(S\) as the set of all linear combinations of those vectors.
We say that \(S\) spans \(\mathbb R^n\) if \[ \spn{S} = \mathbb R^n.\] Such a set is called a spanning set.
I guess the concept of a spanning set is somewhat complementary to linear independence. The more vectors you have, the more likely they are to span the space but the less likely they are to be linearly independent.
The sweet spot is a linearly independent spanning set. This type of set is called a basis for \(\mathbb R^n\).
The vectors \(\vec{\imath} = \langle 1,0 \rangle\) and \(\vec{\jmath} = \langle 0,1 \rangle\) form a basis for \(\mathbb R^2\).
This example extends in a natural way to a basis in \(\mathbb R^n\):
\[ \begin{bmatrix} 1 \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \: \begin{bmatrix} 0 \\ 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \: \begin{bmatrix} 0 \\ 0 \\ 1 \\ \vdots \\ 0 \end{bmatrix}, \begin{array}{c} \cdots \\ \\ \cdots \\ \\ \cdots \end{array}, \: \begin{bmatrix} 0 \\ 0 \\ 0 \\ \vdots \\ 1 \end{bmatrix} \]
There are many more examples of bases for \(\mathbb R^n\). The set \[\{\langle 1,1 \rangle, \: \langle 1,-1 \rangle\}\] forms a basis for \(\mathbb R^2\).
Any linearly independent set of \(n\) vectors in \(\mathbb R^n\) forms a basis.
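As a quick check of that last claim for the example above, here’s a small SymPy sketch verifying that \(\{\langle 1,1 \rangle, \langle 1,-1 \rangle\}\) is linearly independent, and hence a basis for \(\mathbb R^2\):

```python
import sympy as sp

# columns are the candidate basis vectors <1,1> and <1,-1>
M = sp.Matrix([[1,  1],
               [1, -1]])

# two vectors in R^2 form a basis exactly when this matrix has rank 2
print(M.rank() == 2)  # True
print(M.rref()[0])    # the 2x2 identity
```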
One key feature of mathematics that truly distinguishes it from the sciences is its use of abstraction and the axiomatic method.
In abstraction, we identify the key features in a concept that we need to make it useful. We do this systematically by writing down axioms that truly define the object. We then prove theorems based on those axioms without any reference to the specific objects themselves. Any theorems we prove must then be satisfied by any objects that satisfy those same axioms. As a result, our theorems become more broadly applicable.
The key facts about \(\mathbb R^n\) that allowed us to get started in the first place were that it’s a set of objects (vectors) that we can add together and multiply by real scalars, with those operations obeying familiar algebraic laws.
Thus, the idea behind abstraction is to assume that we’ve got any collection of objects that allow us to do the same things. We’ve just got to write down what those key properties are.
A vector space over the field of real numbers \(\mathbb R\) is a non-empty set \(V\) equipped with two operations: vector addition, which assigns to any \(\vec{u},\vec{v}\in V\) a sum \(\vec{u}+\vec{v}\in V\), and scalar multiplication, which assigns to any \(\alpha\in\mathbb R\) and \(\vec{u}\in V\) a product \(\alpha\vec{u}\in V\).
In addition, a list of 8 axioms must be satisfied.
The first five axioms must hold for all \(\vec{u},\vec{v},\vec{w}\in V\) and for all \(\alpha,\beta\in\mathbb R\):

1. \(\vec{u} + \vec{v} = \vec{v} + \vec{u}\),
2. \((\vec{u} + \vec{v}) + \vec{w} = \vec{u} + (\vec{v} + \vec{w})\),
3. \(\alpha(\vec{u} + \vec{v}) = \alpha\vec{u} + \alpha\vec{v}\),
4. \((\alpha + \beta)\vec{u} = \alpha\vec{u} + \beta\vec{u}\), and
5. \(\alpha(\beta\vec{u}) = (\alpha\beta)\vec{u}\).
The final three axioms assert the existence and/or behavior of some specific elements of \(V\) or \(\mathbb R\):

6. there is a zero vector \(\vec{0}\in V\) with \(\vec{u} + \vec{0} = \vec{u}\) for every \(\vec{u}\in V\),
7. every \(\vec{u}\in V\) has an additive inverse \(-\vec{u}\in V\) with \(\vec{u} + (-\vec{u}) = \vec{0}\), and
8. the scalar \(1\in\mathbb R\) satisfies \(1\vec{u} = \vec{u}\) for every \(\vec{u}\in V\).
We can now be more precise about the idea of a subspace.
Given a vector space \(V\), a subspace \(W\) of \(V\) is a subset of \(V\) which itself satisfies the vector space axioms.
Simple Example
The linear span of the set containing \[\vec{\imath} = \langle 1,0,0 \rangle \text{ and } \vec{\jmath} = \langle 0,1,0 \rangle\] is a two-dimensional subspace of \(\mathbb R^3\).
If \(V\) is a vector space and \(S\subset V\), then the linear span \(W = \spn{S}\) is a subspace of \(V\).
The key issue is to show that \(W\) is closed under the vector space operations. For example, if \(\vec{u},\vec{v}\in W\), then \[\vec{u} = \lincombo{\alpha}{u}{n} \text{ and } \vec{v} = \lincombo{\beta}{v}{m}\] for some choice of \(\alpha_i\text{s},\beta_j\text{s} \in \mathbb R\) and \(\vec{u}_i\text{s},\vec{v}_j\text{s}\in S\). As a result, \[\begin{aligned} \vec{u}+\vec{v} &= \lincombo{\alpha}{u}{n} \\ &+ \lincombo{\beta}{v}{m} \in W. \end{aligned}\]
Once we know this, the axioms are satisfied since everything already lives in a larger vector space.
Here are five vector spaces of real functions, each of which is a subspace of the next: the polynomials of degree at most \(n\), the polynomials, the infinitely differentiable functions, the continuous functions, and the set of all real-valued functions on \(\mathbb R\).
Thinking of functions in this way is tremendously powerful. This is the basis of Fourier analysis, for example.
When we work at this level of abstraction, theorems that we prove become much more broadly applicable.
For example, we know that the quickest way to get to a wall in the room is to move perpendicularly to it. We can generalize this idea to the concept of orthogonality to help us minimize the distance to a subspace. Doing so will lead to more efficient ways to solve optimization problems.