Vector Spaces - \(\mathbb R^n\) and More
Mon, Jan 27, 2025
\[ \newcommand{\vect}[1]{\mathbf{#1}} \newcommand{\colvector}[1]{\begin{bmatrix}#1\end{bmatrix}} \newcommand{\vectorlist}[2]{\vec{#1}_{1},\,\vec{#1}_{2},\,\vec{#1}_{3},\,\ldots,\,\vec{#1}_{#2}} \newcommand{\spn}[1]{\left\langle#1\right\rangle} \newcommand{\setparts}[2]{\left\lbrace#1\,\middle|\,#2\right\rbrace} \newcommand{\lincombo}[3]{#1_{1}\vec{#2}_{1}+#1_{2}\vec{#2}_{2}+\cdots +#1_{#3}\vec{#2}_{#3}} \]
We’ve alluded to the idea of vectors a few times now. It’s time to clarify exactly what these things are.
We’ll work mostly in \(n\)-dimensional Euclidean space - a generalization of the three-dimensional space in which we live. We’ll finish, though, with an abstract description of the vector space concept in general.
The term Euclidean space refers to a generalization of the two- and three-dimensional geometry described by Euclid. The generalization is constructed using a coordinate system.
We often denote this space by \(\mathbb R^n\) and represent it as the collection of lists of real numbers of length \(n\), which we use to determine points in \(n\)-dimensional space.
Typically, these points are delimited by parentheses so that a point might be written as
\[(x_1,x_2,x_3,\ldots,x_n).\]
We now extend that to an algebraic structure. That is, we wish to be able to make sense out of operations like addition, subtraction, and multiplication.
In this context, we’ll often delimit the components of a vector with angled brackets and distinguish it in bold (\(\vect{u}\)) or with a hat (\(\vec{v}\), which is my preference). Thus, I might write:
\[\vec{v} = \langle v_1,v_2,v_3,\ldots,v_n\rangle.\]
Another common way to represent an \(n\)-dimensional vector is as an \(n\times1\) matrix. Thus, we might write
\[\vec{v} = \begin{bmatrix} v_1 \\ v_2 \\ v_3 \\ \vdots \\ v_n \end{bmatrix}.\]
There’s clearly a simple correspondence between the two representations. The column vector notation is particularly convenient in the context of linear algebra when we perform operations on them with matrices.
The angled bracket notation conserves space, though, and we will use both notations as we see fit.
However we choose to denote vectors, we’ll perform many algebraic operations componentwise, by which we mean we apply the operation to the individual terms. In the case of addition, for example, we would write:
\[\vec{u} + \vec{v} = \begin{bmatrix} u_1 \\ u_2 \\ u_3 \\ \vdots \\ u_n \end{bmatrix} + \begin{bmatrix} v_1 \\ v_2 \\ v_3 \\ \vdots \\ v_n \end{bmatrix} = \begin{bmatrix} u_1 + v_1 \\ u_2 + v_2 \\ u_3 + v_3 \\ \vdots \\ u_n + v_n \end{bmatrix}. \]
In the case of scalar multiplication, we would write: \[\alpha\vec{u} = \alpha \langle u_1,u_2,u_3,\ldots,u_n \rangle = \langle \alpha u_1, \alpha u_2, \alpha u_3,\ldots, \alpha u_n \rangle.\]
To be clear, these are the definitions of these operations.
Oh, hey look - we’ve used both notations already!
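Since these operations are defined componentwise, NumPy arrays (which also add and scale componentwise) give a quick way to experiment with them. Here’s a minimal sketch with made-up vectors:

```python
import numpy as np

# two made-up vectors in R^4 and a scalar
u = np.array([2, 4, -3, 1])
v = np.array([6, 3, 0, -2])
alpha = 3

print(u + v)      # componentwise sum: [ 8  7 -3 -1]
print(alpha * u)  # componentwise scalar multiple: [ 6 12 -9  3]
```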
The algebra of vectors in \(\mathbb R^n\) that we’ve defined follows many of the properties of \(\mathbb R\). For example:
Basic algebraic laws to know for the exam!
For any \(\vec{u},\vec{v},\vec{w}\in\mathbb R^n\) and \(\alpha \in\mathbb R\), we have

- \(\vec{u} + \vec{v} = \vec{v} + \vec{u}\) (commutativity of vector addition),
- \((\vec{u} + \vec{v}) + \vec{w} = \vec{u} + (\vec{v} + \vec{w})\) (associativity of vector addition), and
- \(\alpha(\vec{u} + \vec{v}) = \alpha\vec{u} + \alpha\vec{v}\) (distributivity of scalar multiplication over vector addition).
These algebraic properties can all be proved by breaking them down to the component level, applying the corresponding rule at that level, and then reconstituting the vector formulation. This type of proof is called a componentwise proof.
For example, here’s a componentwise proof of the distributive law for scalar multiplication:
\[\begin{aligned} \alpha(\vec{u} + \vec{v}) &= \alpha\left(\langle u_1,u_2,\ldots,u_n \rangle + \langle v_1, v_2,\ldots,v_n \rangle\right) \\ &= \alpha\langle u_1+v_1,u_2+v_2,\ldots,u_n+v_n \rangle \\ &= \langle \alpha(u_1+v_1),\alpha(u_2+v_2),\ldots,\alpha(u_n+v_n) \rangle \\ &= \langle \alpha u_1+\alpha v_1,\alpha u_2+\alpha v_2,\ldots,\alpha u_n+\alpha v_n \rangle \\ &= \langle \alpha u_1, \alpha u_2,\ldots, \alpha u_n \rangle + \langle \alpha v_1, \alpha v_2,\ldots, \alpha v_n \rangle \\ &= \alpha \vec{u} + \alpha \vec{v}. \end{aligned}\]
Note that each of the six equals signs in the computation above can be justified by some specific definition or property of the real number system. Those justifications are:

1. the definitions of \(\vec{u}\) and \(\vec{v}\) in terms of their components,
2. the definition of vector addition,
3. the definition of scalar multiplication,
4. the distributive law for real numbers,
5. the definition of vector addition (read from right to left), and
6. the definition of scalar multiplication (read from right to left).
Note that the law we want to prove for vectors is applied for real numbers in step 4. For this reason, we might say that \(\mathbb R^n\) “inherits” many of the properties of \(\mathbb R\).
Scientific problems are often simplified by breaking them into constituent parts. The study of motion in space, for example, can be broken into motion in the \(x\), \(y\), and \(z\) directions.
In data science, the results of an algorithmic prediction might be the combination of a number of factors. Predicted points scored by a team in an upcoming game, for example, might be the combination of average points scored by that team in the past together with defensive statistics for the opponent.
If our data is encoded in a vector space, this leads us to ask: how can we best combine the components of that data to represent the aspects we need most efficiently?
Given \(m\) vectors \[\vec{v}_1, \vec{v}_2, \vec{v}_3, \ldots, \vec{v}_m\] from \(\mathbb R^n\), and \(m\) scalars \(\alpha_1,\alpha_2,\alpha_3,\ldots,\alpha_m\), the corresponding linear combination is \[\sum_{i=1}^m \alpha_i\vec{v}_i = \alpha_1\vec{v}_1 + \alpha_2\vec{v}_2 + \alpha_3\vec{v}_3 + \cdots + \alpha_m\vec{v}_m.\]
If \(m=n=4\), \(\alpha_1=1,\alpha_2=-4,\alpha_3=2,\alpha_4=-1\), and
\[\vec{v_1} = \colvector{2\\4\\-3\\1}, \: \vec{v_2} = \colvector{6\\3\\0\\-2}, \: \vec{v_3} = \colvector{-5\\2\\1\\1}, \: \vec{v_4} = \colvector{3\\2\\-5\\7},\]
then our linear combination would be
\[ \small \colvector{2\\4\\-3\\1} -4\colvector{6\\3\\0\\-2}+ 2\colvector{-5\\2\\1\\1} -\colvector{3\\2\\-5\\7} =\colvector{2\\4\\-3\\1}+ \colvector{-24\\-12\\0\\8}+ \colvector{-10\\4\\2\\2}+ \colvector{-3\\-2\\5\\-7} =\colvector{-35\\-6\\4\\4} \]
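As a sanity check, we can reproduce this computation numerically. Here’s a small NumPy sketch:

```python
import numpy as np

v1 = np.array([2, 4, -3, 1])
v2 = np.array([6, 3, 0, -2])
v3 = np.array([-5, 2, 1, 1])
v4 = np.array([3, 2, -5, 7])

# the linear combination with scalars 1, -4, 2, -1
print(1*v1 - 4*v2 + 2*v3 - 1*v4)  # [-35  -6   4   4]
```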
Any linear system, like \[ \colvector{-7x_1 -6 x_2 - 12x_3\\ 5x_1 + 5x_2 + 7x_3\\ x_1 +4x_3} = \colvector{-33\\24\\5} \] can be written as a linear combination. For example: \[ x_1\colvector{-7\\5\\1}+ x_2\colvector{-6\\5\\0}+ x_3\colvector{-12\\7\\4} = \colvector{-33\\24\\5}\text{.} \]
The previous observation relating linear systems and linear combinations has a useful interpretation:
The vector \(\vec{x}\) is a solution of the system \[A\vec{x} = \vec{b}\] if and only if the vector \(\vec{b}\) can be expressed as a linear combination of the columns of \(A\) using the components of \(\vec{x}\) as the scalar multiples.
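Here’s a small NumPy sketch of this correspondence, using the system from the previous example: `np.linalg.solve` finds \(\vec{x}\), and we check that the same scalars combine the columns of \(A\) to give \(\vec{b}\).

```python
import numpy as np

# coefficient matrix and right-hand side from the example above
A = np.array([[-7, -6, -12],
              [ 5,  5,   7],
              [ 1,  0,   4]])
b = np.array([-33, 24, 5])

x = np.linalg.solve(A, b)                      # solves A x = b; here x is about [-3, 5, 2]
combo = sum(x[i] * A[:, i] for i in range(3))  # the same scalars applied to the columns of A

print(np.allclose(A @ x, combo))  # True: A x is a linear combination of the columns
print(np.allclose(combo, b))      # True: that combination is exactly b
```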
Definition: Given a set of vectors \(S\) in \(\mathbb R^n\), their span \(\spn{S}\) is the set of all linear combinations of those vectors.
Written symbolically, if \(S=\{\vectorlist{u}{p}\}\), \[\begin{align*} \spn{S}&=\setparts{\lincombo{\alpha}{u}{p}}{\alpha_i\in\mathbb R,\,1\leq i\leq p}\\ &=\setparts{\sum_{i=1}^{p}\alpha_i\vec{u}_i}{\alpha_i\in\mathbb R,\,1\leq i\leq p}\text{.} \end{align*}\]
Note that the span of a set of vectors in \(\mathbb R^n\) is an example of a subspace of \(\mathbb R^n\).
We’ll get to that at the end of these notes as we discuss more general vector spaces.
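To make the span concrete, here’s a minimal SymPy sketch, with made-up vectors \(\vec{u}_1,\vec{u}_2\) and a target \(\vec{w}\), that checks whether \(\vec{w}\) lies in \(\spn{\{\vec{u}_1,\vec{u}_2\}}\). It uses the fact that \(\vec{w}\) is a linear combination of \(\vec{u}_1\) and \(\vec{u}_2\) exactly when adjoining \(\vec{w}\) to the matrix with columns \(\vec{u}_1\) and \(\vec{u}_2\) doesn’t increase the rank.

```python
import sympy as sp

# made-up vectors: is w in the span of {u1, u2}?
u1 = sp.Matrix([1, 1, 0])
u2 = sp.Matrix([0, 1, 1])
w  = sp.Matrix([1, 2, 3])

M   = sp.Matrix.hstack(u1, u2)  # columns are the spanning vectors
aug = sp.Matrix.hstack(M, w)    # adjoin the target vector

print(M.rank() == aug.rank())   # False here, so w is not in the span of {u1, u2}
```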
We say that a set of vectors \(S=\{\vectorlist{u}{n}\}\), is linearly dependent if the equation \[ \lincombo{\alpha}{u}{n}=\vec{0} \] has a non-trivial solution. (That is, not all the \(\alpha_i\)s are zero.)
Otherwise, the set \(S\) is linearly independent.
The sets \(\{\langle 1,0 \rangle, \langle 0,1 \rangle\}\) and \(\{\langle 1,1 \rangle, \langle 1,-1 \rangle\}\), for example, are linearly independent in \(\mathbb R^2\).
The set \(\{\langle 1,0 \rangle, \langle 0,1 \rangle, \langle 1,1 \rangle\}\) is linearly dependent.
Given a collection of vectors in \(\mathbb R^n\), there’s a simple algorithm to determine their linear dependence or independence.
Simply form the matrix whose columns are the given vectors and place it in reduced row echelon form. If every column of the result contains a leading 1 (that is, if every column is a pivot column), then the vectors are linearly independent. Otherwise, the vectors are linearly dependent.
Consider three column vectors that form the matrix \(A\) shown on the left below. The reduced row echelon form \(R\) of \(A\) is shown on the right, as we can check using SymPy. Given the form of \(R\), the columns of \(A\) must be linearly independent.
\[A = \left[\begin{matrix}1 & 2 & 3\\4 & 5 & 6\\7 & 8 & 9\\1 & 0 & 1\end{matrix}\right]\]
\[R = \left[\begin{matrix}1 & 0 & 0\\0 & 1 & 0\\0 & 0 & 1\\0 & 0 & 0\end{matrix}\right]\]
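Here’s the SymPy check mentioned above, a minimal sketch using `Matrix.rref`:

```python
import sympy as sp

A = sp.Matrix([[1, 2, 3],
               [4, 5, 6],
               [7, 8, 9],
               [1, 0, 1]])

R, pivots = A.rref()
print(R)       # the 3x3 identity stacked on a row of zeros
print(pivots)  # (0, 1, 2): every column is a pivot column, so the columns are independent
```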
Now, consider the three vectors that form the columns of the matrix \(B\) shown on the left below. (Note that the middle entry in the final row is different.) The reduced row echelon form \(E\) of \(B\) is shown on the right, as we can again check. Given the form of \(E\), the columns of \(B\) must be linearly dependent.
\[B = \left[\begin{matrix}1 & 2 & 3\\4 & 5 & 6\\7 & 8 & 9\\1 & 1 & 1\end{matrix}\right]\]
\[E = \left[\begin{matrix}1 & 0 & -1\\0 & 1 & 2\\0 & 0 & 0\\0 & 0 & 0\end{matrix}\right]\]
I guess this means that we could express the zero vector as a linear combination of those columns, as you’re asked to do in the last problem on the exam 1 review sheet…
Continuing with
\[B = \left[\begin{matrix}1 & 2 & 3\\4 & 5 & 6\\7 & 8 & 9\\1 & 1 & 1\end{matrix}\right]\]
\[E = \left[\begin{matrix}1 & 0 & -1\\0 & 1 & 2\\0 & 0 & 0\\0 & 0 & 0\end{matrix}\right]\]
We can read the solutions of \(B\vec{x}=\vec{0}\) from \(E\). We find that \(x_3\) is free and that \[ x_1 = x_3 \text{ and } x_2 = -2x_3. \] For example, \(\vec{x} = \langle 1,-2,1 \rangle\) is a solution. It’s easy to check that this linear combination of the columns yields \(\vec{0}\).
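A quick SymPy check of this dependence relation (again just a sketch):

```python
import sympy as sp

B = sp.Matrix([[1, 2, 3],
               [4, 5, 6],
               [7, 8, 9],
               [1, 1, 1]])

print(B.rref()[0])                # matches E above
print(B * sp.Matrix([1, -2, 1]))  # the zero vector, confirming the dependence
print(B.nullspace())              # a basis for the solutions of B x = 0
```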
Recall the definition of the span \(\spn{S}\) of a set of vectors \(S\) as the set of all linear combinations of those vectors.
We say that \(S\) spans \(\mathbb R^n\) if \[ \spn{S} = \mathbb R^n.\] Such a set is called a spanning set.
I guess the concept of a spanning set is somewhat complementary to linear independence. The more vectors you have, the more likely they are to span the space but the less likely they are to be linearly independent.
The sweet spot is a linearly independent spanning set. This type of set is called a basis for \(\mathbb R^n\).
The vectors \(\vec{\imath} = \langle 1,0 \rangle\) and \(\vec{\jmath} = \langle 0,1 \rangle\) form a basis for \(\mathbb R^2\).
This example extends in a natural way to a basis in \(\mathbb R^n\):
\[ \begin{bmatrix} 1 \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \: \begin{bmatrix} 0 \\ 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \: \begin{bmatrix} 0 \\ 0 \\ 1 \\ \vdots \\ 0 \end{bmatrix}, \begin{array}{c} \cdots \\ \\ \cdots \\ \\ \cdots \end{array}, \: \begin{bmatrix} 0 \\ 0 \\ 0 \\ \vdots \\ 1 \end{bmatrix} \]
There are many more examples of bases for \(\mathbb R^n\). The set \[\{\langle 1,1 \rangle, \: \langle 1,-1 \rangle\}\] forms a basis for \(\mathbb R^2\).
Any linearly independent set of \(n\) vectors in \(\mathbb R^n\) forms a basis.
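As a quick check of that last claim for the example above, here’s a small SymPy sketch verifying that \(\{\langle 1,1 \rangle, \langle 1,-1 \rangle\}\) is linearly independent, and hence a basis for \(\mathbb R^2\):

```python
import sympy as sp

# columns are the candidate basis vectors <1,1> and <1,-1>
M = sp.Matrix([[1,  1],
               [1, -1]])

# two vectors in R^2 form a basis exactly when this matrix has rank 2
print(M.rank() == 2)  # True
print(M.rref()[0])    # the 2x2 identity
```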
One key feature of mathematics that truly distinguishes it from the sciences is its use of abstraction and the axiomatic method.
In abstraction, we identify the key features in a concept that we need to make it useful. We do this systematically by writing down axioms that truly define the object. We then prove theorems based on those axioms without any reference to the specific objects themselves. Any theorems we prove must then be satisfied by any objects that satisfy those same axioms. As a result, our theorems become more broadly applicable.
The key facts about \(\mathbb R^n\) that allowed us to get started in the first place were that it’s a set of objects (vectors) that we can add together and multiply by real scalars, with those operations obeying familiar algebraic laws.
Thus, the idea behind abstraction is to assume that we’ve got any collection of objects that allow us to do the same things. We’ve just got to write down what those key properties are.
A vector space over the field of real numbers \(\mathbb R\) is a non-empty set \(V\) equipped with two operations: vector addition, which assigns to any \(\vec{u},\vec{v}\in V\) a sum \(\vec{u}+\vec{v}\in V\), and scalar multiplication, which assigns to any \(\alpha\in\mathbb R\) and \(\vec{u}\in V\) a product \(\alpha\vec{u}\in V\).
In addition, a list of 8 axioms must be satisfied.
The first five axioms must hold for all \(\vec{u},\vec{v},\vec{w}\in V\) and for all \(\alpha,\beta\in\mathbb R\):

1. \(\vec{u} + \vec{v} = \vec{v} + \vec{u}\),
2. \((\vec{u} + \vec{v}) + \vec{w} = \vec{u} + (\vec{v} + \vec{w})\),
3. \(\alpha(\vec{u} + \vec{v}) = \alpha\vec{u} + \alpha\vec{v}\),
4. \((\alpha + \beta)\vec{u} = \alpha\vec{u} + \beta\vec{u}\), and
5. \(\alpha(\beta\vec{u}) = (\alpha\beta)\vec{u}\).
The final three axioms assert the existence and/or behavior of some specific elements of \(V\) or \(\mathbb R\):

6. there is a zero vector \(\vec{0}\in V\) with \(\vec{u} + \vec{0} = \vec{u}\) for every \(\vec{u}\in V\),
7. every \(\vec{u}\in V\) has an additive inverse \(-\vec{u}\in V\) with \(\vec{u} + (-\vec{u}) = \vec{0}\), and
8. the scalar \(1\in\mathbb R\) satisfies \(1\vec{u} = \vec{u}\) for every \(\vec{u}\in V\).
We can now be more precise about the idea of a subspace.
Given a vector space \(V\), a subspace \(W\) of \(V\) is a subset of \(V\) which itself satisfies the vector space axioms.
Simple Example
The linear span of the set containing \[\vec{\imath} = \langle 1,0,0 \rangle \text{ and } \vec{\jmath} = \langle 0,1,0 \rangle\] is a two-dimensional subspace of \(\mathbb R^3\).
If \(V\) is a vector space and \(S\subset V\), then the linear span \(W = \spn{S}\) is a subspace of \(V\).
The key issue is to show that \(W\) is closed under the vector space operations. For example, if \(\vec{u},\vec{v}\in W\), then \[\vec{u} = \lincombo{\alpha}{u}{n} \text{ and } \vec{v} = \lincombo{\beta}{v}{m}\] for some choice of \(\alpha_i\text{s},\beta_j\text{s} \in \mathbb R\) and \(\vec{u}_i\text{s},\vec{v}_j\text{s}\in S\). As a result, \[\begin{aligned} \vec{u}+\vec{v} &= \lincombo{\alpha}{u}{n} \\ &+ \lincombo{\beta}{v}{m} \in W. \end{aligned}\]
Once we know this, the axioms are satisfied since everything already lives in a larger vector space.
Here are five vector spaces of real functions, each of which is a subspace of the next: the polynomials of degree at most \(n\), the polynomials, the infinitely differentiable functions, the continuous functions, and the set of all real-valued functions on \(\mathbb R\).
Thinking of functions in this way is tremendously powerful. This is the basis of Fourier analysis, for example.
When we work at this level of abstraction, theorems that we prove become much more broadly applicable.
For example, we know that the quickest way to get to a wall in the room is to move perpendicularly to it. We can generalize this idea to the concept of orthogonality to help us minimize the distance to a subspace. Doing so will lead to more efficient ways to solve optimization problems.