\( \newcommand\C{\mathbb{C}} \newcommand\R{\mathbb{R}} \newcommand\Q{\mathbb{Q}} \newcommand\Z{\mathbb{Z}} \newcommand\N{\mathbb{N}} \newcommand\ve[1]{\boldsymbol{#1}} \newcommand\norm[1]{\left\Vert #1 \right\Vert} \def\u{\ve{u}} \def\v{\ve{v}} \def\w{\ve{w}} \def\span{\operatorname{span}} \def\Id{\operatorname{Id}} \def\kernel{\operatorname{ker}} \def\image{\operatorname{im}} \def\L{\mathscr{L}} \def\ev{\operatorname{ev}} \)
In brief, a field \(F\) is a set with two binary operations \(+\) and \(\times\) such that \((F, +)\) is an abelian group (with identity \(0\)), \((F \setminus \{0\}, \times)\) is an abelian group (with identity \(1\)), and \(\times\) distributes over \(+\).
We also assume, by convention, that the additive and multiplicative identities are distinct, i.e. \(0 \neq 1\).
Some examples of fields are \(\Q\), \(\R\), \(\C\), and \(\Q[\sqrt{2}] = \{p + q\sqrt{2} : p, q \in \Q\}\).
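For instance, to verify that \(\Q[\sqrt{2}]\) is a field, the only non-obvious axiom is the existence of multiplicative inverses. Rationalizing the denominator gives \[ (p + q\sqrt{2})^{-1} = \frac{p - q\sqrt{2}}{p^2 - 2q^2}, \] where \(p^2 - 2q^2 \neq 0\) whenever \((p, q) \neq (0, 0)\), since \(\sqrt{2}\) is irrational.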
Question: Does \(\{0, 1\}\) form a field?
Response: The set \(\{0, 1\}\) forms a field when \(+\) is defined as the \(\operatorname{XOR}\) operation and \(\times\) is defined as the \(\operatorname{AND}\) operation. This is a finite field, called the Galois field of two elements \(GF(2)\) or \(\Z_2\).
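Concretely, the operation tables of \(GF(2)\) are \[ \begin{array}{c|cc} + & 0 & 1 \\ \hline 0 & 0 & 1 \\ 1 & 1 & 0 \end{array} \qquad\qquad \begin{array}{c|cc} \times & 0 & 1 \\ \hline 0 & 0 & 0 \\ 1 & 0 & 1 \end{array} \] Note that \(1 + 1 = 0\), so every element is its own additive inverse.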
Note that any subfield \(F \subseteq \R\) must contain \(\Q\). To see this, note that \(0, 1 \in F\). Thus, \(F\) contains all positive integers by closure under addition. The existence of additive inverses means that \(F\) also contains the negative integers. Now, every non-zero integer must have a multiplicative inverse, so all rationals of the form \(1/q\) are contained in \(F\). Scaling these by integers \(p \in F\) shows that \(F\) contains all numbers of the form \(p/q\) for integers \(p, q\) where \(q \neq 0\). Thus, \(\Q \subseteq F\).
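For example, tracing through this argument, \[ \frac{3}{5} = (1 + 1 + 1)\cdot(1 + 1 + 1 + 1 + 1)^{-1} \in F. \]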
A vector space \(V\) over a field \(F\) is a set with a binary operation \(+\colon V\times V \to V\) and an operation \(\cdot\,\colon F\times V \to V\), satisfying the following axioms: \((V, +)\) is an abelian group (with identity \(\ve0\)); \(\lambda(\mu\v) = (\lambda\mu)\v\) and \(1\v = \v\) for all \(\lambda, \mu \in F\), \(\v \in V\); and the distributive laws \(\lambda(\u + \v) = \lambda\u + \lambda\v\) and \((\lambda + \mu)\v = \lambda\v + \mu\v\) hold.
A vector subspace \(W\) is a subset of a vector space \(V\) that contains \(\ve0\) and is closed under addition and scalar multiplication.
Then, \(W\) is itself a vector space, called a subspace of \(V\).
Some examples of vector spaces are \(\{\ve{0}\}\) over \(\R\) (the zero vector space) and \(\R^n\) over \(\R\).
Consider the set of matrices with real-valued entries, \(M_{m\times n}(\R)\). With addition defined entry-wise (with the zero matrix \((0)_{ij}\) as the identity) and scalar multiplication defined as \((cA)_{ij} = c(A)_{ij}\), this forms a vector space over \(\R\).
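For instance, in \(M_{2\times 2}(\R)\), \[ \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} + \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} = \begin{pmatrix} 1 & 3 \\ 4 & 4 \end{pmatrix}, \qquad 2\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} = \begin{pmatrix} 2 & 4 \\ 6 & 8 \end{pmatrix}. \]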
The vector space of square matrices \(M_n(\R)\) has several interesting subspaces: the symmetric matrices \(\operatorname{sym}_n = \{A \in M_n(\R) : A^\top = A\}\), the diagonal matrices \(D_n\), and the scalar matrices \(S_n = \{\lambda I : \lambda \in \R\}\).
Note that \(S_n \subset D_n \subset \operatorname{sym}_n\).
Let \(S\) be a set and define \[ \mathscr{F}(S, \R) = \{f\colon S \to \R\}. \] We define our operations pointwise: \((f + g)(x) = f(x) + g(x)\) and \((cf)(x) = c\,f(x)\) for \(x \in S\), \(c \in \R\).
Then \(\mathscr{F}(S, \R)\) is a vector space over \(\R\).
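For example, with \(S = [0, 1]\), \(f(x) = x^2\) and \(g(x) = e^x\), we have \((2f + g)(x) = 2x^2 + e^x\), which is again an element of \(\mathscr{F}([0, 1], \R)\).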
Note that when \(S = \{1, 2, \dots, n\}\), this vector space seems very similar to \(\R^n\): each function \(f\) corresponds to the tuple \((f(1), f(2), \dots, f(n))\).
Question: What structure is \(\mathscr{F}(\N, \R)\) similar to?
Response: When \(S = \N\), this vector space is precisely the space of all sequences of real numbers, since a sequence is exactly a function \(\N \to \R\).
Let \(P_n(\R)\) be the set of all polynomials with real coefficients and degree at most \(n\), where \(n \in \Z_{\ge0}\). We note that there is a bijection between such polynomials and the tuples of their coefficients, which are elements of \(\R^{n+1}\). Thus, we can show that \(P_n(\R)\) is a vector space over \(\R\).
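Explicitly, this correspondence is \[ a_0 + a_1x + \dots + a_nx^n \;\longleftrightarrow\; (a_0, a_1, \dots, a_n) \in \R^{n+1}, \] under which addition and scaling of polynomials become coordinate-wise operations.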
Note that \(P_n(\R) \subset P_{n+1}(\R)\). The union of all such sets, i.e. the set of all real polynomials \(P(\R) = \bigcup_{n\geq 0} P_n(\R)\), is also a vector space.
Question: Does \(P(\R)\) have any other structure?
Response: By defining multiplication in the usual way, we see that \(P(\R)\) forms a ring.
It is easily verified that the set of even polynomials (those with only even powers of \(x\)) forms a subspace of \(P(\R)\). Another subspace is the set of truncated polynomials, which for fixed \(k \in \N\) have the coefficients of \(1, x, \dots, x^k\) all equal to \(0\).
Question: Does the set of polynomials of degree exactly equal to some \(k \in \N\), together with the zero polynomial, form a vector space?
Response: No, since this structure is not closed under addition. The degree of a sum of two polynomials cannot exceed the larger of the two degrees, but it may be strictly smaller. A simple example is \((x + 1) - x = 1\), where two polynomials of degree \(1\) sum to one of degree \(0\).
We may ask what the vector subspaces of \(\R^2\) are, apart from the trivial zero vector space and \(\R^2\) itself. We see that all lines through the origin, such as \(L_m = \{(x, y) \in \R^2: y = mx\}\), are vector subspaces of \(\R^2\). Furthermore, we can show that lines through the origin are the only non-trivial proper subspaces of \(\R^2\).
To prove this, note that any subspace of \(\R^2\) must contain the zero element of \(\R^2\), which is \((0, 0)\). Let our subspace \(L\) contain this point, together with at least one non-zero point \(\v \in \R^2\). By closure under addition and scaling, all points of the form \(\lambda\v\) are contained in \(L\), for all \(\lambda \in \R\). So far, this is simply a line through the origin, and it can be verified that such a line is indeed a subspace. Now, suppose \(L\) also contains a non-zero point off this line, say \(\w \in L\) with \(\w \notin \{\lambda\v: \lambda \in \R\}\). Thus, there is no \(\lambda \in \R\) such that \(\w = \lambda \v\). Again, by closure, \(L\) contains all points of the form \(a\w + b\v\), for all \(a,b \in \R\). Let \(\u \in \R^2\) be arbitrary. Setting \(\Delta = w_xv_y - w_yv_x\), note that \(\Delta \neq 0\): if it were zero, then \(\w = \lambda \v \) for \(\lambda = w_x/v_x = w_y/v_y\). (A small detail here is that if \(v_x = 0\), this forces \(w_x = 0\) and \(v_y, w_y \neq 0\) as well, since \(\v, \w \neq \ve0\); thus \(\lambda\) is well-defined. There is an analogous case with \(v_y = 0 \implies v_x, w_x \neq 0\).) It is easily verified that \[\u = (u_xv_y - u_yv_x)\w/\Delta - (u_xw_y - u_yw_x)\v/\Delta .\] Thus, \(\u \in L\) for arbitrary \(\u \in \R^2\), so \(L\) is the entirety of \(\R^2\). This means that the only subspaces of \(\R^2\) are \(\{\ve0\}\), the lines through the origin, and \(\R^2\).
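As a concrete check of this formula, take \(\v = (1, 2)\), \(\w = (3, 1)\) and \(\u = (4, 3)\). Then \(\Delta = 3\cdot 2 - 1\cdot 1 = 5\), and \[ \u = \frac{4\cdot 2 - 3\cdot 1}{5}\,\w - \frac{4\cdot 1 - 3\cdot 3}{5}\,\v = \w + \v = (4, 3). \]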
The subspaces \(L_m \subset \R^2\) are called lines. In the 2D plane, these clearly represent straight lines of slope \(m\) through the origin. Note that the vertical line \(L = \{(x, y) \in \R^2: x = 0\}\) is also a subspace of \(\R^2\), but it cannot be expressed in our \(L_m\) notation, as its slope is not finite. The notation \(L_\v = \{\lambda\v: \lambda\in\R\}\) for some non-zero \(\v \in \R^2\) is preferable. Later, we will see that for \(\v,\w \in \R^3\), the planes \(M_{\v,\w} = \{\alpha\v + \beta\w: \alpha, \beta \in \R\}\) are all subspaces of \(\R^3\). This naturally leads to the idea that linear combinations of vectors form subspaces. The set of linear combinations of some vectors is called their span.
Let \(V\) be a vector space over the field \(F\). We define the span of a set of vectors \(W \subseteq V\) as the set of all finite linear combinations \[ \span{W} = \{\lambda_1\w_1 + \lambda_2\w_2 + \dots + \lambda_n\w_n: n \in \N,\, \w_i \in W,\, \lambda_i \in F\}, \] with the convention that \(\span\emptyset = \{\ve0\}\).
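For example, in \(\R^3\), \[ \span\{(1, 0, 0), (0, 1, 0)\} = \{\lambda(1, 0, 0) + \mu(0, 1, 0) : \lambda, \mu \in \R\} = \{(x, y, 0) : x, y \in \R\}, \] which is the \(xy\)-plane.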
It is easily verified that for any set of vectors \(W\subseteq V\), \(\span{W}\) forms a subspace of \(V\). Furthermore, \(\span W\) is the smallest subspace of \(V\) containing \(W\), since any subspace containing \(W\) must also contain all linear combinations of its elements.
Some other properties of the span are monotonicity, i.e. \(W_1 \subseteq W_2 \implies \span W_1 \subseteq \span W_2\), and idempotence, i.e. \(\span(\span W) = \span W\).
We say that a vector space \(V\) is generated by \(W \subseteq V\) if \(V = \span{W}\).
A set of vectors \(W \subseteq V\) is said to be linearly dependent if there exist distinct vectors \(\w_1, \w_2, \dots, \w_n \in W\) and scalars \(\lambda_i \in F\), not all zero, such that \[\lambda_1\w_1 + \lambda_2\w_2 + \dots + \lambda_n\w_n = \ve0.\] A set of vectors \(W \subseteq V\) is said to be linearly independent if it is not linearly dependent.
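For example, in \(\R^2\) the set \(\{(1, 0), (0, 1), (1, 1)\}\) is linearly dependent, since \[ 1\cdot(1, 0) + 1\cdot(0, 1) - 1\cdot(1, 1) = \ve0, \] while \(\{(1, 0), (0, 1)\}\) is linearly independent: \(\lambda_1(1, 0) + \lambda_2(0, 1) = (\lambda_1, \lambda_2) = \ve0\) forces \(\lambda_1 = \lambda_2 = 0\).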
We can easily show that if \(S^* = S \setminus\{\v\}\), where \(\v \in \span S^*\) (i.e. \(\v\) depends linearly on the other elements of \(S\)), then \(\span S = \span S^*\). Consequently, if \(S\) is a linearly independent set, then for any proper subset \(T \subsetneq S\), \(\span T \subsetneq \span S\).
Note: The empty set \(\emptyset\) is vacuously linearly independent, as it admits no non-trivial linear combinations at all.
For any linearly independent subset \(S \subset V\) and \(\v \in V\setminus S\), the set \(S \cup \{\v\}\) is linearly independent iff \(\v \notin \span S\).
\((\Leftarrow)\) Let \(\v \notin \span S\). For \(\v_i \in S\),
if the scalars \(\lambda_i, \lambda \in F\) are not all zero and
\[ \lambda_1\v_1 + \dots + \lambda_n\v_n + \lambda\v = \ve0, \]
then \(\lambda\) must be non-zero: otherwise the \(\lambda_i\), not all zero, would give a non-trivial dependence among elements of \(S\), contradicting its linear independence. This would mean that
\[ \v = -\frac{1}{\lambda}(\lambda_1\v_1 + \dots + \lambda_n\v_n) \in \span S, \]
which is a contradiction. Thus, no non-trivial linear combination of elements of \(S\cup\{\v\}\) gives \(\ve 0\), so \(S\cup\{\v\}\) is linearly independent.
\((\Rightarrow)\) Let \(S\cup\{\v\}\) be linearly independent. Suppose that \(\v \in \span S\), i.e. for some \(\v_i \in S\), \(\lambda_i \in F\),
\[ \lambda_1\v_1 + \dots + \lambda_n\v_n = \v. \]
However, moving \(\v\) to the left-hand side, \[ \lambda_1\v_1 + \dots + \lambda_n\v_n - \v = \ve0 \] is a non-trivial linear combination of vectors giving \(\ve0\) (the coefficient of \(\v\) is \(-1 \neq 0\)), which contradicts the linear independence of \(S\cup\{\v\}\). Thus, \(\v\notin\span S\).
A basis \(\beta\) of a vector space \(V\) is a subset such that \(\beta\) is linearly independent and \(\span\beta = V\).
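For example, the standard basis \(\{\ve{e}_1, \dots, \ve{e}_n\}\), where \(\ve{e}_i\) has a \(1\) in the \(i\)-th coordinate and \(0\) elsewhere, is a basis of \(\R^n\), and \(\{1, x, x^2, \dots, x^n\}\) is a basis of \(P_n(\R)\).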
Note that if \(\beta\) is a finite basis of \(V\), then every \(\v \in V\) can be expressed uniquely as \[ \v = \lambda_1\v_1 + \dots + \lambda_n\v_n, \] for \(\v_i \in \beta\), \(\lambda_i \in F\). Existence of such an expression follows from \(\span\beta = V\). For uniqueness, suppose that for \(\v_i \in \beta\), \[ \v = \sum\lambda_i\v_i = \sum\mu_i\v_i \] for scalars \(\lambda_i, \mu_i \in F\). Transferring all terms to one side, \[ (\lambda_1 - \mu_1)\v_1 + \dots + (\lambda_n - \mu_n)\v_n = \ve0. \] The linear independence of \(\beta\) forces \(\lambda_i = \mu_i\).
Given any finite \(S \subseteq V\) such that \(\span S = V\), there exists a basis \(\beta\) of \(V\), where \(\beta \subseteq S\). Note that the case \(S = \emptyset\) is trivial. Now, for non-empty \(S\), choose a non-zero \(\u_1 \in S\) (if every element of \(S\) is \(\ve0\), then \(V = \span S = \{\ve0\}\) and \(\beta = \emptyset\) works), and note that \(\{\u_1\}\) is linearly independent. If possible, choose \(\u_2 \in S\) such that \(\{\u_1, \u_2\}\) is also linearly independent. If this is not possible, then for any \(\v \in S\setminus\{\u_1\}\), the set \(\{\u_1, \v\}\) is linearly dependent, i.e. \(\lambda_1\u_1 + \lambda\v = \ve0\) with \(\lambda_1, \lambda\) not both zero. Since \(\{\u_1\}\) is linearly independent, \(\lambda\neq 0\), so \(\v \in \span\{\u_1\}\), and hence \(\span \{\u_1\} = \span S = V\). We may repeat this process until we have no more choices, and we end up with the basis \(\beta = \{\u_1, \dots, \u_k\}\). This is indeed a basis: at the end, either \(\beta = S\), in which case we are done, or for any \(\v\in S\setminus\beta\), the set \(\beta\cup\{\v\}\) is linearly dependent, so \(\v\in\span\beta\) by the previous result, and thus \(\span\beta = \span S = V\).
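For example, \(S = \{(1, 0), (0, 1), (1, 1)\}\) spans \(\R^2\); running this process with \(\u_1 = (1, 0)\) and \(\u_2 = (0, 1)\) stops there, since \((1, 1) = (1, 0) + (0, 1)\) already lies in \(\span\{\u_1, \u_2\}\), leaving the basis \(\beta = \{(1, 0), (0, 1)\}\).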
We say that a vector space \(V\) is infinite dimensional if it has no finite basis.
Let \(V\) be a vector space that is spanned by a finite set \(S\) of size \(n\). Let \(L\) be a linearly independent set of size \(m\). Then, we must have \(m \leq n\). Furthermore, there exists \(T \subseteq S\) of size \(n - m\) such that \(T \cup L\) spans \(V\). This result is called the Replacement Theorem.
To prove this, suppose \(S = \{\v_1, \dots, \v_n\}\) and \(L = \{\u_1, \dots, \u_m\}\). Note that \(\u_m \neq \ve0\), since \(L\) is linearly independent. We write \[ \u_m = a_1\v_1 + \dots + a_n\v_n \] for \(a_i \in F\). Note that the \(a_i\) are not all zero. Without loss of generality, let \(a_n \neq 0\). We thus write \[ \v_n = \frac{1}{a_n}(\u_m - a_1\v_1 - \dots - a_{n-1}\v_{n-1}). \] Setting \(S_1 = \{\v_1, \dots, \v_{n-1}, \u_m\}\), we see that \(\v_n \in \span S_1\). Thus, \(S_1\) spans \(V\).
We iterate the same process to obtain the sets \(S_i = \{\v_1, \dots, \v_{n - i}, \u_{m - i + 1}, \dots, \u_{m}\}\), where in each case \(\span S_i = V\). (At each step, when writing \(\u_{m-i}\) as a linear combination of the elements of \(S_i\), the coefficients of the remaining \(\v_j\) cannot all be zero: otherwise \(\u_{m-i}\) would be a linear combination of \(\u_{m-i+1}, \dots, \u_m\), contradicting the linear independence of \(L\).) At some point, this process stops. If \(m > n\), then \(S_n = \{\u_{m-n+1}, \dots, \u_m\}\subset L\). On the other hand, \(S_n\) spans \(V\), so the leftover elements of \(L\) are non-trivial linear combinations of elements in \(S_n\). This contradicts the linear independence of \(L\). Thus, we must have \(m \leq n\), and we end up with \(S_m = \{\v_1, \dots, \v_{n-m}, \u_1, \dots, \u_m\}\) \(= \{\v_1, \dots, \v_{n-m}\} \cup L\), which spans \(V\).
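As an example of a single replacement step, let \(S = \{\ve{e}_1, \ve{e}_2, \ve{e}_3\}\) span \(\R^3\) and \(L = \{(1, 1, 1)\}\). Writing \((1, 1, 1) = \ve{e}_1 + \ve{e}_2 + \ve{e}_3\), the coefficient of \(\ve{e}_3\) is non-zero, so \(S_1 = \{\ve{e}_1, \ve{e}_2, (1, 1, 1)\}\) also spans \(\R^3\).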
We may also rephrase our proposition as follows: \(L\) can always be extended to a basis and \(S\) can always be reduced to a basis.
The Replacement Theorem immediately implies that any two finite bases \(\beta\), \(\gamma\) of \(V\) have the same size. This number is called the dimension of the vector space over the field \(F\), denoted \(\dim_F V\).
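For example, \(\dim_\R \R^n = n\), \(\dim_\R P_n(\R) = n + 1\), and \(\dim_\R M_{m\times n}(\R) = mn\).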
Let \(V\) be a vector space with dimension \(n\). Then any spanning set of \(V\) of size \(n\) is a basis, any linearly independent set of size \(n\) is a basis, and any subspace \(W \subseteq V\) satisfies \(\dim W \leq n\), with equality iff \(W = V\).
Question: Consider the vector space of functions \(\mathscr{F} = \mathscr{F}([a, b], \R)\). Show that this is infinite dimensional.
Response: Suppose, to the contrary, that \(\mathscr{F}\) is \(n\)-dimensional. We now construct the functions \(g_\alpha \in \mathscr{F}\), for \(\alpha \in [a, b]\): \[ g_\alpha\colon[a, b] \to \R,\quad g_\alpha(x) = \begin{cases} 1, & \text{ if } x = \alpha \\0, & \text{ if } x \neq \alpha \end{cases}.\] Choose distinct \(\alpha_1, \dots, \alpha_n, \alpha_{n+1} \in [a, b]\). We claim that the set of functions \(\{g_{\alpha_1}, \dots, g_{\alpha_n}, g_{\alpha_{n + 1}}\}\) is linearly independent. To show this, suppose that for scalars \(\lambda_i\), \[ \lambda_1 g_{\alpha_1} + \dots + \lambda_{n+1} g_{\alpha_{n + 1}} = \ve{0}. \] Evaluating at each \(\alpha_j\) gives \(\lambda_j = 0\), since \(g_{\alpha_i}(\alpha_j) = 0\) for \(i \neq j\). Thus, we have found a linearly independent subset of \(\mathscr{F}\) of size \(n + 1\), which contradicts the Replacement Theorem.
Let \(V\) be a vector space, and let \(W_1, W_2\) be subspaces. In general, the sum \(W_1 + W_2 = \{\w_1 + \w_2 : \w_1 \in W_1, \w_2 \in W_2\}\) is itself a subspace of \(V\). If additionally \(W_1 \cap W_2 = \{\ve0\}\), we call \(W_1 + W_2\) the direct sum of the two subspaces and denote it \(W_1 \oplus W_2\).
We note that \[ \dim(W_1 \oplus W_2) = \dim W_1 + \dim W_2, \] since the union of a basis of \(W_1\) and a basis of \(W_2\) forms a basis of \(W_1 \oplus W_2\).
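For example, in \(\R^3\), the \(xy\)-plane \(W_1 = \{(x, y, 0)\}\) and the \(z\)-axis \(W_2 = \{(0, 0, z)\}\) intersect only in \(\ve0\), and \[ \dim(W_1 \oplus W_2) = 2 + 1 = 3 = \dim \R^3, \] so \(\R^3 = W_1 \oplus W_2\).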
Any \(\v \in W_1 \oplus W_2\) is uniquely expressible as the sum \(\w_1 + \w_2\), where \(\w_1 \in W_1\) and \(\w_2 \in W_2\). Conversely, if every \(\v \in W_1 + W_2\) is uniquely expressible in this form, then \(W_1 \cap W_2 = \{\ve0\}\).
Let \(V\) be a vector space and let \(W\) be a subspace. We define the equivalence relation \(\sim_W\) such that \[ \v_1 \sim_W \v_2 \quad\text{ iff }\quad \v_1 - \v_2 \in W. \]
The set of equivalence classes \(V/W\) under the relation \(\sim_W\) forms a vector space. Note that the equivalence class \([\v] = \v + W\), i.e. the translation of all elements in \(W\) by \(\v\). Thus, we define \[ [\u] + [\v] = [\u + \v], \quad\quad \lambda[\v] = [\lambda\v]. \]
Note that we must verify that these operations are well defined, since there are many choices of representative \(\v\) for a fixed equivalence class \([\v]\). Suppose \([\v] = [\v']\) and \([\u] = [\u']\). This means that \(\v - \v' \in W\) and \(\u - \u' \in W\). Thus, \(\u + \v \sim_W \u' + \v'\) since \((\u + \v) - (\u' + \v') = (\u - \u') + (\v - \v') \in W\). This means that \([\u + \v] = [\u' + \v']\). Similarly, \([\lambda\v] = [\lambda\v']\) since \(\lambda\v - \lambda\v' = \lambda(\v - \v') \in W\).
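For example, take \(V = \R^2\) and \(W = \{(x, 0) : x \in \R\}\). The equivalence classes \([(x, y)] = (x, y) + W\) are the horizontal lines of the plane, and \([(x_1, y_1)] = [(x_2, y_2)]\) iff \(y_1 = y_2\). Each class is determined by its height \(y\), so \(V/W\) behaves exactly like \(\R\).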