Vector Spaces
A vector space is a set $V$ together with a field $\mathbb{F}$ (for our purposes either the real numbers $\mathbb{R}$ or the complex numbers $\mathbb{C}$) and two operations:
- Scalar multiplication: for $\alpha\in \mathbb{F}$ and $v\in V$, the scalar multiple of $v$ by $\alpha$, written $\alpha v$, yields an element of $V$.
- Vector addition: for $v_1, v_2\in V$, the sum of $v_1$ and $v_2$, written $v_1 + v_2$, yields an element of $V$.
Note that there are two different additions in play: addition of elements of the field $\mathbb{F}$, which we call scalar addition, and addition of elements of the vector space $V$, which we call vector addition. These operations have the following properties:
- Properties involving only addition (sets and operations obeying these axioms are referred to as Abelian groups):
- Associativity: for $v_1, v_2, v_3\in V$: \begin{eqnarray*} v_1 + \left(v_2 + v_3\right) = \left(v_1 + v_2\right) + v_3 \end{eqnarray*}
- Identity: there is an element $0\in V$ such that for any $v\in V$: \begin{eqnarray*} v + 0 = v \end{eqnarray*}
- Inverse: for any $v\in V$, there is an element $-v\in V$ such that: \begin{eqnarray*} v + \left(-v\right) = 0 \end{eqnarray*}
- Commutativity: for $v_1, v_2\in V$: \begin{eqnarray*} v_1 + v_2 = v_2 + v_1 \end{eqnarray*}
- Properties involving only scalar multiplication:
- Associativity: for any $\alpha, \beta\in \mathbb{F}$ and any $v\in V$: \begin{eqnarray*} \alpha \left(\beta v\right) = \left(\alpha \beta \right)v \end{eqnarray*}
- Identity: for any $v\in V$: \begin{eqnarray*} 1 v = v \end{eqnarray*} where $1$ is the multiplicative identity of $\mathbb{F}$.
- Properties involving addition and scalar multiplication:
- Distributivity of scalar multiplication over vector addition: for any $\alpha\in\mathbb{F}$ and $v_1,v_2\in V$: \begin{eqnarray*} \alpha\left(v_1 + v_2\right) = \alpha v_1 + \alpha v_2 \end{eqnarray*}
- Distributivity of scalar addition over scalar multiplication: for any $\alpha, \beta\in\mathbb{F}$ and $v\in V$: \begin{eqnarray*} \left(\alpha + \beta\right)v = \alpha v + \beta v \end{eqnarray*}
It can be shown that any set satisfying these axioms for a field $\mathbb{F}$ is equivalent to a set of sequences of elements from that field $\mathbb{F}$, with the operations defined componentwise: \begin{eqnarray*} \alpha \left(\beta_1, \beta_2, \ldots\right) & = & \left(\alpha \beta_1, \alpha \beta_2, \ldots\right)\\ \left(\alpha_1, \alpha_2,\ldots\right) + \left(\beta_1, \beta_2, \ldots\right) & = & \left(\alpha_1 + \beta_1, \alpha_2 + \beta_2, \ldots\right) \end{eqnarray*} The only quantities which differentiate vector spaces from each other are the field $\mathbb{F}$ and the number of elements in the sequences. The number of elements is called the dimension of the vector space.
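For concreteness, a small worked example: take $V = \mathbb{R}^2$ over $\mathbb{F} = \mathbb{R}$ with the componentwise operations above. Then, for instance: \begin{eqnarray*} 2\left(1, 3\right) & = & \left(2, 6\right)\\ \left(1, 3\right) + \left(4, -1\right) & = & \left(5, 2\right) \end{eqnarray*} and all of the axioms above reduce to familiar properties of real-number arithmetic.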
Bases
A set of vectors $v_1, v_2, \ldots$ from a vector space $V$ over the field $\mathbb{F}$ is linearly independent if for any $\alpha_1, \alpha_2, \ldots \in \mathbb{F}$ such that: \begin{eqnarray*} \sum_i \alpha_i v_i = 0 \end{eqnarray*} we have that $\alpha_1 = \alpha_2 = \ldots = 0$. The vectors $v_1, v_2, \ldots$ span $V$ if every vector $v\in V$ can be written as: \begin{eqnarray} v = \sum_i \alpha_i v_i\label{span} \end{eqnarray} for some $\alpha_1, \alpha_2, \ldots\in\mathbb{F}$. A set of vectors which is linearly independent and spans $V$ is called a basis. The number of elements in a basis is the same for every basis and equals the dimension of the vector space. Note that, for a basis, the representation in (\ref{span}) is unique. If there were two representations of the same vector, $\alpha_1, \alpha_2, \ldots$ and $\alpha_1', \alpha_2', \ldots$, then: \begin{eqnarray*} \sum_i \alpha_i v_i = \sum_i \alpha_i' v_i \end{eqnarray*} so that: \begin{eqnarray*} \sum_i \left(\alpha_i - \alpha'_i\right) v_i = 0 \end{eqnarray*} which, by linear independence, implies that $\alpha_i - \alpha_i' = 0$, i.e. $\alpha_i = \alpha_i'$, for all $i$. The factors $\alpha_1, \alpha_2, \ldots$ are called the components of the vector and are typically written as the sequence $\left(\alpha_1, \alpha_2, \ldots\right)$.
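For example, in $\mathbb{R}^2$ the vectors $\left(1,0\right)$ and $\left(0,1\right)$ are linearly independent and span the space, so they form a basis; the three vectors $\left(1,0\right)$, $\left(0,1\right)$, $\left(1,1\right)$ are not linearly independent, since: \begin{eqnarray*} \left(1,0\right) + \left(0,1\right) - \left(1,1\right) = 0 \end{eqnarray*} is a vanishing combination with nonzero coefficients.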
It can be shown that every vector space has a basis. For example, for the vector space consisting of sequences of $n$ elements from $\mathbb{F}$, one basis (the standard basis) is: \begin{eqnarray*} e_1 = \left(1,0,0,\ldots,0\right)\\ e_2 = \left(0,1,0,\ldots,0\right)\\ e_3 = \left(0,0,1,\ldots,0\right)\\ \ldots\\ e_n = \left(0,0,0,\ldots,1\right)\\ \end{eqnarray*} If $v = \left(\alpha_1, \alpha_2, \alpha_3, \ldots, \alpha_n\right)$ is any vector in this vector space, then its representation in terms of the above basis is given by: \begin{eqnarray*} v = \sum_{i=1}^n \alpha_i e_i \end{eqnarray*} In order to demonstrate that bases are not unique, we provide another basis for the same space: \begin{eqnarray*} f_1 = \left(1,0,0,\ldots,0\right)\\ f_2 = \left(1,1,0,\ldots,0\right)\\ f_3 = \left(1,1,1,\ldots,0\right)\\ \ldots\\ f_n = \left(1,1,1,\ldots,1\right)\\ \end{eqnarray*} It can be verified that $v$ can be represented as: \begin{eqnarray*} v = \sum_{i=1}^n \left(\alpha_i - \alpha_{i+1}\right)f_i \end{eqnarray*} where we set $\alpha_{n+1} = 0$.
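As a quick check for $n = 3$: if $v = \left(\alpha_1, \alpha_2, \alpha_3\right)$, then \begin{eqnarray*} \left(\alpha_1 - \alpha_2\right)f_1 + \left(\alpha_2 - \alpha_3\right)f_2 + \alpha_3 f_3 & = & \left(\alpha_1 - \alpha_2, 0, 0\right) + \left(\alpha_2 - \alpha_3, \alpha_2 - \alpha_3, 0\right) + \left(\alpha_3, \alpha_3, \alpha_3\right)\\ & = & \left(\alpha_1, \alpha_2, \alpha_3\right) = v \end{eqnarray*}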
Linear Transformations
A linear transformation $T:V\rightarrow V'$, for vector spaces $V$ and $V'$ over the same field, is a transformation that preserves scalar multiplication and vector addition: \begin{eqnarray*} T\left(\alpha v\right) & = & \alpha T\left(v\right)\\ T\left(v_1 + v_2\right) & = & T\left(v_1\right) + T\left(v_2\right)\\ \end{eqnarray*} for all $\alpha\in\mathbb{F}$ and $v, v_1, v_2\in V$. Note that this can be summarized as: \begin{eqnarray*} T\left(\alpha_1 v_1 + \alpha_2 v_2\right) = \alpha_1 T\left(v_1\right) + \alpha_2 T\left(v_2\right) \end{eqnarray*} for all $\alpha_1, \alpha_2\in\mathbb{F}$ and $v_1, v_2\in V$. Let $v_1, v_2, \ldots$ be a basis for $V$ and $v_1', v_2', \ldots$ a basis for $V'$. Consider the action of a linear transformation $T$ on the basis vector $v_i$ and decompose the resulting vector in terms of the basis $v_1', v_2', \ldots$: \begin{eqnarray*} T\left(v_i\right) = \sum_j a_{j,i} v_j' \end{eqnarray*} In other words, $a_{j,i}$ is the component along $v_j'$ of the image $T\left(v_i\right)$. If we represent any vector $v\in V$ by the sequence $\left(\alpha_1, \alpha_2, \ldots\right)$ where $v = \sum_i \alpha_i v_i$, and similarly any vector $v'\in V'$ by the sequence $\left(\alpha_1', \alpha_2', \ldots\right)$ where $v' = \sum_j \alpha_j' v_j'$, then the action of $T$ on $v$ is: \begin{eqnarray*} T\left(\sum_i \alpha_i v_i\right) = \sum_i \alpha_i T\left(v_i\right) = \sum_i \alpha_i \sum_j a_{j,i} v_j' = \sum_j \sum_i a_{j,i} \alpha_i v_j' \end{eqnarray*} The action of $T$ thus takes the components $\alpha_1, \alpha_2, \ldots$ to the components: \begin{eqnarray*} \left(\sum_i a_{1,i}\alpha_i, \sum_i a_{2,i}\alpha_i, \sum_i a_{3,i}\alpha_i, \ldots \right) \end{eqnarray*} The array of elements $a_{i,j}$ is referred to as the matrix corresponding to $T$, and the operation above is referred to as matrix-vector multiplication and written $a v$. Matrices are written as rectangular arrays of numbers: \begin{eqnarray*} \left(\begin{array}{ccccc} a_{1,1} & a_{1,2} & a_{1,3} & \ldots & a_{1,n}\\ a_{2,1} & a_{2,2} & a_{2,3} & \ldots & a_{2,n}\\ a_{3,1} & a_{3,2} & a_{3,3} & \ldots & a_{3,n}\\ \ldots & \ldots & \ldots & \ldots & \ldots\\ a_{m,1} & a_{m,2} & a_{m,3} & \ldots & a_{m,n}\\ \end{array}\right) \end{eqnarray*} where $n$ is the dimension of $V$ and $m$ is the dimension of $V'$. Note that the matrix of numbers depends upon the bases chosen: a different basis for either the input (domain) or output (range) vector space will in general yield a different matrix.
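As a small example, take $V = V' = \mathbb{R}^2$ with the standard basis $e_1, e_2$ on both sides, and let $T$ be the linear transformation with $T\left(e_1\right) = e_1$ and $T\left(e_2\right) = e_1 + e_2$. Then $a_{1,1} = 1$, $a_{2,1} = 0$, $a_{1,2} = 1$, $a_{2,2} = 1$, so the matrix of $T$ in this basis is: \begin{eqnarray*} \left(\begin{array}{cc} 1 & 1\\ 0 & 1\\ \end{array}\right) \end{eqnarray*} and a vector with components $\left(\alpha_1, \alpha_2\right)$ is mapped to one with components $\left(\alpha_1 + \alpha_2, \alpha_2\right)$.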
Let $S:V\rightarrow V'$ and $T:V'\rightarrow V''$ be linear transformations. We consider the effect of applying the two linear transformations in sequence, first $S$ and then $T$: \begin{eqnarray*} T\left(S\left(\alpha v + \alpha' v'\right)\right) & = & T\left(\alpha S\left(v\right) + \alpha' S\left(v'\right)\right)\\ & = & \alpha T\left(S\left(v\right)\right) + \alpha' T\left(S\left(v'\right)\right) \end{eqnarray*} Hence, the combined transformation, called the composition and written $T \circ S$, is linear. We now find the matrix corresponding to the composition $T\circ S$. We overload the notation by letting $T$ also denote the matrix corresponding to the transformation $T$, and similarly for $S$: \begin{eqnarray*} T\left(S\left(\sum_i \alpha_i v_i\right)\right) & = & T\left(\sum_{i,j} S_{j,i} \alpha_i v_j'\right)\\ & = & \sum_{i,j,k} T_{k,j} S_{j,i} \alpha_i v_k''\\ \end{eqnarray*} so that: \begin{eqnarray*} \left(T \circ S\right)_{k,i} = \sum_j T_{k,j} S_{j,i} \end{eqnarray*} This is multiplication of the matrices $T$ and $S$, written $T S$.
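The following is a minimal numerical sketch of this correspondence using NumPy; the particular matrices $S$ and $T$ are arbitrary examples.

```python
import numpy as np

# Matrices of S : R^3 -> R^2 and T : R^2 -> R^2 in the standard bases.
S = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 3.0]])
T = np.array([[2.0, 1.0],
              [0.0, 1.0]])

x = np.array([1.0, -1.0, 2.0])   # components of a vector in R^3

# Applying S and then T agrees with applying the product matrix T S.
assert np.allclose(T @ (S @ x), (T @ S) @ x)
print(T @ S)   # the matrix of the composition of T with S
```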
Matrix Operations
For the purposes of this course, a matrix is the representation of a linear transformation under bases for the range and domain. To proceed further without introducing unnecessary and counterintuitive mathematics, we consider just the matrices and not the bases. Let $\mathbb{R}^{n\times m}$ be the set of matrices corresponding to linear transformations from a vector space of dimension $m$, written $\mathbb{R}^m$, to one of dimension $n$, written $\mathbb{R}^n$. Note that a matrix in $\mathbb{R}^{n\times m}$ has $n m$ entries. A vector $v\in\mathbb{R}^n$ is represented by a column vector: \begin{eqnarray*} v = \left(\begin{array}{c} v_1\\ v_2\\ \ldots\\ v_n \end{array}\right) \end{eqnarray*} As discussed, multiplication of a matrix $A\in\mathbb{R}^{n\times m}$ and a vector $v\in\mathbb{R}^m$ yields the vector given by: \begin{eqnarray*} \left(A v\right)_i = \sum_j A_{i,j}v_j \end{eqnarray*} We also use row vectors $v\in\mathbb{R}^n$: \begin{eqnarray*} v = \left(\begin{array}{cccc}v_1 & v_2 & \ldots & v_n\end{array}\right) \end{eqnarray*} which can be multiplied by matrices on the left: \begin{eqnarray*} \left(v A\right)_j = \sum_i v_i A_{i,j} \end{eqnarray*}
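A brief NumPy sketch of these two products; the entries below are arbitrary examples.

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])        # A is in R^{2x3}
v = np.array([1.0, 0.0, -1.0])         # v in R^3, used as a column vector
w = np.array([1.0, -1.0])              # w in R^2, used as a row vector

print(A @ v)   # (A v)_i = sum_j A[i, j] * v[j]  -> vector of length 2
print(w @ A)   # (v A)_j = sum_i w[i] * A[i, j]  -> vector of length 3
```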
The following operations are available on matrices:
- Transpose: for $A\in \mathbb{R}^{n\times m}$, the transpose of $A$, written $A^T$, is the matrix from $\mathbb{R}^{m\times n}$ given by: \begin{eqnarray*} \left(A^T\right)_{i,j} = A_{j,i} \end{eqnarray*} We identify column vectors with $\mathbb{R}^{n\times 1}$ and row vectors with $\mathbb{R}^{1\times n}$. The transpose of a column vector is a row vector.
- Inverse: the inverse, if it exists, of a square matrix $A\in\mathbb{R}^{n\times n}$ is the matrix $A^{-1}$ such that: \begin{eqnarray*} A^{-1} A = I \end{eqnarray*} where $I$ is the identity matrix: \begin{eqnarray*} I = \left(\begin{array}{cccc} 1 & 0 & \ldots & 0\\ 0 & 1 & \ldots & 0\\ \ldots & \ldots & \ldots & \ldots\\ 0 & 0 & \ldots & 1\\ \end{array}\right) \end{eqnarray*} As an example, the inverse of the following matrix: \begin{eqnarray} A = \left(\begin{array}{cc} 1 & 1\\ 0 & 1\\ \end{array}\right)\label{nondiagonalizable}\\ \end{eqnarray} is given by: \begin{eqnarray*} A^{-1} = \left(\begin{array}{cc} 1 & -1\\ 0 & 1\\ \end{array}\right)\\ \end{eqnarray*} Note that the following matrix does not have an inverse: \begin{eqnarray*} \left(\begin{array}{cc} 1 & 1\\ 0 & 0\\ \end{array}\right)\\ \end{eqnarray*} The inverse matrix is often used in solving equations of the form $A x = b$ with known $A\in\mathbb{R}^{n\times n}$ and known $b\in\mathbb{R}^n$ but unknown $x\in\mathbb{R}^n$. The solution is given by: \begin{eqnarray*} x = A^{-1}b \end{eqnarray*} In some cases, a solution to the equation might exist even though the matrix is not invertible. For example, a solution to the following equation: \begin{eqnarray*} \left(\begin{array}{cc} 1 & 1\\ 0 & 0\\ \end{array}\right) x = \left(\begin{array}{c} 1\\ 0\\ \end{array}\right) \end{eqnarray*} is given by: \begin{eqnarray} x = \left(\begin{array}{c} 1\\ 0\\ \end{array}\right)\label{eigenvector} \end{eqnarray}
- Determinant: the determinant of a square matrix $A\in\mathbb{R}^{n\times n}$, written $\det A$, is the signed volume of the parallelepiped formed by the vectors corresponding to the columns of the matrix. The determinant of a matrix is non-zero if and only if the matrix is invertible. A short numerical sketch of these operations follows this list.
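The sketch below illustrates the transpose, inverse, determinant, and the solution of $A x = b$ in NumPy, using the $2\times 2$ example above; the right-hand side $b$ is an arbitrary choice.

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [0.0, 1.0]])

print(A.T)                    # transpose: (A^T)_{i,j} = A_{j,i}
print(np.linalg.inv(A))       # inverse: [[1, -1], [0, 1]]
print(np.linalg.det(A))       # determinant: 1.0, non-zero, so A is invertible

# Solving A x = b; np.linalg.solve avoids forming the inverse explicitly.
b = np.array([2.0, 3.0])
x = np.linalg.solve(A, b)
assert np.allclose(A @ x, b)
```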
Eigenvalues and Eigenvectors
One of the most powerful ways of analyzing linear transformations is through their eigenvalues, which we now discuss. Let $V$ be a vector space over the real or complex numbers. An eigenvalue of a linear transformation $T:V\rightarrow V$ is a complex number $\lambda\in\mathbb{C}$ such that there is a nonzero vector $v\in V$ with: \begin{eqnarray*} T v = \lambda v \end{eqnarray*} The vector $v$ above is referred to as an eigenvector corresponding to the eigenvalue $\lambda$. Note that the eigenvalues are invariant to the choice of basis and so are the same for any matrix corresponding to the transformation. However, the components of the eigenvectors depend upon the basis and so can be different for different matrices corresponding to the same linear transformation. Every matrix, viewed over the complex numbers, has at least one eigenvalue. The eigenvectors defined above are sometimes called right eigenvectors since they are defined via right matrix-vector multiplication. A left eigenvector is defined analogously via left multiplication: \begin{eqnarray*} l A = \lambda l \end{eqnarray*} Note that the sets of eigenvalues corresponding to the left and right eigenvectors of a matrix are the same.
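For a concrete computation, note that $A v = \lambda v$ with $v \neq 0$ means that $A - \lambda I$ is not invertible, i.e. $\det\left(A - \lambda I\right) = 0$. For the matrix: \begin{eqnarray*} A = \left(\begin{array}{cc} 2 & 1\\ 1 & 2\\ \end{array}\right) \end{eqnarray*} this gives $\left(2 - \lambda\right)^2 - 1 = 0$, so $\lambda = 1$ or $\lambda = 3$, with right eigenvectors proportional to $\left(1, -1\right)^T$ and $\left(1, 1\right)^T$ respectively.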
A matrix $A$ whose eigenvectors, $v_1, v_2, \ldots, v_n$, form a basis is called diagonalizable. If $A$ is diagonalizable then: \begin{eqnarray*} A = V \Lambda V^{-1} \end{eqnarray*} where: \begin{eqnarray*} V = \left(\begin{array}{ccccc} v_1 & v_2 & v_3 & \ldots & v_n \end{array}\right) \end{eqnarray*} is the matrix whose columns are the eigenvectors and: \begin{eqnarray*} \Lambda = \left(\begin{array}{ccccc} \lambda_1 & 0 & 0 & \ldots & 0\\ 0 & \lambda_2 & 0 & \ldots & 0\\ 0 & 0 & \lambda_3 & \ldots & 0\\ \ldots & \ldots & \ldots & \ldots & \ldots\\ 0 & 0 & 0 & \ldots & \lambda_n\\ \end{array}\right) \end{eqnarray*} is the diagonal matrix of the corresponding eigenvalues. If we multiply a diagonalizable matrix by itself $N$ times we get: \begin{eqnarray*} A A A \ldots A & = & V \Lambda V^{-1} V \Lambda V^{-1} \ldots V \Lambda V^{-1}\\ & = & V \Lambda \Lambda \ldots \Lambda V^{-1} = V \Lambda^N V^{-1}\\ \end{eqnarray*} where: \begin{eqnarray*} \Lambda^N = \left(\begin{array}{ccccc} \lambda_1^N & 0 & 0 & \ldots & 0\\ 0 & \lambda_2^N & 0 & \ldots & 0\\ 0 & 0 & \lambda_3^N & \ldots & 0\\ \ldots & \ldots & \ldots & \ldots & \ldots\\ 0 & 0 & 0 & \ldots & \lambda_n^N\\ \end{array}\right) \end{eqnarray*} Note that, as $N$ becomes large, this is quickly dominated by the eigenvalue of largest magnitude and the corresponding eigenvector.
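A brief NumPy sketch of diagonalization and of computing matrix powers through it; the matrix chosen is an arbitrary symmetric (hence diagonalizable) example.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])             # symmetric, hence diagonalizable

lam, V = np.linalg.eig(A)              # eigenvalues and eigenvectors (columns of V)
Lambda = np.diag(lam)

# A = V Lambda V^{-1}
assert np.allclose(A, V @ Lambda @ np.linalg.inv(V))

# A^N = V Lambda^N V^{-1}; for large N the largest eigenvalue dominates.
N = 10
assert np.allclose(np.linalg.matrix_power(A, N),
                   V @ np.diag(lam ** N) @ np.linalg.inv(V))
```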
Note that not every matrix is diagonalizable. For example, the matrix given in (\ref{nondiagonalizable}) is not diagonalizable. It has a single eigenvalue of $1$, with eigenvector (\ref{eigenvector}). Since its domain has dimension $2$ and a single eigenvector cannot span a space of dimension $2$, it is not diagonalizable. However, every matrix can be put into a form similar to a diagonal form, called the Jordan normal form: \begin{eqnarray*} A = V J V^{-1} \end{eqnarray*} where: \begin{eqnarray*} J = \left(\begin{array}{ccccccc} \lambda_1 & 1 & 0 & 0 & 0 & 0 & \ldots\\ 0 & \lambda_1 & 1 & 0 & 0 & 0 & \ldots\\ 0 & 0 & \lambda_1 & 0 & 0 & 0 & \ldots\\ 0 & 0 & 0 & \lambda_2 & 1 & 0 & \ldots\\ 0 & 0 & 0 & 0 & \lambda_2 & 0 & \ldots\\ \ldots & \ldots & \ldots & \ldots & \ldots & \ldots & \ldots\\ \end{array}\right) \end{eqnarray*} The Jordan normal form behaves similarly to a diagonal form when a matrix is multiplied by itself many times: the asymptotics are again dominated by the eigenvalue of largest magnitude.
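For instance, the matrix in (\ref{nondiagonalizable}) is already in Jordan normal form: it is a single $2\times 2$ Jordan block with $\lambda_1 = 1$ (take $V = I$ and $J = A$). Its powers are: \begin{eqnarray*} A^N = \left(\begin{array}{cc} 1 & N\\ 0 & 1\\ \end{array}\right) \end{eqnarray*} which grows linearly in $N$ even though the only eigenvalue has magnitude $1$; for Jordan blocks the eigenvalue of largest magnitude still governs the asymptotics, up to such polynomial factors.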
References
A classic book on linear algebra is:
Halmos, Paul R. Finite-Dimensional Vector Spaces. Dover Publications, 1958.
A more advanced book which includes the Jordan normal form:
Shilov, George E. Linear Algebra. Dover Publications, 1977.