Probability
The Analysis of Data, volume 1
Linear Algebra: Basic Definitions
$
\def\P{\mathsf{\sf P}}
\def\E{\mathsf{\sf E}}
\def\Var{\mathsf{\sf Var}}
\def\Cov{\mathsf{\sf Cov}}
\def\std{\mathsf{\sf std}}
\def\Cor{\mathsf{\sf Cor}}
\def\R{\mathbb{R}}
\def\c{\,|\,}
\def\bb{\boldsymbol}
\def\diag{\mathsf{\sf diag}}
\def\defeq{\stackrel{\tiny\text{def}}{=}}
$
C.1. Basic Definitions
This chapter describes some basic concepts in linear algebra, including determinants, eigenvalues and eigenvectors, and the singular value decomposition.
Definition C.1.1.
A matrix $A$ is a two-dimensional array of real numbers. The elements of $A$ are indexed by their row and column: $A_{ij}$ is the element in the $i$-th row and $j$-th column. The set of all $n$ by $m$ matrices is denoted $\R^{n\times m}$. The transpose of a matrix $A\in \R^{n\times m}$ is $A^{\top}\in\R^{m\times n}$, defined as $[A^{\top}]_{ij}\defeq A_{ji}$. Vectors are a special case of matrices with either one row or one column.
When indexing vectors we sometimes omit the redundant index. For example, if $\bb v$ is a column vector we write $[\bb v]_{j1}$ as simply $v_j$. We similarly consider $1\times 1$ matrices as scalars and refer to them without redundant indices.
We follow convention and write vectors in bold lowercase, for example $\bb x,\bb y$ and scalars in non-bold lowercase, for example $x,y$. We denote matrices in non-bold uppercase, such as $A, B$. Important exceptions are random variables, denoted by non-bold uppercase, for example $X,Y$ and random vectors, denoted by bold uppercase, for example $\bb X,\bb Y$.
Definition C.1.2.
Multiplying a matrix $A$ by a scalar $c\in \R$ is defined by \[[cA]_{ij}\defeq cA_{ij}.\] The addition of two matrices $A, B$ of the same size is defined as \[ [A+B]_{ij}\defeq A_{ij}+B_{ij}.\]
Definition C.1.3.
A matrix having the same number of rows and columns is called a square matrix. The diagonal elements of a square matrix $A\in\R^{n\times n}$ are $A_{11},\ldots,A_{nn}$. A diagonal matrix is a square matrix whose off-diagonal elements are zero. A diagonal matrix whose diagonal elements all equal 1 is called the identity matrix and is denoted by $I$. Given a vector of diagonal values $\bb v$ we denote the corresponding diagonal matrix by $\diag(\bb v)$.
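As a concrete illustration (a NumPy sketch added here, not part of the text's development), np.diag constructs $\diag(\bb v)$ from a vector and np.eye constructs the identity matrix:

```python
import numpy as np

v = np.array([1.0, 2.0, 3.0])

D = np.diag(v)     # diag(v): 3x3 diagonal matrix with v on the diagonal
I = np.eye(3)      # 3x3 identity matrix

print(D)
# [[1. 0. 0.]
#  [0. 2. 0.]
#  [0. 0. 3.]]
print(np.allclose(np.diag(D), v))  # True: extracting the diagonal recovers v
```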
Definition C.1.4.
An upper triangular matrix is a square matrix for which all elements below the diagonal are zero. A lower triangular matrix is a square matrix for which all elements above the diagonal are zero. A triangular matrix is a lower triangular matrix or an upper triangular matrix.
Definition C.1.5.
The product of a matrix $A\in\R^{n\times m}$ and a matrix $B\in\R^{m\times k}$ is the matrix $AB \in \R^{n\times k}$ defined as
\[ [AB]_{ij}=\sum_{r=1}^m A_{ir} B_{rj}. \]
Note that while multiplication by a scalar and matrix addition are commutative, the matrix product is not: $AB\neq BA$ in general. All three operations are associative: $A+(B+C)=(A+B)+C$, $c(AB)=(cA)B$, and $A(BC)=(AB)C$. This implies we can omit parentheses and write $A+B+C$, $cAB$, and $ABC$ without any ambiguity.
When we use the matrix product notation $AB$ we assume implicitly that it is defined, implying that $A$ has the same number of columns as $B$ has rows. Repeated multiplication is denoted using an exponent, for example $AA=A^2$ and $AA^k=A^{k+1}$.
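The following NumPy sketch (an illustration added here; the specific matrices are arbitrary) checks the entrywise definition of the product and the associativity property, and shows why $AB$ and $BA$ need not even have the same size:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(2, 3))
B = rng.normal(size=(3, 2))
C = rng.normal(size=(2, 2))

# [AB]_{ij} = sum_r A_{ir} B_{rj}
AB = A @ B
print(np.isclose(AB[0, 1], sum(A[0, r] * B[r, 1] for r in range(3))))  # True

# The matrix product is not commutative: here AB is 2x2 while BA is 3x3,
# and even for square matrices AB and BA generally differ.
print(np.allclose(A @ B @ C, A @ (B @ C)))   # True: associativity
```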
Example C.1.1.
Using the above definition of matrix multiplication, we see that for all matrices $A$, we have $AI=IA=A$. Similarly, we have $AI^k=I^kA=A$ and $AIA=A^2$.
Example C.1.2.
For $A\in\R^{n\times m}$, $\bb x \in \R^{m\times 1}$, and $\bb y \in \R^{1\times n}$, we have
\begin{align*}
A\bb x &\in \R^{n\times 1},& [A\bb x]_{i}&=\sum_{r=1}^m A_{ir}x_r\\
\bb y A &\in\R^{1\times m},& [\bb y A]_{i}&=\sum_{r=1}^n y_r A_{ri}\\
{\bb y} A \bb x &\in\R^{1\times 1},& {\bb y} A \bb x &= \sum_{i=1}^n\sum_{j=1}^m y_iA_{ij}x_j.
\end{align*}
If $n=m$, we also have
\begin{align*}
{\bb x}^{\top} A \bb x &\in\R^{1\times 1},& {\bb x}^{\top} A \bb x &= \sum_{i=1}^n\sum_{j=1}^n x_iA_{ij}x_j.
\end{align*}
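A short NumPy sketch of these shapes and formulas (illustrative only; the particular matrices and vectors are arbitrary choices):

```python
import numpy as np

n, m = 3, 2
A = np.arange(6.0).reshape(n, m)      # A in R^{3x2}
x = np.array([[1.0], [2.0]])          # column vector in R^{2x1}
y = np.array([[1.0, -1.0, 0.5]])      # row vector in R^{1x3}

print((A @ x).shape)        # (3, 1): A x is a column vector
print((y @ A).shape)        # (1, 2): y A is a row vector
print((y @ A @ x).shape)    # (1, 1): y A x is a 1x1 matrix (a scalar)

# The quadratic form x^T A x requires a square matrix (n = m).
B = np.array([[2.0, 1.0], [1.0, 3.0]])
print((x.T @ B @ x).item()) # sum_i sum_j x_i B_{ij} x_j = 2+2+2+12 = 18.0
```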
Definition C.1.6.
For two vectors $\bb x\in\R^{n\times 1}, \bb y\in\R^{m\times 1}$ we define the outer product (see also Section B.4) as
\[ \bb x \otimes \bb y \defeq \bb x {\bb y}^{\top} \in\R^{n\times m},\]
and if $n=m$ we define the inner product (see also Section B.4) as
\[ \langle \bb x,\bb y\rangle \defeq {\bb x}^{\top}\bb y = {\bb y}^{\top}\bb x\in\R.\]
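In NumPy (again an added illustration rather than part of the text), the outer and inner products are available directly:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])

outer = np.outer(x, y)        # x y^T, a 3x3 matrix with entries x_i y_j
inner = np.inner(x, y)        # x^T y = 1*4 + 2*5 + 3*6 = 32.0

print(outer.shape, inner)
print(np.isclose(inner, x @ y))  # True: same as the dot product
```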
Definition C.1.7.
We refer to the vector
$\sum_{i=1}^n \alpha_i {\bb v}^{(i)}$, where $\alpha_1,\ldots,\alpha_n\in\R$,
as a linear combination of the vectors ${\bb v}^{(1)},\ldots, {\bb v}^{(n)}$. The values $\alpha_i$, $i=1,\ldots,n$, are called the combination coefficients.
Definition C.1.8.
The vectors ${\bb v}^{(1)},\ldots, {\bb v}^{(n)}$ are said to be linearly independent if the equation $\sum_i \alpha_i {\bb v}^{(i)}=0$ (note that the zero on the right hand side is the zero vector) implies that $\alpha_i=0$ for all $i$. If the vectors are not linearly independent they are said to be linearly dependent. The definition implies that if ${\bb v}^{(1)},\ldots, {\bb v}^{(n)}$ are linearly dependent, then at least one of them may be written as a linear combination of the remaining vectors.
Example C.1.3.
The vectors $(0,1)$ and $(0,2)$ are linearly dependent, but the vectors $(0,1)$ and $(1,0)$ are linearly independent. In fact, the set of all linear combinations of $(0,1)$ and $(1,0)$ coincides with $\R^2$ (the set of all possible two-dimensional vectors).
Example C.1.4.
Generalizing the previous example, the vectors ${\bb e}^{(i)}\in\R^{n\times 1}$, $i=1,\ldots,n$, defined by $e^{(i)}_j=1$ if $i=j$ and 0 otherwise, are linearly independent. Since every vector $\bb v\in\R^{n\times 1}$ satisfies \[\bb v = \sum_{i=1}^n v_i {\bb e}^{(i)},\]
the set of all linear combinations of $\bb e^{(i)}$, $i=1,\ldots,n$, is $\R^n$ (the set of all possible $n$-dimensional vectors). The vectors $\bb e^{(i)}$, $i=1,\ldots,n$, are called the standard basis for $\R^n$.
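A common numerical way to test linear independence is to stack the vectors as columns of a matrix and check whether it has full column rank; the NumPy sketch below (an added illustration using this rank criterion, which is not developed in this section) applies it to the vectors of Examples C.1.3 and C.1.4:

```python
import numpy as np

# Columns of each matrix are the vectors under test; they are linearly
# independent exactly when the matrix has full column rank.
V_dep = np.array([[0.0, 0.0],
                  [1.0, 2.0]])          # columns (0,1) and (0,2): dependent
V_ind = np.array([[0.0, 1.0],
                  [1.0, 0.0]])          # columns (0,1) and (1,0): independent

print(np.linalg.matrix_rank(V_dep))     # 1  -> linearly dependent
print(np.linalg.matrix_rank(V_ind))     # 2  -> linearly independent

# The standard basis of R^n is the set of columns of the identity matrix.
E = np.eye(4)
print(np.linalg.matrix_rank(E))         # 4
```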
Examining Definition C.1.5, we see that multiplying a matrix by a column vector, $A\bb v$, yields a column vector that is a linear combination of the columns of $A$, with combination coefficients $v_1,\ldots,v_m$. Similarly, the matrix product $AB$ is a matrix whose columns are each a linear combination of the column vectors of $A$. We will use these important observations in several proofs later on.
The proposition below helps to explain why matrices and linear algebra are so important.
Proposition C.1.1.
Any linear transformation $T:\R^n\to\R^m$ (Definition B.4.2) is realized by matrix multiplication $T(\bb v) = A\bb v$ for some $A\in\R^{m\times n}$. Conversely, the matrix multiplication mapping $\bb v\mapsto A\bb v$ is a linear transformation.
Proof.
Let $T$ be a linear transformation, and denote by ${\bb e}^{(i)}$, $i=1,\ldots,n$ the standard basis from Example C.1.4. Given a vector $\bb v$ we express it as a linear combination of the standard basis $\bb v = \sum_i v_i {\bb e}^{(i)}$, and apply linearity
\[ T(\bb v)=T\left(\sum v_i {\bb e}^{(i)}\right) = \sum v_i {\bb u}^{(i)},\]
where ${\bb u}^{(i)}=T({\bb e}^{(i)})$. Setting the columns of $A$ to ${\bb u}^{(i)}$, $i=1,\ldots,n$, shows that $T$ is a matrix multiplication mapping realized by the matrix $A$: $T(\bb v) = A\bb v$. The second statement follows from the fact that for matrix multiplication $A(c_1\bb u + c_2 \bb v)=c_1 A\bb u + c_2 A\bb v$.
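The construction in the proof is easy to carry out numerically: apply $T$ to each standard basis vector and use the images as the columns of $A$. The sketch below assumes NumPy, and the transformation T is an arbitrary example chosen only for illustration:

```python
import numpy as np

def matrix_of(T, n):
    """Build the matrix A whose columns are T(e^(1)), ..., T(e^(n))."""
    E = np.eye(n)
    return np.column_stack([T(E[:, i]) for i in range(n)])

# An arbitrary linear map T: R^3 -> R^2 (illustration only).
def T(v):
    return np.array([v[0] + 2 * v[1], 3 * v[2] - v[0]])

A = matrix_of(T, 3)
v = np.array([1.0, -1.0, 2.0])
print(np.allclose(A @ v, T(v)))   # True: T(v) = A v
```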
Definition C.1.9.
The inverse of a square matrix $A$ is the matrix $A^{-1}$ such that $AA^{-1}=A^{-1}A=I$. A matrix $A$ for which an inverse exists is called non-singular. Otherwise it is called singular.
Proposition C.1.2.
The inverse of a non-singular matrix is unique.
Proof.
If $A$ has two inverses $B$ and $C$, then $AB=I$ and $CA=I$, so $C=CI=C(AB)=(CA)B=B$.
One of the applications of matrix inversion is in solving linear systems of equations. Given the system of equations $A\bb x=\bb y$, where $A,\bb y$ are known and $\bb x$ is unknown, we can solve for $\bb x$ by inverting the matrix: \[\bb x=A^{-1}A\bb x=A^{-1}\bb y.\]
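In numerical work one usually avoids forming $A^{-1}$ explicitly and solves the system directly instead; the NumPy sketch below (with an arbitrary example system) shows that both routes give the same $\bb x$:

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
y = np.array([9.0, 8.0])

x_inv   = np.linalg.inv(A) @ y      # x = A^{-1} y, as in the text
x_solve = np.linalg.solve(A, y)     # solves A x = y without forming A^{-1}

print(np.allclose(x_inv, x_solve))  # True
print(np.allclose(A @ x_solve, y))  # True: the solution satisfies A x = y
```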
Proposition C.1.3.
For any two matrices $A,B$ we have $(AB)^{\top}=B^{\top}A^{\top}$. Similarly, for any non-singular matrices $A,B$ we have $(AB)^{-1}=B^{-1}A^{-1}$.
Proof.
The first statement follows from
\[[(AB)^{\top}]_{ij}=[AB]_{ji}=\sum_k A_{jk}B_{ki} =\sum_k [B^{\top}]_{ik} [A^{\top}]_{kj} = [B^{\top}A^{\top}]_{ij},\]
and the second statement follows from $(AB)(B^{-1}A^{-1})=A(BB^{-1})A^{-1}=I$ and $(B^{-1}A^{-1})(AB)=B^{-1}(A^{-1}A)B=I$.
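Both identities are easy to check numerically; the sketch below assumes NumPy and uses random matrices, which are non-singular with probability one:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(3, 3))
B = rng.normal(size=(3, 3))

print(np.allclose((A @ B).T, B.T @ A.T))                    # True: (AB)^T = B^T A^T
print(np.allclose(np.linalg.inv(A @ B),
                  np.linalg.inv(B) @ np.linalg.inv(A)))     # True: (AB)^{-1} = B^{-1} A^{-1}
```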
Proposition C.1.4.
For any non-singular matrix $A$ we have \[(A^{\top})^{-1}=(A^{-1})^{\top}.\]
We denote the matrix above as $A^{-\top}$.
Proof.
By the previous proposition
\begin{align*} A^{\top}(A^{-1})^{\top} &= (A^{-1}A)^{\top}=I^{\top}=I\\
(A^{-1})^{\top}A^{\top} &= (AA^{-1})^{\top}=I^{\top}=I.
\end{align*}
Definition C.1.10.
An orthogonal matrix $A$ is a square matrix whose columns are orthonormal vectors, or equivalently a square matrix satisfying $A^{\top}A=AA^{\top}=I$.
Proposition C.1.5.
Multiplication by an orthogonal matrix $A$ preserves inner products and norms:
\begin{align*}
\langle \bb v,\bb w\rangle &= \langle A\bb v,A\bb w\rangle\\
\|A\bb v\|&=\|\bb v\|.
\end{align*}
Proof.
\[\langle A\bb v,A\bb w\rangle = {\bb v}^{\top}A^{\top}A\bb w={\bb v}^{\top}\bb w=\langle \bb v,\bb w\rangle.\]
The second statement follows by setting $\bb w=\bb v$ and taking square roots.
As a consequence of the above proposition, mapping vectors by multiplying with an orthogonal matrix preserves the angle between two vectors (see Section B.4). Moreover, norm preservation implies that multiplying a vector by an orthogonal matrix preserves its distance from the origin. We thus interpret the mapping $\bb x\mapsto A\bb x$ for an orthogonal $A$ as a rotation or reflection in $n$ dimensions.
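For example, the $2\times 2$ rotation matrix with angle $\theta$ is orthogonal. The NumPy sketch below (an added illustration) verifies orthogonality and the preservation of norms and inner products:

```python
import numpy as np

theta = 0.7
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])   # rotation by angle theta

print(np.allclose(A @ A.T, np.eye(2)))            # True: A is orthogonal

v = np.array([1.0, 2.0])
w = np.array([-3.0, 0.5])
print(np.isclose(np.linalg.norm(A @ v), np.linalg.norm(v)))  # norm preserved
print(np.isclose((A @ v) @ (A @ w), v @ w))                  # inner product preserved
```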
Definition C.1.11.
The Kronecker product of two matrices $A\in\R^{m\times m}$ and $B\in\R^{n\times n}$ is
\begin{align*}
A\otimes B=
\begin{pmatrix}
A_{11}B& A_{12}B& \cdots& A_{1m}B\\
A_{21}B& A_{22}B& \cdots& A_{2m}B\\
\vdots& \vdots & \ddots& \vdots\\
A_{m1}B& A_{m2}B& \cdots& A_{mm}B
\end{pmatrix}
\end{align*}
where $A_{ij}B$ is the product of the scalar $A_{ij}$ and the matrix $B$.
Note that the Kronecker product is consistent with the earlier definition of the outer product of two vectors:
\[\bb u\otimes \bb v = \bb u {\bb v}^{\top}.\]
Example C.1.5.
The matrix $I\otimes B$ is the block diagonal matrix
\begin{align*}
I\otimes B=
\begin{pmatrix}
B& 0& \cdots& 0\\
0 & B & \cdots & 0\\
\vdots& \vdots & \ddots& \vdots\\
0& 0& \cdots& B
\end{pmatrix}.
\end{align*}
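NumPy's np.kron implements this construction; the sketch below (with arbitrary small matrices, added for illustration) checks the block structure and the block diagonal form of $I\otimes B$:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.array([[0.0, 1.0],
              [1.0, 0.0]])

K = np.kron(A, B)                            # 4x4 matrix with blocks A_{ij} B
print(np.allclose(K[:2, 2:], A[0, 1] * B))   # True: top-right block is A_{12} B

print(np.kron(np.eye(2), B))                 # block diagonal, copies of B on the diagonal
# [[0. 1. 0. 0.]
#  [1. 0. 0. 0.]
#  [0. 0. 0. 1.]
#  [0. 0. 1. 0.]]
```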