## Probability

### The Analysis of Data, volume 1

Linear Algebra: Basic Definitions

## C.1. Basic Definitions

This chapter describes some basic concepts in linear algebra, including determinants, eigenvalues and eigenvectors, and the singular value decomposition.
Definition C.1.1. A matrix $A$ is a two dimensional array of real numbers. The elements of $A$ are indexed by their row and column: $A_{ij}$ is the element at the $i$-row and $j$-column. The set of all $n$ by $m$ matrices is denoted $\R^{n\times m}$. The transpose of a matrix $A\in \R^{n\times m}$ is $A^{\top}\in\R^{m\times n}$, defined as $[A^{\top}]_{ij}\defeq A_{ji}$. Vectors are a special case of matrices with either one row or one column.

When indexing vectors we sometimes omit the redundant index. For example, if $\bb v$ is a column vector we write $[\bb v]_{j1}$ as simply $v_j$. We similarly consider $1\times 1$ matrices as scalars and refer to them without redundant indices.

We follow convention and write vectors in bold lowercase, for example $\bb x,\bb y$ and scalars in non-bold lowercase, for example $x,y$. We denote matrices in non-bold uppercase, such as $A, B$. Important exceptions are random variables, denoted by non-bold uppercase, for example $X,Y$ and random vectors, denoted by bold uppercase, for example $\bb X,\bb Y$.

Definition C.1.2. Multiplying a matrix $A$ by a scalar $c\in \R$ is defined by $[cA]_{ij}\defeq cA_{ij}.$ The addition of two matrices $A, B$ of the same size is defined as $[A+B]_{ij}\defeq A_{ij}+B_{ij}.$
Definition C.1.3. A matrix having the same number of rows and columns is called a square matrix. The diagonal elements of a square matrix $A$ are $A_{11},\ldots,A_{nn}$. A diagonal matrix is a square matrix whose off diagonal elements are zero. A diagonal matrix whose diagonal elements are 1 is called the identity matrix and is denoted by $I$. Given a vector of diagonal values $\bb v$ we denote the corresponding diagonal matrix as $\diag(\bb v)$.
Definition C.1.4. An upper triangular matrix is a square matrix for which all elements below the diagonal are zero. A lower triangular matrix is a square matrix for which all elements above the diagonal are zero. A triangular matrix is a lower triangular matrix or an upper triangular matrix.
Definition C.1.5. The product of a matrix $A\in\R^{n\times m}$ and a matrix $B\in\R^{m\times k}$ is the matrix $AB \in \R^{n\times k}$ defined as $[AB]_{ij}=\sum_{r=1}^m A_{ir} B_{rj}.$

Note that while matrix multiplication by a scalar and matrix addition are commutative, matrix product is not: $AB\neq BA$ in general. All three operations are associative: $A+(B+C)=(A+B)+C$, $c(AB)=(cA)B$, and $A(BC)=(AB)C$. This implies we can omit parenthesis and write $A+B+C$, $cAB$, and $ABC$ without any ambiguity.

When we use the matrix product notation $AB$ we assume implicitly that it is defined, implying that $A$ has the same number of columns as $B$ has rows. Repeated multiplications is denoted using an exponent, for example $AA=A^2$ and $AA^k=A^{k+1}$.

Example C.1.1. Using the above definition of matrix multiplication, we see that for all matrices $A$, we have $AI=IA=A$. Similarly, we have $AI^k=I^kA=A$ and $AIA=A^2$.
Example C.1.2. For $A\in\R^{n\times m}$, $\bb x \in \R^{m\times 1}$, and $\bb y \in \R^{1\times n}$, we have \begin{align*} A\bb x &\in \R^{n\times 1},& [A\bb x]_{i}&=\sum_{r=1}^m A_{ir}x_r\\ \bb y A &\in\R^{1\times m},& [\bb y A]_{i}&=\sum_{r=1}^n y_r A_{ri}\\ {\bb y} A \bb x &\in\R^{1\times 1},& {\bb y} A \bb x &= \sum_{i=1}^n\sum_{j=1}^m y_iA_{ij}x_j. \end{align*} If $n=m$, we also have \begin{align*} {\bb x}^{\top} A \bb x &\in\R^{1\times 1} & {\bb x}^{\top} A \bb x &= \sum_{i=1}^n\sum_{j=1}^n x_iA_{ij}x_j. \end{align*}
Definition C.1.6. For two vectors $\bb x\in\R^{n\times 1}, \bb y\in\R^{m\times 1}$ we define the outer product (see also Section B.4) as $\bb x \otimes \bb y \defeq \bb x {\bb y}^{\top} \in\R^{n\times m},$ and if $n=m$ we define the inner product (see also Section B.4) as $\langle \bb x,\bb y\rangle \defeq {\bb x}^{\top}\bb y = {\bb y}^{\top}\bb x\in\R.$
Definition C.1.7. We refer to the vector $\sum_{i=1}^n \alpha_i \bb v^{(n)}$, where $\alpha_1,\ldots,\alpha_n\in\R$, as a linear combination of the vectors ${\bb v}^{(1)},\ldots, {\bb v}^{(n)}$. The $\alpha_i$, $i=1,\ldots,n$ values are called the combination coefficients.
Definition C.1.8. The vectors ${\bb v}^{(1)},\ldots, {\bb v}^{(n)}$ are said to be linearly independent if the expression $\sum_i \alpha_i {\bb v}^{(i)}=0$ (note that the zero on the right hand side is the zero vector) implies that $\alpha_i=0$ for all $i$. If the vectors are not linearly independent they are said to be linearly dependent. The definition implies that if ${\bb v}^{(1)},\ldots, {\bb v}^{(n)}$ are linearly dependent, then ${\bb v}^{(i)}$ may be written as a linear combination of the remaining vectors.
Example C.1.3. The vectors $(0,1)$ and $(0,2)$ are linearly dependent, but the vectors $(0,1)$ and $(1,0)$ are linearly independent. In fact, the set of all linear combinations of $(0,1)$ and $(1,0)$ coincides with $\R^2$ (the set of all possible two dimensional vectors).
Example C.1.4. Generalizing the previous example, the vectors ${\bb e}^{(i)}\in\R^{n\times 1}$ defined by $e^{(i)}_j=1$ if $i=j$ and 0 otherwise, $i=1,\ldots,n$, are linearly independent. Since for every vector $\bb v$, $\bb v = \sum v_i {\bb e}^{(i)},$ the set of all their linear combinations of $\bb e^{(i)}, i=1,\ldots,n$ is $\R^n$ (the set of all possible $n$ dimensional vectors). The vectors $\bb e^{(i)}, i=1,\ldots,n$ are called the standard basis for $\R^n$.

Examining Definition C.1.5 we see that multiplying a matrix by a column vector $A\bb v$ yields a column vector that is a linear combination of the columns of $A$. Similarly the matrix multiplication $AB$ is a matrix whose columns are each a linear combination of the column vectors of $A$. We will use these important observations in several proofs later on.

The proposition below helps to explain why matrices and linear algebra are so important.

Proposition C.1.1. Any linear transformation $T:\R^n\to\R^m$ (Definition B.4.2) is realized by matrix multiplication $T(\bb v) = A\bb v$ for some $A\in\R^{m\times n}$. Conversely, the matrix multiplication mapping $\bb v\mapsto A\bb v$ is a linear transformation.
Proof. Let $T$ be a linear transformation, and denote by ${\bb e}^{(i)}$, $i=1,\ldots,n$ the standard basis from Example C.1.4. Given a vector $\bb v$ we express it as a linear combination of the standard basis $\bb v = \sum_i v_i {\bb e}^{(i)}$, and apply linearity $T(\bb v)=T\left(\sum v_i {\bb e}^{(i)}\right) = \sum v_i {\bb u}^{(i)},$ where ${\bb u}^{(i)}=T({\bb e}^{(i)})$. Setting the columns of $A$ to ${\bb u}^{(i)}$, $i=1,\ldots,m$ shows that $T$ is a matrix multiplication mapping realized by the matrix $A$: $T(\bb v) = A\bb v$. The second statement follows from the fact that for matrix multiplication $A(c_1\bb u + c_2 \bb v)=c_1 A\bb u + c_2 A\bb v$.
Definition C.1.9. The inverse of a square matrix $A$ is the matrix $A^{-1}$ such that $AA^{-1}=A^{-1}A=I$. A matrix $A$ for which an inverse exists is called non-singular. Otherwise it is called singular.
Proposition C.1.2. The inverse of a non-singular matrix is unique.
Proof. If $A$ has two inverses $B$ and $C$, then $AB=I$ and $C=CI=CAB=B$.

One of the applications of matrix inversion is in solving linear systems of equations. Given the system of equations $A\bb x=\bb y$ where $A,\bb y$ are known and $\bb x$ is not known, we can solve for $\bb x$ by inverting the matrix $\bb x=A^{-1}A\bb x=A^{-1}\bb y.$

%This, $A^{-1}\bb y$ is the vector of coefficients of the expansion of $\bb y$ in a linear combination using the columns of $A$.
Proposition C.1.3. For any two matrices $A,B$ we have $(AB)^{\top}=B^{\top}A^{\top}$. Similarly, for any non-singular matrices $A,B$ we have $(AB)^{-1}=B^{-1}A^{-1}$.
Proof. The first statement follows from $[(AB)^{\top}]_{ij}=\sum_k A_{jk}B_{ki} =\sum_k B_{ki} A_{jk} = [B^{\top}A^{\top}]_{ij}$ and the second statement follows from $AB(B^{-1}A^{-1})=I=B^{-1}A^{-1}AB$.
Proposition C.1.4. For any non-singular matrix $A$ we have $(A^{\top})^{-1}=(A^{-1})^{\top}.$ We denote the matrix above as $A^{-\top}$.
Proof. By the previous proposition \begin{align*} A^{\top}(A^{-1})^{\top} &= (A^{-1}A)^{\top}=I^{\top}=I\\ (A^{-1})^{\top}A^{\top} &= (AA^{-1})^{\top}=I^{\top}=I. \end{align*}
Definition C.1.10. An orthogonal matrix $A$ is a square matrix whose columns are orthonormal vectors, or equivalently $A$ satisfies $AA^{\top}=I$.
Proposition C.1.4. Multiplication by an orthogonal matrix preserves inner products and norms \begin{align*} \langle \bb v,\bb w\rangle &= \langle A\bb v,A\bb w\rangle\\ \|A\bb x\|&=\|\bb x\|. \end{align*}
Proof. $\langle A\bb v,A\bb w\rangle = {\bb v}^{\top}A^{\top}A\bb w={\bb v}^{\top}\bb w.$

As a consequence of the above proposition, mapping vectors by multiplying with an orthogonal matrix preserves the angle between the two vectors (see Section B.4). Moreover norm preservation implies that multiplying a vector by an orthogonal matrix preserves the distance between the vector and the origin. We thus interpret the mapping $\bb x\mapsto A\bb x$ for an orthogonal $A$ as rotation or reflection in $n$-dimension.

Definition C.1.11. The Kronecker product of two matrices $A\in\R^{m\times m}$ and $B\in\R^{n\times n}$ is \begin{align*} A\otimes B= \begin{pmatrix} A_{11}B& A_{12}B& \cdots& A_{1m}B\\ A_{21}B& A_{22}B& \cdots& A_{2m}B\\ \vdots& \vdots & \vdots& \vdots\\ A_{m1}B& A_{22}B& \cdots& A_{mm}B \end{pmatrix} \end{align*} where $A_{ij}B$ is the matrix corresponding to the product of a scalar and a matrix.

Note that the Kronecker product is consistent with the earlier definition of an outer product of two vectors $\bb v\otimes \bb w$: $\bb u\otimes \bb v = \bb u {\bb v}^{\top}.$

Example C.1.5. The matrix $I\otimes B$ is the block diagonal matrix \begin{align*} I\otimes B= \begin{pmatrix} B& 0& \cdots& 0\\ 0 & B & \cdots & 0\\ \vdots& \vdots & \vdots& \vdots\\ 0& 0& \cdots& B \end{pmatrix}. \end{align*}