## Probability

### The Analysis of Data, volume 1


## C.3. Eigenvalues, Determinant, and Trace

Definition C.3.1. An eigenvector-eigenvalue pair of a square matrix $A$ is a pair $(\bb v,\lambda)$ of a nonzero vector and a scalar for which $A\bb v=\lambda\bb v$. The spectrum of a matrix $A$ is the set of all its eigenvalues.

We make the following observations.

1. The above definition implies that the eigenvectors and eigenvalues of $A$ are solutions of the vector equation $(A-\lambda I)\bb v=0$.
2. A nonzero vector $\bb v$ can be a solution of $(A-\lambda I)\bb v=0$ only if $\dim\nulll(A-\lambda I)\geq 1$, implying that $A-\lambda I$ is a singular matrix.
3. If $\bb v$ is an eigenvector of $A$, then so is $c\bb v$ (with the same eigenvalue).
Definition C.3.2. A permutation of order $n$ is a one-to-one and onto function from $\{1,\ldots,n\}$ to $\{1,\ldots,n\}$. The product of two permutations $\pi\sigma$ is defined to be their function composition $\pi\circ \sigma$. The set of all permutations of order $n$ is denoted $\S_n$.
Definition C.3.3. A pair $a,b\in\{1,\ldots,n\}$ is an inversion of a permutation $\sigma\in\S_n$ if $a < b$ but $\sigma(b) < \sigma(a)$. In other words, the function $\sigma$ reverses the natural order of $a,b$. The number of inversions of a permutation is called its parity. Permutations with an even parity are called even and permutations with an odd parity are called odd. The sign of a permutation $\sigma$, denoted $s(\sigma)$, equals $1$ if $\sigma$ is even and $-1$ if $\sigma$ is odd.
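As an illustration of these definitions, the following sketch counts the inversions of a permutation and computes its sign. The helper names are ours, not from the text, and permutations are represented as tuples giving $\sigma(1),\ldots,\sigma(n)$ with 0-based indexing.

```python
from itertools import combinations

def inversions(sigma):
    """Count pairs (a, b) with a < b whose order sigma reverses."""
    return sum(1 for a, b in combinations(range(len(sigma)), 2)
               if sigma[a] > sigma[b])

def sign(sigma):
    """+1 for an even permutation, -1 for an odd one."""
    return 1 if inversions(sigma) % 2 == 0 else -1

# The identity on three elements has zero inversions (even);
# the full reversal has three inversions (odd).
print(inversions((0, 1, 2)), sign((0, 1, 2)))  # 0 1
print(inversions((2, 1, 0)), sign((2, 1, 0)))  # 3 -1
```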
Definition C.3.4. The determinant of a square matrix $A\in \mathbb{R}^{n\times n}$ is defined as $\det A=\sum_{\sigma\in\S_n} \,s(\sigma) \,A_{1,\sigma(1)}A_{2,\sigma(2)}\cdots A_{n,\sigma(n)} .$

The definition above states that the determinant is a sum of $n!$ terms, each a product of $n$ matrix elements with exactly one element taken from each row and each column. The sum adds or subtracts these products depending on the parity of the corresponding permutation.

Example C.3.1. The determinant of a $2\times 2$ matrix $A$ is $\det A=A_{11}A_{22}-A_{12}A_{21}.$ There are precisely two permutations in $\S_2$: the identity $\sigma_1$ ($\sigma_1(i)=i$) and the non-identity $\sigma_2$ ($\sigma_2(1)=2$ and $\sigma_2(2)=1$). The identity permutation has zero inversions and is therefore even. The permutation $\sigma_2$ has one inversion (the pair $(1,2)$) and is therefore odd.
Example C.3.2. The determinant of a $3\times 3$ matrix $A$ is \begin{multline*} \det A = A_{11}A_{22}A_{33}-A_{11}A_{23}A_{32}-A_{12}A_{21}A_{33}\\ +A_{12}A_{23}A_{31} +A_{13}A_{21}A_{32} - A_{13}A_{22}A_{31}. \end{multline*} The first term corresponds to the identity permutation, which is even. The next two terms correspond to permutations with a single inversion ((2,3) in the first case and (1,2) in the second case), making them odd. The next two terms have two inversions, making them even ((2,3) and (1,3) in the first case and (1,2) and (1,3) in the second case). The last term has three inversions ((1,2), (1,3), and (2,3)), making it odd.
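Definition C.3.4 translates directly into code. The sketch below (the function name is our own) sums over all permutations, computing each sign by counting inversions, and reproduces the $2\times 2$ formula $A_{11}A_{22}-A_{12}A_{21}$ as well as the $3\times 3$ expansion above.

```python
from itertools import combinations, permutations
from math import prod

def det(A):
    """Determinant via the permutation-sum formula of Definition C.3.4."""
    n = len(A)
    total = 0
    for sigma in permutations(range(n)):
        # Sign of sigma: -1 raised to the number of inversions.
        inv = sum(1 for a, b in combinations(range(n), 2)
                  if sigma[a] > sigma[b])
        s = -1 if inv % 2 else 1
        # One element from each row, columns given by sigma.
        total += s * prod(A[i][sigma[i]] for i in range(n))
    return total

print(det([[1, 2], [3, 4]]))              # 1*4 - 2*3 = -2
print(det([[2, 0, 1], [1, 3, 0], [0, 1, 4]]))  # 25
```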
Proposition C.3.1. The determinant of a triangular matrix is the product of its diagonal elements.
Proof. All products in the definition of the determinant vanish except for the single product containing all the diagonal elements, since every non-identity permutation selects at least one element from the zero triangle of the matrix.

Note that the above proposition applies in particular to diagonal matrices.

Proposition C.3.2. The following properties hold:
1. $\det I=1$.
2. $\det A$ is a linear function of the $j$-th column vector $\bb v=(A_{1j},\ldots,A_{nj})$, assuming the other columns are held fixed.
3. If $A'$ is obtained from $A$ by interchanging two columns then $\det A=-\det A'$.
4. If $A$ has two equal columns then $\det A=0$.
5. $\det (A)=\det( A^{\top})$.
Proof. The first property follows from Proposition C.3.1. To prove the second property, note that multiplying the $j$-th column by a scalar $c$ multiplies every product in the sum by $c$, and therefore multiplies the determinant itself by $c$. Adding a vector ${\bb w}$ to the $j$-th column $\bb v$ of $A$ results in a new matrix whose determinant is a sum of products, each containing a factor $(v_i+w_i)$; expanding these factors splits the sum into two determinants -- one with ${\bb v}$ as the $j$-th column and one with ${\bb w}$ as the $j$-th column. To prove the third property, note that each product term in the sum defining the determinant of $A'$ corresponds to a product term in the sum defining the determinant of $A$, but with the sign inverted (interchanging two columns composes each permutation with a transposition, which changes its parity). The fourth property is a corollary of the third: interchanging the two equal columns leaves the matrix unchanged yet flips the sign of the determinant, so $\det A=-\det A$ and thus $\det A=0$. For the fifth property, each product term in the sum defining $\det(A^{\top})$ equals the term of $\det(A)$ corresponding to the inverse permutation, which has the same sign; therefore $\det (A)=\det(A^{\top})$.
Proposition C.3.3. For any two square matrices $A,B$ we have \begin{align*} \det (BA)&=\det B\det A \\ \det (A^{-1})&=(\det A)^{-1}.\end{align*}
Proof. The proof of $\det (BA)=\det B\det A$ follows from the previous proposition applied to $f(A)=\det(BA)$. For a detailed proof of this result, see most linear algebra textbooks or (Rudin, 1976, Chapter 9). The statement $\det (A^{-1})=(\det A)^{-1}$ follows from $\det (BA)=\det B\det A$ with $B=A^{-1}$ and the fact that $\det I=1$.
Proposition C.3.4. A square matrix is singular if and only if its determinant is zero.
Proof. If $A$ is non-singular, then $1=\det I= \det (A A^{-1})=\det A \det A^{-1}$, so $\det A$ is non-zero. If $A$ is singular then its columns are linearly dependent, and some column, say ${\bb v}^{(j)}$, is a linear combination of the other columns: ${\bb v}^{(j)}=\sum_{k\neq j} a_k {\bb v}^{(k)}$. By the second and fourth properties of Proposition C.3.2, adding a multiple of one column to another column does not change the determinant: expanding by linearity yields the original determinant plus a multiple of the determinant of a matrix with two equal columns, which is zero. Replacing ${\bb v}^{(j)}$ by ${\bb v}^{(j)}-\sum_{k\neq j} a_k {\bb v}^{(k)}$ therefore leaves the determinant unchanged, but produces a matrix with a zero column, whose determinant is zero by linearity. Hence $\det A=0$.
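A quick numeric illustration of Proposition C.3.4 (the matrix is our own example): the second column below is twice the first, so the columns are linearly dependent and the determinant vanishes.

```python
# Columns are linearly dependent: the second is twice the first.
A = [[1, 2],
     [2, 4]]
# 2x2 determinant formula from Example C.3.1.
det_A = A[0][0] * A[1][1] - A[0][1] * A[1][0]
print(det_A)  # 0
```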
Definition C.3.5. The characteristic polynomial of a square matrix $A$ is the function $f(\lambda) = \det(\lambda I-A), \qquad \lambda\in\R.$

Substituting the definition of the determinant in the equation above, we see that $f(\lambda)$ is indeed a polynomial function in $\lambda$.

Recall from the observations following Definition C.3.1 that for $(\lambda,\bb v)$ to be an eigenvalue-eigenvector pair of $A$, the matrix $A-\lambda I$ must be singular. Combining this with the proposition above, we get that the eigenvalues are the roots of the characteristic polynomial: $f(\lambda)=\det(\lambda I-A)=0.$ This observation leads to a simple procedure for finding the eigenvalues of a given square matrix $A$: find the roots of $f(\lambda)$, either analytically or numerically. Once the eigenvalues $\lambda_1,\ldots,\lambda_k$ are known, we can obtain the eigenvectors by solving the linear equations $(A-\lambda_i I){\bb v}^{(i)}=\bb 0,\qquad i=1,\ldots,k.$
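The procedure can be carried out by hand for a $2\times 2$ matrix, where the characteristic polynomial is a quadratic. The sketch below (a worked example of ours, assuming a real discriminant, which holds here since the matrix is symmetric) finds the eigenvalues as the roots of $\lambda^2-(a+d)\lambda+(ad-bc)$ and then reads off an eigenvector from the first row of $A-\lambda I$.

```python
from math import sqrt

# For A = [[a, b], [c, d]] the characteristic polynomial is
# lambda^2 - (a + d) lambda + (ad - bc).
a, b, c, d = 2.0, 1.0, 1.0, 2.0
tr, det = a + d, a * d - b * c
disc = sqrt(tr * tr - 4 * det)   # real since A is symmetric here
lam1, lam2 = (tr + disc) / 2, (tr - disc) / 2
print(lam1, lam2)                # 3.0 1.0

# Eigenvector for lam1: the first row of (A - lam1 I) v = 0 reads
# (a - lam1) v1 + b v2 = 0, so v = (b, lam1 - a) works when b != 0.
v = (b, lam1 - a)
# Check A v = lam1 v, first component.
print(a * v[0] + b * v[1], lam1 * v[0])  # 3.0 3.0
```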

Since the eigenvalues $\lambda_1,\ldots,\lambda_n$ are the roots of the characteristic polynomial $f(\lambda)$ (listed with their multiplicities), we can write it as the following product \begin{align} f(\lambda) = \det(\lambda I-A) = \prod_i (\lambda-\lambda_i). \end{align} This factorization applies to any monic polynomial $f(x)=\prod_i(x-x_i)$, where $x_i$ are its roots. For example, $f(x)=x^2-3x+2=(x-1)(x-2)$.

Example C.3.3. The characteristic polynomial corresponding to the matrix $\begin{pmatrix} 1&1\\0&1\end{pmatrix}$ is $0=\det \begin{pmatrix} \lambda-1&-1\\0&\lambda-1\end{pmatrix} = (\lambda-1)^2,$ whose solution over $\lambda\in\R$ is $\lambda=1$. There is thus a single eigenvalue $\lambda=1$. To find the eigenvector or eigenvectors we solve the linear system of equations \begin{align*} \bb 0=\begin{pmatrix} 1-\lambda& 1\\ 0 & 1-\lambda \end{pmatrix} \bb v = \begin{pmatrix} 0 & 1\\ 0 & 0 \end{pmatrix}\bb v \end{align*} whose solutions are $\bb v=(a,0)$ for any $a$; for example, $\bb v=(1,0)$.
Definition C.3.6. The trace of a square matrix $\trace(A)$ is the sum of its diagonal elements.
Proposition C.3.5. The following properties hold: \begin{align*} \trace(A+B)&=\trace(A)+\trace(B)\\ \trace(AB) &=\trace(BA). \end{align*}
Proof. The first property is trivial. The second follows from \begin{align*} \trace(AB) &= \sum_k \sum_j A_{kj}B_{jk} = \sum_k \sum_j B_{kj}A_{jk} = \trace(BA). \end{align*}
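The identity $\trace(AB)=\trace(BA)$ holds even when $A$ and $B$ are rectangular (of compatible shapes), in which case $AB$ and $BA$ have different sizes. The sketch below (helper names are ours) checks this numerically with a $2\times 3$ and a $3\times 2$ matrix.

```python
def matmul(A, B):
    """Plain triple-loop matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def trace(A):
    """Sum of the diagonal elements (Definition C.3.6)."""
    return sum(A[i][i] for i in range(len(A)))

# A is 2x3 and B is 3x2, so AB is 2x2 while BA is 3x3,
# yet the two traces agree.
A = [[1, 2, 3], [4, 5, 6]]
B = [[7, 8], [9, 10], [11, 12]]
print(trace(matmul(A, B)), trace(matmul(B, A)))  # 212 212
```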
Definition C.3.7. The Frobenius norm of a matrix $A$ is the Euclidean norm of the vector obtained by concatenating the matrix columns into one long vector $\|A\|_F \defeq \sqrt{\sum_i\sum_j A_{ij}^2}.$

Since the diagonal elements of $AA^{\top}$ are the sum of squares of the rows of $A$, and the diagonal elements of $A^{\top}A$ are the sum of squares of the columns of $A$, we have $\|A\|_F = \sqrt{\trace(A^{\top}A)}=\sqrt{\trace(AA^{\top})}.$
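The identity $\|A\|_F=\sqrt{\trace(A^{\top}A)}$ can be checked numerically; in the sketch below (our own example matrix), the $(j,j)$ entry of $A^{\top}A$ is computed directly as the squared norm of column $j$.

```python
from math import sqrt, isclose

A = [[1, 2], [3, 4], [5, 6]]  # a 3x2 example

# Frobenius norm straight from Definition C.3.7.
fro = sqrt(sum(x * x for row in A for x in row))

# trace(A^T A): the (j, j) entry of A^T A is the squared
# Euclidean norm of column j of A.
tr = sum(sum(A[i][j] ** 2 for i in range(len(A)))
         for j in range(len(A[0])))
print(isclose(fro, sqrt(tr)))  # True
```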

Proposition C.3.6. For any matrix $A$ and any orthogonal matrix $U$, $\|UA\|_F=\|A\|_F.$
Proof. $\|UA\|_F = \sqrt{\trace((UA)^{\top}UA)}= \sqrt{\trace(A^{\top}U^{\top}UA)}= \sqrt{\trace(A^{\top}A)}=\|A\|_F.$
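Proposition C.3.6 can be illustrated with a rotation matrix, the simplest orthogonal matrix. In the sketch below (angle and test matrix are our own choices) multiplying by a $2\times 2$ rotation leaves the Frobenius norm unchanged.

```python
from math import cos, sin, sqrt, isclose

theta = 0.7                      # arbitrary rotation angle
U = [[cos(theta), -sin(theta)],
     [sin(theta),  cos(theta)]]  # orthogonal: a 2x2 rotation
A = [[1.0, 2.0], [3.0, 4.0]]

# Compute the product UA.
UA = [[sum(U[i][k] * A[k][j] for k in range(2)) for j in range(2)]
      for i in range(2)]

fro = lambda M: sqrt(sum(x * x for row in M for x in row))
print(isclose(fro(UA), fro(A)))  # True
```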
Proposition C.3.7. If $A$ is a square matrix with eigenvalues $\lambda_i$, $i=1,\ldots,n$ \begin{align*} \trace A&=\sum_{i=1}^n \lambda_i\\ \det A&=\prod_{i=1}^n \lambda_i. \end{align*}
Proof. Expressing the characteristic polynomial $f(\lambda)$ as a product $\prod_i(\lambda-\lambda_i)$, we get \begin{align*} 0 = f(\lambda) = \prod_{i=1}^n (\lambda-\lambda_i)=\lambda^n - \lambda^{n-1} \sum_{i=1}^n \lambda_i + \cdots + (-1)^n \prod_{i=1}^n \lambda_i. \end{align*} Recall from the second statement of Proposition C.3.2 that multiplying a column by -1 inverts the sign of the determinant. It follows that $\det A=(-1)^n\det(-A)$, and since $f(0)=\det(-A)=(-1)^n \prod \lambda_i$, we have $\det A=\prod_{i=1}^n \lambda_i$.

Substituting the definition of the determinant in $\det(\lambda I-A)$, we see that the only terms of power $\lambda^{n-1}$ result from the product of the diagonal terms $\prod_{i=1}^{n} (\lambda-A_{ii})$. More specifically, there are $n$ terms containing a power $\lambda^{n-1}$ in the determinant expansion: $-A_{11}\lambda^{n-1},\ldots,-A_{nn}\lambda^{n-1}$. Collecting these terms, we get that the coefficient associated with $\lambda^{n-1}$ in the characteristic polynomial is $-\trace(A)=-\sum_{i=1}^{n} A_{ii}$. Comparing this to the coefficient of $\lambda^{n-1}$ in the equation above, we get $\trace (A)=\sum_{i=1}^n\lambda_i$.
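Proposition C.3.7 is easy to check on a small example. For the symmetric matrix $\begin{pmatrix}2&1\\1&2\end{pmatrix}$ (our own choice), the characteristic polynomial is $(\lambda-2)^2-1=(\lambda-1)(\lambda-3)$, so the eigenvalues are $1$ and $3$:

```python
# A = [[2, 1], [1, 2]] has characteristic polynomial
# (lambda - 2)^2 - 1 = (lambda - 1)(lambda - 3),
# hence eigenvalues 1 and 3.
A = [[2, 1], [1, 2]]
eigs = [1, 3]

trace_A = A[0][0] + A[1][1]
det_A = A[0][0] * A[1][1] - A[0][1] * A[1][0]
print(trace_A, sum(eigs))        # 4 4
print(det_A, eigs[0] * eigs[1])  # 3 3
```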

The proposition below is one of the central results in linear algebra. A proof is available in most linear algebra textbooks.

Proposition C.3.8. (Spectral Decomposition) For a symmetric matrix $A$ we have \begin{align*} U^{\top} A U = \diag(\bb \lambda)\quad \text{or} \qquad A=U \diag(\bb\lambda) U^{\top}, \end{align*} where $\bb \lambda$ is the vector of eigenvalues and $U$ is an orthogonal matrix whose columns are the corresponding eigenvectors.
Proposition C.3.9. A square orthogonal matrix is non-singular and has determinant +1 or -1.
Proof. By the fifth property of Proposition C.3.2, $\det (A)=\det( A^{\top})$. It then follows that $1=\det(I)=\det(AA^{\top})=\det A\det A^{\top}=(\det A)^2,$ so $\det A=\pm 1$ and, by Proposition C.3.4, $A$ is non-singular.
Proposition C.3.10. A square matrix is orthogonal if and only if $A^{-1}=A^{\top}$. Moreover, $A$ is orthogonal if and only if $A^{\top}$ is orthogonal.
Proof. Since an orthogonal matrix is non-singular it has an inverse, and since the inverse is unique, it must coincide with $A^{\top}$: $I=A^{-1}A=A^{\top}A$. Conversely, if $A^{-1}=A^{\top}$ then $A^{\top}A=A^{-1}A=I$ and $A$ is orthogonal. The second statement follows since for a square matrix $A^{\top}A=I$ if and only if $AA^{\top}=I$.

Recall that every linear transformation $T$ is realized by a matrix multiplication operation: $T(\bb x)=A\bb x$. If $A$ is orthogonal, the mapping $\bb x\mapsto A\bb x$ may be interpreted as a geometric rotation or reflection about the origin, and the mapping $\bb x\mapsto A^{\top}\bb x= A^{-1}\bb x$ is the inverse rotation or reflection. If $A$ is diagonal, the mapping $\bb x\mapsto A\bb x$ may be interpreted as stretching some dimensions and compressing others. Applying the spectral decomposition to a symmetric $A$, we get a decomposition of $A$ as a product of three matrices $U \diag(\bb\lambda) U^{\top}$. This implies that the linear transformation $T(\bb x)=A\bb x$ can be viewed as a sequence of three linear transformations: the first being a rotation or reflection, the second being a scaling of the dimensions, and the third being another rotation or reflection.
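The rotate-scale-rotate picture can be run in the other direction: starting from a rotation $U$ and a vector of eigenvalues $\bb\lambda$ (both chosen by us below), the product $U\diag(\bb\lambda)U^{\top}$ yields a symmetric matrix whose eigenvectors are exactly the columns of $U$.

```python
from math import cos, sin, isclose

# Build a symmetric matrix from the decomposition A = U diag(lam) U^T,
# with U a 2x2 rotation (orthogonal) and lam chosen arbitrarily.
theta = 0.5
U = [[cos(theta), -sin(theta)],
     [sin(theta),  cos(theta)]]
lam = [3.0, 1.0]

# A_ij = sum_k U_ik * lam_k * U_jk.
A = [[sum(U[i][k] * lam[k] * U[j][k] for k in range(2)) for j in range(2)]
     for i in range(2)]

# A is symmetric, and the first column of U is an eigenvector
# with eigenvalue lam[0].
print(isclose(A[0][1], A[1][0]))  # True
col0 = [U[0][0], U[1][0]]
Acol0 = [A[0][0] * col0[0] + A[0][1] * col0[1],
         A[1][0] * col0[0] + A[1][1] * col0[1]]
print(all(isclose(Acol0[i], lam[0] * col0[i]) for i in range(2)))  # True
```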

Proposition C.3.11. For a symmetric $A$, $\rank(A)$ equals the number of non-zero eigenvalues.
Proof. By the spectral decomposition we have $\rank(A)=\rank(U\diag(\bb\lambda) U^{\top})=\rank(\diag(\bb\lambda)),$ since multiplying by the invertible matrices $U$ and $U^{\top}$ does not change the rank.
Proposition C.3.12. A symmetric $A\in\mathbb{R}^{n\times n}$ has $n$ orthonormal eigenvectors and $\col(A)$ is spanned by the eigenvectors corresponding to non-zero eigenvalues.
Proof. The spectral decomposition $AU=U\diag(\bb\lambda)$ implies that the columns of $U$ are orthonormal eigenvectors. If $\bb v\in\col(A)$ then $\bb v=A\bb x$ for some $\bb x$. Expressing $\bb x$ as a linear combination $\bb x=\sum_i a_i{\bb\alpha}^{(i)}$ of the columns ${\bb \alpha}^{(i)}$, $i=1,\ldots,n$, of $U$, we have $\bb v=A\bb x=\sum_i a_i A{\bb\alpha}^{(i)}=\sum_i a_i\lambda_i {\bb\alpha}^{(i)}$, where the final sum is effectively over the indices corresponding to non-zero eigenvalues.