## Probability

### The Analysis of Data, volume 1


## C.5. Singular Value Decomposition

The singular value decomposition (SVD) generalizes the spectral decomposition to non-symmetric, and even non-square, matrices.

Proposition C.5.1 (Singular Value Decomposition). For any matrix $A\in\R^{n\times m}$, we have $A=U\Sigma V^{\top}, \qquad \text{where}$
• $U\in\R^{n\times n}$ is an orthogonal matrix whose columns are the eigenvectors of $AA^{\top}$
• $V\in \R^{m\times m}$ is an orthogonal matrix whose columns are the eigenvectors of $A^{\top}A$
• $\Sigma\in \R^{n\times m}$ is an all zero matrix except for the first $r$ diagonal elements $\sigma_i=\Sigma_{ii}$, $i=1,\ldots,r$ (called singular values) that are the square roots of the eigenvalues of $A^{\top}A$ and of $AA^{\top}$ (these two matrices have the same non-zero eigenvalues).
We assume above that the singular values are sorted in descending order and the eigenvectors are sorted according to descending order of their eigenvalues.
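The claims of the proposition can be checked numerically. The sketch below uses NumPy and an arbitrary random matrix (not from the text) to verify that the singular values are the square roots of the eigenvalues of $A^{\top}A$ and that $U, V$ are orthogonal.

```python
# Numerical illustration of Proposition C.5.1 (arbitrary example matrix).
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))          # n = 5, m = 3

U, s, Vt = np.linalg.svd(A)              # A = U @ Sigma @ Vt
Sigma = np.zeros((5, 3))
Sigma[:3, :3] = np.diag(s)

# A = U Sigma V^T
assert np.allclose(A, U @ Sigma @ Vt)

# singular values = square roots of the eigenvalues of A^T A (descending)
eigvals = np.linalg.eigvalsh(A.T @ A)[::-1]
assert np.allclose(s, np.sqrt(eigvals))

# U and V are orthogonal matrices
assert np.allclose(U.T @ U, np.eye(5))
assert np.allclose(Vt @ Vt.T, np.eye(3))
```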
Proof. We assume in the proof below (without loss of generality) that $n\geq m$. The case $m > n$ can then be established by transposing the SVD of $A^{\top}$:
$$A=(A^{\top})^{\top}= (U'\Sigma'V'^{\top})^{\top}=V'(U'\Sigma')^{\top}=V'\Sigma'^{\top}U'^{\top}.$$
Note that since $n\geq m$,
$$\Sigma=\begin{pmatrix}\diag(\sigma_1,\ldots,\sigma_m)\\ \bb 0\end{pmatrix}, \qquad \sigma_i=0 \text{ for } r < i\leq m.$$
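The transposition argument can be traced numerically: computing an SVD of $A^{\top}$ (a tall matrix when $m>n$) and transposing it recovers $A$. A sketch with an arbitrary wide example matrix, not from the text:

```python
# Sketch of the transposition argument: an SVD of A^T yields one for A.
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((2, 4))          # n = 2 < m = 4

# SVD of A^T (a tall 4x2 matrix, so the n >= m case applies to it)
Up, sp, Vpt = np.linalg.svd(A.T)         # A^T = U' Sigma' V'^T
Sigmap = np.zeros((4, 2))
Sigmap[:2, :2] = np.diag(sp)
Vp = Vpt.T

# A = V' Sigma'^T U'^T, which is an SVD of A
assert np.allclose(A, Vp @ Sigmap.T @ Up.T)
```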

Assuming $\rank(A)=r$ we have that $\rank(A^{\top}A)=r$ as well, and the spectral decomposition of $A^{\top}A$ implies
$$A^{\top}A V=V\diag(\sigma_1^2,\ldots,\sigma_r^2,0,\ldots,0),$$
where $\sigma_i^2$ are the eigenvalues of $A^{\top}A$ and the columns of $V$, denoted ${\bb v}^{(i)}$, are the corresponding orthonormal eigenvectors. Defining ${\bb u}^{(i)}=A{\bb v}^{(i)}/\sigma_i$ for $i=1,\ldots,r$, we note that
\begin{align*} A^{\top} {\bb u}^{(i)}&=A^{\top}A{\bb v}^{(i)}/\sigma_i=\sigma_i {\bb v}^{(i)},\\ AA^{\top}{\bb u}^{(i)}&=\sigma_i A {\bb v}^{(i)}=\sigma_i^2{\bb u}^{(i)}, \end{align*}
implying that the ${\bb u}^{(i)}$ are eigenvectors of $AA^{\top}$ corresponding to the eigenvalues $\sigma_i^2$. Since the eigenvectors ${\bb v}^{(i)}$ are orthonormal, so are the vectors ${\bb u}^{(i)}$:
$$({\bb u}^{(i)})^{\top}{\bb u}^{(j)} =\frac{({\bb v}^{(i)})^{\top} A^{\top} A {\bb v}^{(j)}}{\sigma_i\sigma_j} = \frac{\sigma_j}{\sigma_i}\,({\bb v}^{(i)})^{\top} {\bb v}^{(j)} = \begin{cases} 1 & i=j\\0&i\neq j\end{cases}.$$
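The construction in this step can be replayed numerically: diagonalizing $A^{\top}A$ gives $V$ and the $\sigma_i$, and the vectors ${\bb u}^{(i)}=A{\bb v}^{(i)}/\sigma_i$ come out orthonormal eigenvectors of $AA^{\top}$. A sketch with an arbitrary full-rank example (not from the text):

```python
# Proof construction: eigenvectors of A^T A give V; u^(i) = A v^(i)/sigma_i
# give orthonormal eigenvectors of A A^T.
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 3))          # rank r = 3 (almost surely)

w, V = np.linalg.eigh(A.T @ A)           # ascending eigenvalues
w, V = w[::-1], V[:, ::-1]               # sort in descending order
sigma = np.sqrt(w)

U_r = A @ V / sigma                      # columns u^(i) = A v^(i) / sigma_i

# the u^(i) are orthonormal ...
assert np.allclose(U_r.T @ U_r, np.eye(3))
# ... and are eigenvectors of A A^T with eigenvalues sigma_i^2
assert np.allclose((A @ A.T) @ U_r, U_r * sigma**2)
```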

So far we have a matrix $V$ whose columns are eigenvectors of $A^{\top}A$ with eigenvalues $\sigma_i^2$, and $r$ orthonormal vectors ${\bb u}^{(i)}$ that are eigenvectors of $AA^{\top}$ corresponding to the eigenvalues $\sigma_i^2$. We augment the eigenvectors ${\bb u}^{(i)}$, $i=1,\ldots,r$, with orthonormal vectors ${\bb u}^{(i)}$, $i=r+1,\ldots,n$, that span $\nulll(AA^{\top})$; together the ${\bb u}^{(i)}$, $i=1,\ldots,n$, form a full orthonormal set of eigenvectors of $AA^{\top}$ with eigenvalues $\sigma_i^2$ (where $\sigma_i=0$ for $i>r$), and these vectors form the columns of $U$.

Since
$$[U^{\top} A V]_{ij}=({\bb u}^{(i)})^{\top} A {\bb v}^{(j)} = \begin{cases}\sigma_j \, ({\bb u}^{(i)})^{\top} {\bb u}^{(j)} & j\leq r\\ 0 & j>r\end{cases}$$
(for $j>r$ we have $A{\bb v}^{(j)}=\bb 0$, since $\|A{\bb v}^{(j)}\|^2=({\bb v}^{(j)})^{\top}A^{\top}A{\bb v}^{(j)}=0$), we get $U^{\top}AV = \Sigma$ and consequently the desired decomposition $A= U \Sigma V^{\top}$.
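The final identity $U^{\top}AV=\Sigma$ is easy to confirm numerically; a sketch with an arbitrary example matrix (not from the text):

```python
# Checking the final step of the proof: U^T A V = Sigma.
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 3))

U, s, Vt = np.linalg.svd(A)
Sigma = np.zeros((4, 3))
Sigma[:3, :3] = np.diag(s)

# U^T A V equals the rectangular diagonal matrix Sigma
assert np.allclose(U.T @ A @ Vt.T, Sigma)
```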

When $A$ has full rank ($r=\min(m,n)$) the singular values $\bb\sigma=(\sigma_1,\ldots,\sigma_r)$ are all strictly positive, and $\Sigma=\begin{pmatrix} \diag (\bb \sigma )\\ \bb 0\end{pmatrix}$ if $n\geq m$ or $\Sigma=\begin{pmatrix} \diag (\bb \sigma) & \bb 0\end{pmatrix}$ if $n<m$.

Corollary C.5.1. The SVD of a rank $r$ matrix $A$ may be written as $A=\sum_{i=1}^r \sigma_i {\bb u}^{(i)} \otimes {\bb v}^{(i)}$ where ${\bb u}^{(i)}, {\bb v}^{(i)}$ are the columns of $U$ and $V$, respectively. We thus have a decomposition of $A$ as a sum of $r$ rank-one matrices.
Proof. The statement follows from writing $\Sigma$ in the SVD as the sum of the matrices $\Sigma_i = \diag(0, \ldots, 0, \sigma_i, 0, \ldots, 0)$, $i=1,\ldots,r$, and noting that $U\Sigma_i V^{\top}=\sigma_i\, {\bb u}^{(i)} \otimes {\bb v}^{(i)}$.
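The rank-one decomposition can be computed directly; a sketch with an arbitrary example matrix (not from the text):

```python
# Corollary C.5.1 as a computation: A equals the sum of r rank-one terms
# sigma_i * u^(i) (v^(i))^T.
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((4, 3))
U, s, Vt = np.linalg.svd(A)

# outer products of corresponding columns of U and V, weighted by sigma_i
A_sum = sum(s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(len(s)))
assert np.allclose(A, A_sum)
```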

The following corollary shows that the SVD is almost identical to the spectral decomposition for symmetric matrices.

Corollary C.5.2. If $A$ is a symmetric matrix, the singular values are the absolute values of the eigenvalues of $A$: $\sigma_i=|\lambda_i|$, and the columns of $U$ and $V$ are eigenvectors of $A$ (identical up to the sign of the columns corresponding to negative eigenvalues). If in addition $A$ is a symmetric positive definite matrix then $U, V, \Sigma$ are square non-singular matrices.
Proof. If $A$ is symmetric then $AA^{\top}=A^{\top}A=A^2$ and $U,V,\Sigma$ are square matrices. The eigenvectors of $A$ are also eigenvectors of $A^2$, with the corresponding eigenvalues squared, and so the singular values are the absolute values of the eigenvalues of $A$. If $A$ is positive definite then it has full rank and a complete set of non-zero eigenvalues.
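A numerical sketch of the corollary, using an arbitrary symmetric (generally indefinite) example matrix, not from the text:

```python
# Corollary C.5.2: for symmetric A, the singular values are the absolute
# values of the eigenvalues.
import numpy as np

rng = np.random.default_rng(5)
B = rng.standard_normal((4, 4))
A = B + B.T                              # symmetric, generally indefinite

eigvals = np.linalg.eigvalsh(A)
s = np.linalg.svd(A, compute_uv=False)   # singular values, descending

assert np.allclose(s, np.sort(np.abs(eigvals))[::-1])
```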
Corollary C.5.3. Denoting the vector of singular values by $\bb \sigma$ we have $\|A\|_F=\|\bb\sigma\|.$
Proof. This is a direct consequence of the SVD and the fact that multiplying by an orthogonal matrix preserves the Frobenius norm: $\|A\|_F=\|U\Sigma V^{\top}\|_F=\|\Sigma\|_F=\|\bb\sigma\|$.
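Corollary C.5.3 is likewise one line to check numerically; a sketch with an arbitrary example matrix (not from the text):

```python
# Corollary C.5.3: the Frobenius norm of A equals the Euclidean norm of
# its vector of singular values.
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((5, 3))
s = np.linalg.svd(A, compute_uv=False)

assert np.allclose(np.linalg.norm(A, 'fro'), np.linalg.norm(s))
```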

Recall the intuitive description of the spectral decomposition earlier in this chapter: any linear transformation $T:\R^d\to\R^d$ corresponding to multiplication by a symmetric matrix is equivalent to a rotation or reflection, followed by axis scaling, followed by the inverse rotation or reflection. The SVD generalizes this interpretation to non-symmetric and even non-square matrices. Any linear transformation $T:\R^m\to\R^n$ may be decomposed as a sequence of a rotation or reflection, followed by axis scaling, followed by another rotation or reflection. Geometrically, this implies that the mapping $\bb x\mapsto A\bb x$ transforms a sphere into an ellipse. The orientations of the ellipse axes are provided by the columns of $U$ and $V$, and the elongations of the different ellipse axes are determined by the singular values.
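This geometric picture can be illustrated numerically: the first right singular vector ${\bb v}^{(1)}$ is mapped to $\sigma_1 {\bb u}^{(1)}$, the longest semi-axis of the image ellipse, and no unit vector is stretched by more than $\sigma_1$. A sketch with an arbitrary $2\times 2$ example matrix (not from the text):

```python
# Geometric reading of the SVD: A maps the unit circle to an ellipse with
# semi-axes sigma_i u^(i).
import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((2, 2))
U, s, Vt = np.linalg.svd(A)

v1 = Vt[0]                               # first right singular vector
assert np.allclose(A @ v1, s[0] * U[:, 0])

# sample unit vectors on the circle; stretching never exceeds sigma_1
theta = np.linspace(0, 2 * np.pi, 100)
circle = np.stack([np.cos(theta), np.sin(theta)])
assert np.all(np.linalg.norm(A @ circle, axis=0) <= s[0] + 1e-12)
```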