## Probability

### The Analysis of Data, volume 1

Metric Spaces: The Euclidean Space

## B.4. The Euclidean Space

As described in Chapter A, the Euclidean space $\R^d=R\times\cdots\times\R$ is the set of all ordered $d$-tuples or vectors over the real numbers $\R$ (when $d=1$ we refer to the vectors as scalars). We denote a vector in $\R^d$ when $d>1$ in bold and refer to its scalar components via subscripts. For example the vector $\bb x=(x_1,\ldots,x_d)$ has $d$ scalar components $x_i\in\R$, $i=1,\ldots,d$. Together with the Euclidean distance $d(\bb x,\bb y) = \sqrt{\sum_{i=1}^d (x_i-y_i)^2},$ the Euclidean space is a metric space $(\R,d)$ (we prove later later in this chapter that the Euclidean distance above is a valid distance function).

Definition B.4.1. Multiplication of a vector by a scalar $c\in\R$ and the addition of two vectors of the same dimensionality are defined as \begin{align*} c\bb x &\defeq (c x_1,\ldots,c x_d)\in\R^d\\ \bb x+\bb y &\defeq (x_1+y_1,\ldots,x_d+y_d)\in\R^d. \end{align*}
Example B.4.1. For all vectors $\bb x$, the vector $\bb 0=(0,\ldots,0)$ satisfies $\bb x+\bb 0=\bb x$ and the scalar 1 satisfies $1\bb x=\bb x$.
Definition B.4.2. A function $T:\R^m\to\R^n$ is a linear transformation if $T(\alpha \bb u+\beta \bb v) = \alpha T(\bb u) + \beta T (\bb v)$ for all $\bb u,\bb v\in\R^m$ and all $\alpha,\beta\in\R$.
Example B.4.2. The function $T:\R^d\to\R$, defined by $T(\bb x)=\sum_{i=1}^d c_i x_i$, is a linear transformation.
Definition B.4.3. An inner product is a function $g(\cdot,\cdot):\R^d\times\R^d\to\R$ that satisfies the following for all $\bb u,\bb v, \bb w\in\R^d$ and for all $\alpha,\beta\in\R$.
1. symmetry: $g( \bb v,\bb u) = g(\bb u,\bb v)$
2. bi-linearity: $g(\alpha \bb u+\beta \bb v,\bb w) =\alpha g(\bb u,\bb w) + \beta g( \bb v,\bb w)$
3. positivity: $g(\bb v,\bb v)\geq 0$.
Note that as a consequence of the first two properties above $g(\alpha \bb u+\beta \bb v,\gamma \bb w+\delta \bb z) = \alpha\gamma g(\bb u,\bb w) + \alpha\delta g(\bb u,\bb z)+\beta\gamma g(\bb v,\bb w) + \beta\delta g(\bb v,\bb z).$
Proposition B.4.1 (Cauchy Schwartz Inequality). For any inner product $g$ and for all $\bb x, \bb y\in\R^d$ $g^2(\bb x,\bb y) \leq g(\bb x,\bb x) \, g(\bb y,\bb y).$
Proof. Using the positivity and bi-linearity properties of the inner product, \begin{align*} 0 & \leq g( g( \bb y,\bb y) \bb x-g( \bb x,\bb y) \bb y,g( \bb y,\bb y) \bb x-g( \bb x,\bb y) \bb y)\\ &=g^2(\bb y,\bb y) g(\bb x,\bb x) - 2 g(\bb y,\bb y) g^2( \bb ,\bb y) + g^2( \bb x,\bb y) g(\bb y,\bb y)\\ &=g(\bb y,\bb y)(g(\bb x,\bb x) g(\bb y,\bb y)-g^2(\bb x,\bb y)). \end{align*} Since $g(\bb y,\bb y)\geq 0$, the proposition follows.
Proposition B.4.2 The Euclidean product $\langle \bb x,\bb y\rangle \defeq \sum_{i=1}^d x_iy_i$ is an inner product
Proof. It is easy to verify the three properties above by direct substitution.
Definition B.4.3. A norm is a function $h:\R^d\to \R$ satisfying the following for all $\bb x, \bb y\in\R^d$, and for all $c\in \R$.
1. non-negativity: $h(\bb x)\geq 0$
2. positivity: $h(\bb x)=0$ if and only if $\bb x=\bb 0$
3. homogeneity: $h(c \,\bb x)=|c|\, h(\bb x)$
4. triangle inequality: $h(\bb x+\bb y) \leq h(\bb x) + h(\bb y)$.
Proposition B.4.3 The function $\|\bb x\| = \sqrt{\langle \bb x,\bb x\rangle}=\left(\sum_{i=1}^d x_i^2\right)^{1/2}$ is a norm, commonly known as the Euclidean norm.
Proof. The first three properties are obvious. The fourth property follows from the Cauchy Schwartz inequality applied to the Euclidean inner product \begin{align*} \|\bb x+\bb y\|^2 &= \langle \bb x+\bb y,\bb x+\bb y\rangle \\ &= \|\bb x\|^2 + 2\langle \bb x,\bb y\rangle +\|\bb y\|^2\\ & \leq \|\bb x\|^2 +2 \|\bb x\| \|\bb y\| + \|\bb y\|^2\\ & = (\|\bb x\|+\|\bb y\|)^2. \end{align*}

The Euclidean norm is the most popular norm. The more general $L_p$ norm \begin{align*} \|\bb x\|_p=\left(\sum_{i=1}^d |x_i|^p\right)^{1/p}, \qquad p\geq 1, \end{align*} has the following special cases \begin{align*} \|\bb x\|_2 &=\|\bb x\| \qquad \text{ (the Euclidean norm)}\\ \|\bb x \|_1 &= \sum_{i=1}^d |x_i|\\ \|\bb x\|_{\infty} &= \max\{|x_1|,\ldots,|x_d|\} \qquad \text{ (achieved by letting } p\to\infty). \end{align*}

The $L_p$ norm can be further generalized as follows.

Definition B.4.5. Assuming $W$ is a non-singular matrix (matrices, and the matrix product notation $W\bb x$, are defined at the beginning of Chapter C), the weighted $L_{p,W}$ norm is $\|\bb x\|_{p,W}= \|W \bb x\|_p.$

Weighted norms are convenient for emphasizing some dimensions over others. For example, if $\diag(\bb w)$ is an all zero matrix, except for its diagonal $\diag(\bb w) = \begin{pmatrix} w_1 &0 &\cdots& 0\\ 0 & w_2&\cdots&0\\ \vdots & \vdots & \ddots &\vdots\\ 0 & \cdots & 0& w_d \end{pmatrix},$ the corresponding weighted $L_2$ norm is $\|\bb x\|_{2,\diag(\bb w)} = \sqrt{\sum_{i=1}^d w_i^2 x_i^2}.$

Definition B.4.6. The angle between $\bb x,\bb y\in\R^d$ is the value $\theta$ for which $\cos\theta=\frac{\langle \bb x,\bb y\rangle}{\|\bb x\| \,\cdot\, \|\bb y\|}.$
Definition B.4.7. Vectors ${\bb x}^{(i)}, i=1,\ldots,k$ are said to be orthogonal if $i\neq j\qquad \text{implies} \qquad \langle {\bb x}^{(i)},{\bb x}^{(j)}\rangle=0.$ If, in addition, $\|{\bb x}^{(i)}\|=1$, $i=1,\ldots,k$, the vectors are said to be orthonormal.
Note that the orthogonality condition above corresponds to a a 90 degree angle (perpendicularity) between every two distinct vectors.
Proposition B.4.4. For any norm $h(\cdot)$, the function $d(\bb x,\bb y)=h(\bb x-\bb y)$ is a valid distance function. Accordingly, we may interpret the norm $h(\bb x)$ as the corresponding distance of $\bb x$ from the origin $d(\bb x,\bb 0)$.
Proof. It is straightforward to verify that $d(\bb x,\bb y)=h(\bb x-\bb y)$ satisfies the first three properties of a distance function. The fourth follows from applying the triangle inequality property of norms to $\bb x-\bb y$ and $\bb y-\bb z$: $h(\bb x-\bb z) \leq h(\bb x-\bb y) + h(\bb y-\bb z).$

The propositions above confirm that the Euclidean space $(\R^d,d)$, where $d(\bb x,\bb y) = \sqrt{\sum_{i=1}^d (x_i-y_i)^2},$ is a metric space.

Figure B.4.1 below shows contour lines of four $L_p$ norms in the left column and four $L_{p,W}$ norms in the right column, where $W=\diag(2,1)$. A non-diagonal matrix $W$ would result in rotated versions of the figures in the right column.

The R code below generates the left and right columns of the figure.

s = seq(-1, 1, length.out = 50)
R = expand.grid(x1 = s, x2 = s)
# generate left column (Lp norms)
D = rbind(R, R, R, R)
D$Norm[1:2500] = abs(D$x1[1:2500]) + abs(D$x2[1:2500]) D$Norm[2501:5000] = ((abs(D$x1[2501:5000]))^1.5 + (abs(D$x2[2501:5000]))^1.5)^(2/3)
D$Norm[5001:7500] = ((abs(D$x1[5001:7500]))^2 + (abs(D$x2[5001:7500]))^2)^0.5 D$Norm[7501:10000] = pmax(abs(D$x1[7501:10000]), abs(D$x2[7501:10000]))
D$p = c(rep("Lp norm, p=1", 2500), rep("Lp norm, p=1.5", 2500), rep("Lp norm, p=2", 2500), rep("Lp norm, p = inf", 2500)) ggplot(D, aes(x1, x2, z = Norm)) + facet_grid(p ~ .) + stat_contour(bins = 4)  # generate right column (weighted Lp norms) D = rbind(R, R, R, R) D$Norm[1:2500] = abs(2 * D$x1[1:2500]) + abs(D$x2[1:2500])
D$Norm[2501:5000] = ((2 * abs(D$x1[2501:5000]))^1.5 +
(abs(D$x2[2501:5000]))^1.5)^(2/3) D$Norm[5001:7500] = ((2 * abs(D$x1[5001:7500]))^2 + (abs(D$x2[5001:7500]))^2)^(1/2)
D$Norm[7501:10000] = pmax(abs(2 * D$x1[7501:10000]),
abs(D$x2[7501:10000])) D$p = c(rep("weighted Lp norm, p=1", 2500), rep("weighted Lp norm, p=1.5",
2500), rep("weighted Lp norm, p=2", 2500), rep("weighted Lp norm, p=inf",
2500))
ggplot(D, aes(x1, x2, z = Norm)) + facet_grid(p ~ .) +
stat_contour(bins = 4)  Figure B.4.1: Equal height contours of the $L_p$ norm (left column) and weighted $L_{p,W}$ norm with $W=\diag(2,1)$ (right column), in the two dimensional case $d=2$. Each row corresponds to a different value of $p$. As $p$ increases from 1 to $\infty$ the contours change their shape from diamond-shape to square-shape.

Example B.4.3. Consider the set $\R^{\infty}$ defined in Example A.3.2. The proofs of the propositions above can be extended to show that $\langle \bb x,\bb y\rangle=\sum_{n\in\mathbb{N}} x_n y_n$ is an inner product on $\R^{\infty}$, $\|\bb x\|_2=\sqrt{\sum_{n\in\mathbb{N}} x_n^2}$ is a norm on $\R^{\infty}$ and $d(\bb x,\bb y)=\sqrt{\sum_{n\in\mathbb{N}} (x_n-y_n)^2}$ is a distance on $\R^{\infty}$, all provided that the infinite sums converge. In general, however, the infinite sums may not converge, making the generalizations above inappropriate.
Example B.4.4. Consider the set $\R^{\infty}$ defined in Example A.3.2 and the distance function \begin{align} \bar d(\bb x,\bb y) = \sup\left\{ \frac{\min(|x_n-y_n|,1)}{n}\,: \, n\in\mathbb{N}\right\}. \end{align} Note that $\bar d$ is well-defined and finite, is symmetric, is non-negative, and is zero if and only if $\bb x=\bb y$. It also satisfies the triangle inequality \begin{align*} \bar d(\bb x,\bb z) &= \sup\left\{ \frac{\min(|x_n-y_n+y_n-z_n|,1)}{n}\,:\, n\in\mathbb{N}\right\} \\ &\leq \sup\left\{ \frac{\min(|x_n-y_n| + |y_n-z_n|,1)}{n}\,:\, n\in\mathbb{N}\right\}\\ &\leq \sup\left\{ \frac{\min(|x_n-y_n|,1)}{n} + \frac{\min(|y_n-z_n|,1)}{n}\,:\,n\in\mathbb{N} \right\}\\ &\leq \bar d(\bb x,\bb y)+\bar d(\bb y,\bb z) \end{align*} and is therefore a distance function.

If $\bb x\in B_{\epsilon}(\bb 0)$ for a given $\epsilon$ then for all $n\in\mathbb{N}$ we have $\min(|x_n|,1) < n\epsilon$. There exists some $N$ such that $n>N$ corresponds to $n\epsilon > 1$, implying that the components $x_{N+1}, x_{N+2},\ldots$ of vectors $\bb x\in B_{\epsilon}(\bb 0)$ are unrestricted. Similarly, for $\bb x\in B_{\epsilon}(\bb 0)$ the components $x_n$, where $n < N$, satisfy $|x_n| < n\epsilon$. It follows that points in $B_{\epsilon}(\bb 0)$ are an intersection of a finite number of sets of the form \begin{align} \tag{*} \R\times \cdots \times \R\times (a,b) \times \R\times\cdots. \end{align} (we refer to sets of the form (*) as simple cylinders.) Since $\bar d(\bb x+\bb c,\bb z+\bb c)=\bar d(\bb x,\bb z)$, we also have that $B_{\epsilon}(\bb y)$ is an intersection of a finite number of simple cylinders or sets of the form expressed in (*).

On the other hand, let $A$ be an intersection of a finite number of simple cylinders. Then for each $\bb x\in A$ we can construct $B_{\epsilon}(\bb y)$ such that $\bb x\in B_{\epsilon}(\bb y)\subset A$ (taking $\epsilon$ to be sufficiently small). This implies that $A$ is a union of open balls and is therefore an open set. A union of intersections of a finite number of simple cylinders is a union of open sets and therefore is open also.

In summary, we have thus demonstrated that in the space $(\R^{\infty},\bar d)$, the set of open sets is equivalent to the set of unions of intersections of a finite number of simple cylinders.

The convergence problems mentioned in Example~B.4.3 leads to the common practice of defining the metric structure on $\R^{\infty}$ using the distance function $\bar d$ in Example B.4.4 rather than the Euclidean distance. In fact, whenever we refer to the metric structure of $\R^{\infty}$ we will assume the metric structure of $\bar d$ derived above. This metric structure is commonly referred to in the literature as the product topology of $\R^{\infty}$.

The metric structure of the Euclidean space simplifies some of the properties described in the previous chapter.

Proposition B.4.5. The Euclidean space $\R^d$ is second countable, and in particular one choice for $\mathcal{G}$ in Definition B.1.5 is the set of all open balls with rational centers and radii. Similarly, the space $(\R^{\infty},\bar d)$ (see Example B.4.4) is second countable.
Proof. We define $\mathcal{G}$ to be the set of all open balls with rational centers and rational radii. Since $\mathbb{Q}$ and $\mathbb{Q}^d$ are countably infinite sets, the set $\mathcal{G}$ is countably infinite.

Let $G$ be an open set and ${\bb x}\in G$. By Proposition B.1.2 there exists $r>0$ for which $B_r({\bb x})\subset G$. Since for every real number there is a rational number that is arbitrarily close, we can select $B_{r'}({\bb x}')$ where $r',{\bb x}'$ are rationals such that ${\bb x}\in B_{r'}({\bb x}') \subset B_r({\bb x})\subset G$. Repeating this for every ${\bb x}\in G$ and taking the union of the resulting rational balls completes the proof.

In the case of $R^{\infty}$, second countability is demonstrated by taking all simple cylinders (see Example B.4.4) whose base $(a,b)$ has rational endpoints and noting that a countable union of sets that are countably infinite is countably infinite.

Proposition B.4.6. In the Euclidean space ${\bb x}^{(n)}\to \bb x$ if and only if ${ x}^{(n)}_j\to x_j$ for all $j=1,\ldots, d$.
Proof. We recall Proposition B.3.2, which states that continuous functions such as $f(x)=x^2$, $f(x)=\sqrt{x}$, and $f(x,y)=x+y$ preserve limits. It follows that $\|{\bb v}^{(n)}\|\to 0$ if and only if $\|{\bb v}^{(n)}\|^2=\sum_{j=1}^d (v_j^{(n)})^2 \to 0$, which occurs if and only if $(v_j^{(n)})^2\to 0$ for all $j=1,\ldots,d$. This occurs if and only if $v_j^{(n)} \to 0$ for all $j=1,\ldots,d$
Example B.4.5. The sequence of vectors $(2^{-n},3^{-n}), n\in\mathbb{N}$ converges as $n\to\infty$ to $\bb 0=(0,0)$.
Proposition B.4.7. The function $\bb f:\R^d\to\R^k$ defined by $\bb f(\bb x)=(f_1(\bb x),\ldots,f_k(\bb x))$ is continuous if and only if the functions $f_1,\ldots,f_k$ are continuous.

Note that above we are denoting the function using bold-face $\bb f$ to indicate that $\bb f(\bb x)$ is a vector.

Proof. We assume that $f_i$ are continuous for all $i=1,\ldots,k$ and consider an arbitrary $\epsilon>0$. Then for $\epsilon'=\epsilon/\sqrt{k}>0$ there exists $\delta>0$ such that if $\|\bb x-\bb y\|^2<\delta^2$ then $|f_i(\bb x)-f_i(\bb y)|^2<\epsilon^2/k$, $i=1,\ldots,k$. Note that we choose the same $\delta$ for all $i=1,\ldots,k$; for example, by setting $\delta=\min(\delta_1,\ldots,\delta_k)$. It follows that if $\|\bb x-\bb y\|^2<\delta$ then $\|\bb f(\bb x)-\bb f(\bb y)\|^2=\sum_{j=1}^k |f_j(\bb x)-f_j(y)|^2\leq k\epsilon^2/k = \epsilon^2,$ showing that $\bb f$ is continuous.

Conversely, if $\bb f$ is continuous, for all $\epsilon>0$ there exists $\delta>0$ such that whenever $\|\bb x-\bb y\|^2\leq \delta^2$ we have $\|\bb f(\bb x)-\bb f(\bb y)\|^2=\sum_{j=1}^k |f_j(\bb x)-f_j(\bb y)|^2 < \epsilon^2.$ This implies $|f_j(\bb x)-f_j(\bb y)|^2 < \epsilon^2$ and $|f_j(\bb x)-f_j(\bb y)| < \epsilon$, implying the continuity of $f_1,\ldots,f_k$.

Proposition B.4.8. The following functions $f:\R^d\to\R^k$ are continuous: \begin{align*} f(\bb x)&=cx_i\\ f(\bb x,\bb y)&= \bb x+\bb y\\ f(\bb x,\bb y)&=\langle \bb x,\bb y\rangle\\ f(x)&=1/x. \end{align*}

Note that in the first and third cases above, we have $k=1$, and in the last case above, we have $k=d=1$.

Proof. To prove the first assertion, let $\epsilon>0$ be given, and define $\delta=\epsilon/|c|$. Whenever $\|\bb x-\bb y\|^2=\sum_{j=1}^d|x_j-y_j|^2 <\epsilon^2/|c|^2$ we also have $|cx_i-cy_i|\leq |c| \cdot |x_i-y_i|<|c|\epsilon/|c|=\epsilon.$

To prove the second assertion note that whenever $\|(\bb u,\bb w)-(\bb x,\bb y)\|<\epsilon/\sqrt{2d}$, then for all $j=1,\ldots,d$, $|x_j-u_j|< \epsilon/\sqrt{2d}$ and $|y_j-w_j|<\epsilon/\sqrt{2d}$. Using the triangle inequality property of the Euclidean norm, we have \begin{align*} \|(\bb x+\bb y)-(\bb u+\bb w)\| &= \|(\bb x-\bb u)+(\bb y-\bb w)\| \\ &\leq \|\bb x-\bb u\|+\|\bb y-\bb w\| \\ &\leq \sqrt{2d\epsilon^2/(2d)}=\epsilon. \end{align*} The proofs of the other propositions are similar.

Corollary B.4.1. All polynomials are continuous functions.

The concept of compactness (Definition B.1.6) has some important consequences (Propositions B.3.5 and B.3.6). The general definition of a compact space (Definition B.1.6) is hard to verify, but the following simple condition is veru useful for verifying compactness in $\R^d$. A proof is available in (Rudin, 1976).

Proposition B.4.8. If a set $A\subset \R^d$ is closed and bounded, it is compact.