Probability
The Analysis of Data, volume 1
Modes of Convergence
$
\def\P{\mathsf{\sf P}}
\def\E{\mathsf{\sf E}}
\def\Var{\mathsf{\sf Var}}
\def\Cov{\mathsf{\sf Cov}}
\def\std{\mathsf{\sf std}}
\def\Cor{\mathsf{\sf Cor}}
\def\R{\mathbb{R}}
\def\c{\,|\,}
\def\bb{\boldsymbol}
\def\diag{\mathsf{\sf diag}}
\def\defeq{\stackrel{\tiny\text{def}}{=}}
\newcommand{\toop}{\xrightarrow{\scriptsize{\text{p}}}}
\newcommand{\tooas}{\xrightarrow{\scriptsize{\text{as}}}}
\newcommand{\tood}{\rightsquigarrow}
$
8.1. Modes of Stochastic Convergence
We consider in this chapter several important limit theorems. We start by exploring different types of convergence, and then move on to the law of large numbers and the central limit theorem. We emphasize the multivariate case of random vectors of dimension $d>1$, but for the sake of intuition it is useful to keep the univariate case $d=1$ in mind.
We list below the three major types, or modes, of convergence associated with random vectors.
Definition 8.1.1.
Let $\bb{X}^{(n)}, n\in\mathbb{N}$ be a sequence of random vectors and $\bb{X}$ be a random vector.
- $\bb{X}^{(n)}$ converges in probability to $\bb{X}$, denoted by $\bb{X}^{(n)}\toop \bb{X}$, if
\[\lim_{n\to\infty} \P(\|\bb{X}^{(n)}-\bb{X}\|\geq \epsilon)=0, \qquad \forall \epsilon>0.\]
- $\bb{X}^{(n)}$ converges with probability 1 to $\bb{X}$, denoted by $\bb{X}^{(n)}\tooas \bb{X}$, if
\[\P\left(\lim_{n\to\infty} \|\bb{X}^{(n)}-\bb{X}\|=0\right) = 1.\]
Note that $\lim_{n\to\infty} \|\bb{X}^{(n)}-\bb{X}\|=0$ represents the event
\[\left\{\omega: \lim_{n\to\infty} \|\bb{X}^{(n)}(\omega)-\bb{X}(\omega)\|=0\right\} \subset\Omega.\]
- $\bb{X}^{(n)}$ converges in distribution to $\bb{X}$, denoted by $\bb{X}^{(n)}\tood \bb{X}$, if
\[\lim_{n\to\infty} F_{\bb{X}^{(n)}}(\bb{x})= F_{\bb{X}}(\bb{x}) \quad \text{ for all } \bb{x} \text{ at which } F_{\bb{X}}(\bb{x}) \text{ is continuous.}\]
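To make the definition of convergence in probability concrete, the following sketch estimates $\P(|X^{(n)}-X|\geq\epsilon)$ by simulation for one illustrative sequence. The sequence $X^{(n)}=X+Z/n$, with $X$ uniform on $[0,1]$ and $Z$ standard normal, is an assumption chosen for this illustration and does not appear in the text; the function name `tail_probability` and its parameters are likewise hypothetical. For this sequence $\P(|X^{(n)}-X|\geq\epsilon)=\P(|Z|\geq n\epsilon)$, which tends to zero as $n\to\infty$, so we expect the estimates to shrink with $n$.

```python
import random


def tail_probability(n, eps, num_samples=20_000, seed=0):
    """Monte Carlo estimate of P(|X^(n) - X| >= eps) for X^(n) = X + Z/n.

    The sequence X^(n) = X + Z/n is an illustrative assumption:
    X ~ Uniform[0, 1] is the limit RV and Z ~ N(0, 1) is independent noise.
    """
    rng = random.Random(seed)
    count = 0
    for _ in range(num_samples):
        x = rng.uniform(0.0, 1.0)   # draw of the limit RV X
        z = rng.gauss(0.0, 1.0)     # draw of the noise Z
        x_n = x + z / n             # n-th element of the sequence
        if abs(x_n - x) >= eps:
            count += 1
    return count / num_samples


# Estimates for increasing n at a fixed epsilon; they should decrease toward 0.
estimates = [tail_probability(n, eps=0.1) for n in (1, 10, 100)]
print(estimates)
```

The simulation only illustrates the definition for a single $\epsilon$; convergence in probability requires the limit to be zero for every $\epsilon>0$.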
We make the following comments.
- In the definitions above, the limit RV $\bb X$ may be deterministic, in other words $\bb X=\bb c \in\R^d$ with probability 1. In this case we use notation such as $X^{(n)}\tooas c$ in the one dimensional case or ${\bb X}^{(n)}\tooas \bb c$ in higher dimensions.
- There is a fundamental difference between convergence in distribution and the other two types of convergence. Convergence in distribution merely states that the distribution of $\bb{X}^{(n)}$ is similar to that of $\bb{X}$ for large $n$. In particular, it does not say anything about $\bb{X}^{(n)}$ and $\bb{X}$ taking on similar values with high probability. Convergence in probability and convergence with probability 1 imply that for large $n$, the values of $\bb{X}^{(n)}$ and $\bb{X}$ are similar (see the following example).
- The following section shows that convergence with probability 1 implies convergence in probability, which in turn implies convergence in distribution. The converse implications do not hold in general.
Example 8.1.1.
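If ${X}$ and ${X}^{(n)}, n\in\mathbb{N}$ are independent uniform RVs on $[a,b]$ with $a<b$, we have ${X}^{(n)}\tood {X}$ since all the RVs have the same distribution. But we do not have convergence in probability or with probability 1, since the RVs are independent and typically take on substantially different values. Indeed, for $0<\epsilon<b-a$ the probability $\P(|X^{(n)}-X|\geq\epsilon)=(1-\epsilon/(b-a))^2$ does not depend on $n$ and is strictly positive, so it cannot converge to zero.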
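The example can be checked numerically. The sketch below assumes $a=0$ and $b=1$ and estimates $\P(|X^{(n)}-X|\geq\epsilon)$ for a few values of $n$; the function name `estimate_tail` and its parameters are hypothetical and chosen for illustration. For independent uniform RVs on $[0,1]$ this probability equals $(1-\epsilon)^2$ for every $n$, so the estimates should stay near that constant rather than decay to zero.

```python
import random


def estimate_tail(n_values, eps, a=0.0, b=1.0, num_trials=20_000, seed=0):
    """Estimate P(|X^(n) - X| >= eps) for independent Uniform[a, b] RVs.

    Each trial draws X and the first max(n_values) elements of the sequence
    X^(1), X^(2), ..., all independently, and records for each requested n
    whether X^(n) is at least eps away from X.
    """
    rng = random.Random(seed)
    n_max = max(n_values)
    counts = {n: 0 for n in n_values}
    for _ in range(num_trials):
        x = rng.uniform(a, b)                             # the RV X
        seq = [rng.uniform(a, b) for _ in range(n_max)]   # X^(1), ..., X^(n_max)
        for n in n_values:
            if abs(seq[n - 1] - x) >= eps:
                counts[n] += 1
    return {n: counts[n] / num_trials for n in n_values}


eps = 0.1
estimates = estimate_tail([1, 10, 50], eps)
exact = (1 - eps) ** 2  # closed form for a = 0, b = 1
print(estimates, exact)
```

Because the estimated probability does not shrink as $n$ grows, the sequence fails the definition of convergence in probability even though every $X^{(n)}$ has exactly the distribution of $X$.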