Modes of Convergence

Probability

The Analysis of Data, volume 1

$ \def\P{\mathsf{\sf P}} \def\E{\mathsf{\sf E}} \def\Var{\mathsf{\sf Var}} \def\Cov{\mathsf{\sf Cov}} \def\std{\mathsf{\sf std}} \def\Cor{\mathsf{\sf Cor}} \def\R{\mathbb{R}} \def\c{\,|\,} \def\bb{\boldsymbol} \def\diag{\mathsf{\sf diag}} \def\defeq{\stackrel{\tiny\text{def}}{=}} \newcommand{\toop}{\xrightarrow{\scriptsize{\text{p}}}} \newcommand{\tooas}{\xrightarrow{\scriptsize{\text{as}}}} \newcommand{\tood}{\rightsquigarrow} $

8.1. Modes of Stochastic Convergence

In this chapter we consider several important limit theorems. We start by exploring different types of convergence, and then move on to the law of large numbers and the central limit theorem. We emphasize the multivariate case of random vectors with $d>1$, but for the sake of intuition it is useful to keep the univariate case in mind.

We list below the three major types, or modes, of convergence associated with random vectors.

Definition 8.1.1. Let $\bb{X}^{(n)}, n\in\mathbb{N}$ be a sequence of random vectors and $\bb{X}$ be a random vector.

$\bb{X}^{(n)}$ converges in probability to $\bb{X}$, denoted by $\bb{X}^{(n)}\toop \bb{X}$, if \[\lim_{n\to\infty} \P(\|\bb{X}^{(n)}-\bb{X}\|\geq \epsilon)=0, \qquad \forall \epsilon>0.\]
$\bb{X}^{(n)}$ converges with probability 1 to $\bb{X}$, denoted by $\bb{X}^{(n)}\tooas \bb{X}$, if \[\P\left(\lim_{n\to\infty} \|\bb{X}^{(n)}-\bb{X}\|=0\right) = 1.\] Note that $\lim_{n\to\infty} \|\bb{X}^{(n)}-\bb{X}\|=0$ represents the event \[\left\{\omega: \lim_{n\to\infty} \|\bb{X}^{(n)}(\omega)-\bb{X}(\omega)\|=0\right\} \subset\Omega.\]
$\bb{X}^{(n)}$ converges in distribution to $\bb{X}$, denoted by $\bb{X}^{(n)}\tood \bb{X}$, if \[\lim_{n\to\infty} F_{\bb{X}^{(n)}}(\bb{x})= F_{\bb{X}}(\bb{x}) \quad \text{ for all } \bb{x} \text{ at which } F_{\bb{X}} \text{ is continuous.}\]
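
To make the definition of convergence in probability concrete, the sketch below estimates $\P(|X^{(n)}-X|\geq \epsilon)$ by Monte Carlo for a hypothetical univariate sequence $X^{(n)} = X + Z/n$, where $X$ and $Z$ are independent standard normal RVs. The sequence, the function name `prob_far`, and the values of `eps`, `trials`, and `seed` are illustrative assumptions and do not come from the text.

```python
import random

def prob_far(n, eps, trials=20000, seed=0):
    """Monte Carlo estimate of P(|X^(n) - X| >= eps).

    Assumed (hypothetical) sequence: X^(n) = X + Z / n,
    with X and Z independent standard normal RVs.
    """
    rng = random.Random(seed)
    far = 0
    for _ in range(trials):
        x = rng.gauss(0.0, 1.0)   # a draw of the limit RV X
        z = rng.gauss(0.0, 1.0)   # a draw of the perturbation Z
        xn = x + z / n            # the corresponding value of X^(n)
        if abs(xn - x) >= eps:
            far += 1
    return far / trials

# Estimated probabilities for increasing n and a fixed eps.
estimates = {n: prob_far(n, eps=0.1) for n in (1, 10, 100)}
for n, p in estimates.items():
    print(f"n={n:4d}  estimated P(|X^(n) - X| >= 0.1) = {p:.4f}")
```

For this particular sequence the estimated probability should decrease toward zero as $n$ grows, which is what the definition requires for every fixed $\epsilon>0$.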

We make the following comments.

In the definitions above, the limit RV $\bb X$ may be deterministic, in other words $\bb X=\bb c \in\R^d$ with probability 1. In this case we use notation such as $X^{(n)}\tooas c$ in the one-dimensional case or ${\bb X}^{(n)}\tooas \bb c$ in higher dimensions.
There is a fundamental difference between convergence in distribution and the other two types of convergence. Convergence in distribution states only that the distribution of $\bb{X}^{(n)}$ is close to that of $\bb{X}$ for large $n$. In particular, it says nothing about $\bb{X}^{(n)}$ and $\bb{X}$ taking on similar values with high probability. Convergence in probability and convergence with probability 1, in contrast, imply that for large $n$ the values of $\bb{X}^{(n)}$ and $\bb{X}$ are close (see Example 8.1.1).
The following section shows that convergence with probability 1 implies convergence in probability, which in turn implies convergence in distribution. The converse implications do not hold in general.
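
To illustrate why convergence in probability need not imply convergence with probability 1, we sketch a standard counterexample that is not taken from this text. Assume $\Omega=[0,1)$ with the uniform distribution, write each $n\geq 1$ as $n=2^k+j$ with $0\leq j<2^k$, and let $X^{(n)}(\omega)=1$ if $\omega\in[j/2^k,(j+1)/2^k)$ and $0$ otherwise. Then $\P(|X^{(n)}|\geq\epsilon)=2^{-k}\to 0$ for $0<\epsilon\leq 1$, so $X^{(n)}\toop 0$, yet every $\omega$ falls in exactly one interval of each block $k$, so $X^{(n)}(\omega)=1$ infinitely often and the sequence converges for no $\omega$. The function names below are hypothetical.

```python
def typewriter(n: int, omega: float) -> int:
    """Value of X^(n)(omega) for the assumed 'sliding interval' sequence."""
    if n < 1 or not 0.0 <= omega < 1.0:
        raise ValueError("need n >= 1 and omega in [0, 1)")
    k = n.bit_length() - 1        # block index: 2**k <= n < 2**(k+1)
    j = n - (1 << k)              # position of n within block k
    width = 1.0 / (1 << k)        # interval length 2**(-k)
    return int(j * width <= omega < (j + 1) * width)

def prob_one(n: int) -> float:
    """P(X^(n) = 1) under the uniform distribution on [0, 1)."""
    return 1.0 / (1 << (n.bit_length() - 1))

# For a fixed omega, count how often X^(n)(omega) = 1 within each block.
omega = 0.3
hits_per_block = [
    sum(typewriter(n, omega) for n in range(1 << k, 1 << (k + 1)))
    for k in range(12)
]
print("P(X^(n) = 1) at n = 1, 16, 1024:", prob_one(1), prob_one(16), prob_one(1024))
print("hits per block for omega = 0.3:", hits_per_block)
```

The probabilities shrink with $n$, while the fixed $\omega$ is hit once in every block, so the path $X^{(n)}(\omega)$ keeps returning to 1.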

Example 8.1.1. If ${X}$ and ${X}^{(n)}, n\in\mathbb{N}$ are independent uniform RVs in $[a,b]$ with $a<b$, we have ${X}^{(n)}\tood {X}$ since all of the RVs have the same distribution. But we certainly do not have convergence in probability or with probability 1, since the RVs are independent and typically take on substantially different values. Indeed, for $0<\epsilon<b-a$ the probability $\P(|X^{(n)}-X|\geq\epsilon)$ is the same positive constant for every $n$, so it cannot tend to zero.
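
The following sketch simulates Example 8.1.1 under the assumed values $a=0$, $b=1$, and $\epsilon=0.1$. Since the $X^{(n)}$ are identically distributed, a single independent copy stands in for every $n$. The simulation compares the empirical cdfs of $X$ and $X^{(n)}$ at the midpoint of $[a,b]$ and estimates $\P(|X^{(n)}-X|\geq\epsilon)$, which for two independent uniform RVs on $[a,b]$ equals $(1-\epsilon/(b-a))^2$. The function name `simulate` and the sample size are illustrative choices.

```python
import random

def simulate(a=0.0, b=1.0, eps=0.1, trials=100000, seed=0):
    """Compare distributions and values of two independent U[a, b] RVs."""
    rng = random.Random(seed)
    x = [rng.uniform(a, b) for _ in range(trials)]    # draws of X
    xn = [rng.uniform(a, b) for _ in range(trials)]   # draws of X^(n)
    mid = (a + b) / 2
    # Empirical cdfs at the midpoint: both should be close to 1/2.
    cdf_x = sum(v <= mid for v in x) / trials
    cdf_xn = sum(v <= mid for v in xn) / trials
    # Fraction of paired draws that differ by at least eps.
    p_far = sum(abs(u - v) >= eps for u, v in zip(x, xn)) / trials
    return cdf_x, cdf_xn, p_far

cdf_x, cdf_xn, p_far = simulate()
exact = (1 - 0.1 / (1.0 - 0.0)) ** 2   # (1 - eps / (b - a))**2
print(f"empirical cdf of X at midpoint:     {cdf_x:.3f}")
print(f"empirical cdf of X^(n) at midpoint: {cdf_xn:.3f}")
print(f"estimated P(|X^(n) - X| >= eps):    {p_far:.3f}  (exact {exact:.2f})")
```

The two empirical cdfs agree, consistent with convergence in distribution, while the estimated probability stays near its exact value rather than near zero, so convergence in probability fails.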