Levy's Continuity Theorem and the Cramer-Wold Device

Probability

The Analysis of Data, volume 1

$ \def\P{\mathsf{\sf P}} \def\E{\mathsf{\sf E}} \def\Var{\mathsf{\sf Var}} \def\Cov{\mathsf{\sf Cov}} \def\std{\mathsf{\sf std}} \def\Cor{\mathsf{\sf Cor}} \def\R{\mathbb{R}} \def\c{\,|\,} \def\bb{\boldsymbol} \def\diag{\mathsf{\sf diag}} \def\defeq{\stackrel{\tiny\text{def}}{=}} \newcommand{\toop}{\xrightarrow{\scriptsize{\text{p}}}} \newcommand{\tooas}{\xrightarrow{\scriptsize{\text{as}}}} \newcommand{\tood}{\rightsquigarrow} \newcommand{\iid}{\mbox{$\;\stackrel{\mbox{\tiny iid}}{\sim}\;$}}$

8.8. Levy's Continuity Theorem and the Cramer-Wold Device

Proposition 8.8.1 (Levy's Continuity Theorem). \[ {\bb X}^{(n)}\tood \bb X \qquad \text{if and only if} \qquad \phi_{{\bb X}^{(n)}}(\bb t)\to \phi_{\bb X}(\bb t), \, \forall \bb t\in\R^d. \]

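Proof. We first assume that ${\bb X}^{(n)}\tood \bb X$. Since $\exp(i \bb t^{\top} \bb x )=\cos (\bb t^{\top} \bb x) + i \sin (\bb t^{\top} \bb x)$, the real and imaginary parts of $\exp(i \bb t^{\top} \bb x)$ are continuous and bounded functions of $\bb x$ for every fixed $\bb t$. Together with implication $1\Rightarrow 3$ of the Portmanteau theorem, this implies $\E(\exp(i \bb t^{\top} {\bb X}^{(n)}))\to \E(\exp(i \bb t^{\top} \bb X))$, which is the pointwise convergence of the characteristic functions.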

Conversely, we assume that $\phi_{{\bb X}^{(n)}}(\bb t)\to \phi_{\bb X}(\bb t)$ for all $\bb t\in\R^d$ and show that for any continuous function $g$ that is zero outside a bounded and closed set, we have $\E(g({\bb X}^{(n)}))\to \E(g(\bb X))$. By the Portmanteau theorem, this implies that ${\bb X}^{(n)}\tood \bb X$. Since $g$ is continuous and vanishes outside a compact set, it is bounded and uniformly continuous, so for every $\epsilon>0$ we can select a $\delta>0$ such that $\|\bb x-\bb y\| < \delta$ implies $|g(\bb x)-g(\bb y)| < \epsilon$.

Denoting by $\bb Z$ a $N(\bb 0,\sigma^2 I)$ random vector that is independent of $\bb X$ and the sequence ${\bb X}^{(n)}$, we have \begin{align*} & |\E(g({\bb X}^{(n)})) - \E(g(\bb X))| \\ &\quad = |\E(g({\bb X}^{(n)})) - \E(g({\bb X}^{(n)}+\bb Z)) + \E(g({\bb X}^{(n)}+\bb Z)) \\ &\quad \quad - \E(g({\bb X}+\bb Z)) + \E(g({\bb X}+\bb Z)) - \E(g(\bb X)) |\\ &\quad \leq |\E(g({\bb X}^{(n)})) - \E(g({\bb X}^{(n)}+\bb Z)) | + |\E(g({\bb X}^{(n)}+\bb Z)) -\E(g({\bb X}+\bb Z))|\\ &\quad \quad + | \E(g({\bb X}+\bb Z)) - \E(g(\bb X))|. \end{align*} The first term above is bounded by $2\epsilon$ for $\sigma$ sufficiently small, uniformly in $n$, since \begin{align*} |\E(g({\bb X}^{(n)})) - \E(g({\bb X}^{(n)}+\bb Z)) | &\leq \E\left(|g({\bb X}^{(n)}) - g({\bb X}^{(n)}+\bb Z)| \, I(\|\bb Z\|<\delta)\right)\\ & \quad + \E\left(|g({\bb X}^{(n)}) - g({\bb X}^{(n)}+\bb Z) | \, I(\|\bb Z\|\geq\delta)\right)\\ &\leq \epsilon + 2 \left(\sup_{\bb w}|g(\bb w)|\right) \P(\|\bb Z\|\geq\delta)\\ &\leq 2\epsilon, \end{align*} where the last inequality holds once $\sigma$ is small enough that $2 \left(\sup_{\bb w}|g(\bb w)|\right) \P(\|\bb Z\|\geq\delta)\leq\epsilon$. The third term above is also bounded by $2\epsilon$ by the same argument with $\bb X$ in place of ${\bb X}^{(n)}$. It remains to show that, for every fixed $\sigma$, the second term converges to zero: $\E(g({\bb X}^{(n)}+\bb Z))\to \E (g(\bb X+\bb Z))$. We will then have that $\limsup_{n\to\infty}|\E(g({\bb X}^{(n)})) - \E(g(\bb X))|\leq 4\epsilon$, and since $\epsilon>0$ is arbitrary, $\E(g({\bb X}^{(n)})) \to \E(g(\bb X))$ (for all continuous functions $g$ that are zero outside a bounded and closed set), which together with the Portmanteau theorem implies ${\bb X}^{(n)}\tood \bb X$.
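The smoothing step above can be checked numerically in one dimension. The following is a minimal sketch, not part of the proof: the triangular bump `g`, the three-point distribution of $X$, and the helper name `smoothed_expectation` are all assumptions chosen for illustration. It computes $|\E(g(X)) - \E(g(X+Z))|$ for $Z\sim N(0,\sigma^2)$ with the trapezoidal rule and shows the gap shrinking as $\sigma$ decreases.

```python
import math

def g(x):
    """Continuous test function that is zero outside [-1, 1] (a triangular bump)."""
    return max(0.0, 1.0 - abs(x))

def smoothed_expectation(points, probs, sigma, half_width=8.0, steps=4000):
    """E g(X + Z) for a discrete X and Z ~ N(0, sigma^2), independent of X.

    The integral over z is approximated by the trapezoidal rule on
    [-half_width * sigma, half_width * sigma].
    """
    h = 2.0 * half_width * sigma / steps
    total = 0.0
    for k in range(steps + 1):
        z = -half_width * sigma + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        density = math.exp(-z * z / (2.0 * sigma ** 2)) / (math.sqrt(2.0 * math.pi) * sigma)
        inner = sum(p * g(x + z) for x, p in zip(points, probs))
        total += weight * inner * density * h
    return total

# Hypothetical three-point distribution for X (illustration only).
points = [-0.5, 0.0, 0.75]
probs = [0.2, 0.5, 0.3]

exact = sum(p * g(x) for x, p in zip(points, probs))  # E g(X)
gaps = {sigma: abs(exact - smoothed_expectation(points, probs, sigma))
        for sigma in (0.5, 0.1, 0.02)}

for sigma, gap in gaps.items():
    print(f"sigma = {sigma:<5} |E g(X) - E g(X + Z)| = {gap:.5f}")
```

In the proof the same bound is needed uniformly over the sequence ${\bb X}^{(n)}$, which is why it relies only on the uniform continuity and boundedness of $g$ and not on the distribution of the vector being smoothed.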

We show below that $\E(g({\bb X}^{(n)}+\bb Z))\to \E (g(\bb X+\bb Z))$. We have \begin{align} \tag{*} \E(g({\bb X}^{(n)}+\bb Z)) &= \frac{1}{(\sqrt{2\pi}\sigma)^d} \iint g(\bb x+\bb z) \exp(-{\bb z}^{\top}\bb z/(2\sigma^2)) \, d\bb z d F_{{\bb X}^{(n)}} \\ &= \frac{1}{(\sqrt{2\pi}\sigma)^d} \iint g(\bb u) \exp(-(\bb u-\bb x)^{\top}(\bb u-\bb x)/(2\sigma^2)) \, d\bb u d F_{{\bb X}^{(n)}} \nonumber\\ &= \frac{1}{(\sqrt{2\pi}\sigma)^d} \iint g(\bb u) \prod_{j=1}^d \exp\left(-\frac{(u_j-x_j)^2}{2\sigma^2}\right) \, d\bb u d F_{{\bb X}^{(n)}}\nonumber \\ &= \frac{1}{(\sqrt{2\pi}\sigma)^d} \iint g(\bb u) \prod_{j=1}^d \frac{\sigma}{\sqrt{2\pi}} \int \exp\left(it_j(u_j-x_j)-\sigma^2t_j^2/2\right) \\ & \qquad \qquad\qquad \qquad\qquad \qquad \qquad \qquad \,dt_j d\bb u d F_{{\bb X}^{(n)}}\nonumber \\ &= \frac{1}{(2\pi)^d} \iiint g(\bb u) \exp\left(i{\bb t}^{\top} (\bb u-\bb x)-\sigma^2 {\bb t}^{\top} \bb t/2 \right) \,d\bb t d\bb u d F_{{\bb X}^{(n)}}\nonumber \\ &= \frac{1}{(2\pi)^d} \iint g(\bb u) \exp\left(i{\bb t}^{\top} \bb u-\sigma^2 {\bb t}^{\top} \bb t/2 \right) \phi_{{\bb X}^{(n)}} (-\bb t) \, d \bb td\bb u, \nonumber \end{align} where $\bb u=\bb x+\bb z$. Note that we used Lemma 8.7.1 in the fourth equality and Proposition 8.7.2 in the last equality.
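The fourth equality above replaces each factor $\exp(-(u_j-x_j)^2/(2\sigma^2))$ by $\frac{\sigma}{\sqrt{2\pi}}\int \exp(it_j(u_j-x_j)-\sigma^2t_j^2/2)\,dt_j$. As a sanity check, the sketch below compares the two sides numerically for a few values of $s=u_j-x_j$. The function name `gaussian_fourier_rhs`, the value of `sigma`, and the truncation of the integral to $|t|\leq 10/\sigma$ are assumptions made for this illustration.

```python
import cmath
import math

def gaussian_fourier_rhs(s, sigma, half_width=10.0, steps=20000):
    """(sigma / sqrt(2 pi)) * integral of exp(i t s - sigma^2 t^2 / 2) dt.

    The integral is truncated to |t| <= half_width / sigma, where the Gaussian
    factor is negligible, and approximated by the trapezoidal rule.
    """
    limit = half_width / sigma
    h = 2.0 * limit / steps
    total = 0j
    for k in range(steps + 1):
        t = -limit + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * cmath.exp(1j * t * s - 0.5 * (sigma * t) ** 2)
    return sigma / math.sqrt(2.0 * math.pi) * total * h

sigma = 0.7
checks = []
for s in (0.0, 0.5, -1.3):
    lhs = math.exp(-s * s / (2.0 * sigma ** 2))  # Gaussian factor in the third line of (*)
    rhs = gaussian_fourier_rhs(s, sigma)         # its Fourier representation
    checks.append((s, lhs, rhs))

max_error = max(abs(lhs - rhs) for _, lhs, rhs in checks)
print("largest discrepancy:", max_error)
```

Taking the product over $j=1,\ldots,d$ contributes the factor $(\sigma/\sqrt{2\pi})^d$, which combines with $1/(\sqrt{2\pi}\sigma)^d$ to give the constant $1/(2\pi)^d$ in the following line.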

Since $g$ is continuous and non-zero only on a closed and bounded set, it is bounded and integrable. Unless $g$ is identically zero, in which case there is nothing to prove, $c|g(\bb u)|$ is therefore a density for a suitable constant $c>0$, and $\exp(-\sigma^2 {\bb t}^{\top} \bb t/2)$ is proportional to a Gaussian density. This implies that, up to a multiplicative constant, $\E(g({\bb X}^{(n)}+\bb Z))$ may be considered as an expectation over two independent random vectors: $\bb U$ having density $c|g(\bb u)|$ and $\bb T$ having a Gaussian density. The argument of that expectation is the function $\operatorname{sign}(g(\bb u))\exp\left(i{\bb t}^{\top} \bb u\right) \phi_{{\bb X}^{(n)}} (-\bb t)$, which is bounded in absolute value by 1 and converges pointwise to $\operatorname{sign}(g(\bb u))\exp\left(i{\bb t}^{\top} \bb u\right) \phi_{\bb X} (-\bb t)$, and so by the dominated convergence theorem for random variables (Proposition 8.3.1) \begin{multline*} \frac{1}{(2\pi)^d} \iint g(\bb u) \exp\left(i{\bb t}^{\top} \bb u-\sigma^2 {\bb t}^{\top} \bb t/2 \right) \phi_{{\bb X}^{(n)}} (-\bb t) \, d \bb td\bb u \\ \to \frac{1}{(2\pi)^d} \iint g(\bb u) \exp\left(i{\bb t}^{\top} \bb u-\sigma^2 {\bb t}^{\top} \bb t/2 \right) \phi_{\bb X} (-\bb t) \, d \bb td\bb u. \end{multline*} Repeating the derivation in Equation (*) with $\bb X$ substituting ${\bb X}^{(n)}$ we see that \begin{multline*} \E(g({\bb X}+\bb Z)) = \frac{1}{(2\pi)^d} \iint g(\bb u) \exp\left(i{\bb t}^{\top} \bb u-\sigma^2 {\bb t}^{\top} \bb t/2 \right) \phi_{\bb X} (-\bb t) \, d \bb td\bb u, \end{multline*} implying that $\E(g({\bb X}^{(n)}+\bb Z))\to \E(g({\bb X} +\bb Z))$.

Note that Levy's continuity theorem above is similar to Proposition 2.4.2. The former equates convergence in distribution to convergence of the characteristic functions. The latter equates convergence in distribution to convergence of the moment generating functions. An advantage of Levy's theorem is that in many cases the moment generating function does not exist, while the characteristic function always exists.
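To make the statement of the continuity theorem concrete, the sketch below evaluates the characteristic function of a standardized binomial random variable in closed form and compares it with $\exp(-t^2/2)$, the characteristic function of $N(0,1)$. The choice of the binomial example, the value $p=0.3$, the grid of $t$ values, and the function names are assumptions made for illustration. The shrinking discrepancy is the pointwise convergence of characteristic functions that, by the theorem, is equivalent to convergence in distribution.

```python
import cmath
import math

def cf_standardized_binomial(t, n, p):
    """Characteristic function of (S_n - n p) / sqrt(n p (1 - p)), S_n ~ Binomial(n, p)."""
    scale = math.sqrt(n * p * (1.0 - p))
    # A centered Bernoulli(p) variable equals 1 - p with probability p and -p
    # with probability 1 - p; its characteristic function is evaluated at t / scale.
    one_term = ((1.0 - p) * cmath.exp(-1j * t * p / scale)
                + p * cmath.exp(1j * t * (1.0 - p) / scale))
    # Independence of the n summands turns the sum into an n-th power.
    return one_term ** n

def cf_standard_normal(t):
    """Characteristic function of N(0, 1)."""
    return math.exp(-0.5 * t * t)

p = 0.3
ts = [-2.0, -0.5, 0.0, 1.0, 3.0]
errors = {
    n: max(abs(cf_standardized_binomial(t, n, p) - cf_standard_normal(t)) for t in ts)
    for n in (10, 100, 10000)
}

for n, err in errors.items():
    print(f"n = {n:<6} max |phi_n(t) - exp(-t^2 / 2)| over the grid = {err:.5f}")
```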

The following result shows a way to prove multivariate convergence in distribution by establishing univariate convergence in distribution for every linear combination of the components.

Corollary 8.8.1 (Cramer-Wold Device). If ${\bb t}^{\top}{\bb X}^{(n)}\tood {\bb t}^{\top}\bb X$ for all vectors $\bb t\in\R^d$, then ${\bb X}^{(n)}\tood \bb X$.

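Proof. By the continuity theorem, it suffices to show that the characteristic functions of ${\bb X}^{(n)}$ converge pointwise to the characteristic function of $\bb X$. Applying the continuity theorem to the random variables ${\bb t}^{\top}{\bb X}^{(n)}\tood {\bb t}^{\top}\bb X$ shows that their characteristic functions converge at every point, and in particular at 1. Therefore, for all $\bb t\in\R^d$, \begin{align*} \phi_{{\bb X}^{(n)}}(\bb t) = \E(\exp(i{\bb t}^{\top} {\bb X}^{(n)})) = \phi_{{\bb t}^{\top} {\bb X}^{(n)}}(1) \to \phi_{{\bb t}^{\top} {\bb X}}(1) = \phi_{\bb X}(\bb t). \end{align*}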
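The proof rests on the identity $\phi_{\bb X}(\bb t)=\phi_{{\bb t}^{\top}\bb X}(1)$, which says that the multivariate characteristic function at $\bb t$ is determined by the univariate distribution of the projection ${\bb t}^{\top}\bb X$. The sketch below checks this identity for a small discrete random vector in $\R^2$. The support points, probabilities, directions, and function names are hypothetical and serve only as an illustration.

```python
import cmath
from collections import defaultdict

# Hypothetical discrete random vector X in R^2 (illustration only).
support = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, -1.0)]
probs = [0.1, 0.2, 0.3, 0.25, 0.15]

def cf_vector(t):
    """phi_X(t) = E exp(i t^T X), computed from the joint distribution of X."""
    return sum(p * cmath.exp(1j * (t[0] * x[0] + t[1] * x[1]))
               for x, p in zip(support, probs))

def projection_distribution(t):
    """Distribution of the scalar t^T X as a dict mapping value -> probability."""
    dist = defaultdict(float)
    for x, p in zip(support, probs):
        dist[t[0] * x[0] + t[1] * x[1]] += p  # support points with equal projection merge
    return dict(dist)

def cf_scalar(dist, s):
    """Characteristic function at s of a discrete scalar random variable."""
    return sum(p * cmath.exp(1j * s * v) for v, p in dist.items())

directions = [(1.0, 0.0), (0.5, -2.0), (1.0, 1.0), (-3.0, 0.25)]
max_gap = max(abs(cf_vector(t) - cf_scalar(projection_distribution(t), 1.0))
              for t in directions)
print("largest |phi_X(t) - phi_{t^T X}(1)| over the directions:", max_gap)
```

For the direction $(1,1)$ three of the five support points project to the same value, so the projection has fewer atoms than $\bb X$, yet the two characteristic functions still agree up to floating point error.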