## Probability

### The Analysis of Data, volume 1


## 8.4. Scheffe's Theorem

Proposition 8.4.1 (Scheffe's Theorem). The following two statements hold.

(a) If $0\leq {\bb X}^{(n)} \tooas \bb X$ and $\E({\bb X}^{(n)})\to \E(\bb X) < \infty$, then $\E(\|{\bb X}^{(n)}-\bb X\|)\to 0$.

(b) If $f_{{\bb X}^{(n)}}(\bb x)\to f_{\bb X}(\bb x)$ for all $\bb x$, then \begin{align*} \int |f_{{\bb X}^{(n)}}(\bb x)- f_{\bb X}(\bb x) |\,d\bb x & \to 0. \end{align*}
Proof. We first prove part (a) in one dimension. Note that for any real number $c$ we have $|c|=c+2\max(-c,0)$ (this can be verified separately for positive and negative $c$). It follows that $\E(|X^{(n)}-X|) =\E(X^{(n)}-X) + 2\E(\max(X-X^{(n)},0)).$ The first term converges to 0 since $\E(X^{(n)})\to \E(X)$. The second term converges to 0 by the dominated convergence theorem for random variables applied to the sequence of RVs $\max(X-X^{(n)},0)$, which converges to 0 almost surely since $X^{(n)}\tooas X$ (note that $0\leq \max(X-X^{(n)},0) \leq \max(X,0)$ and $\E(\max(X,0)) < \infty$). The proof for the multivariate case follows by applying the triangle inequality $\|{\bb X}^{(n)}-\bb X\|\leq \sum_{i=1}^d |X^{(n)}_i-X_i|$ and then applying the one-dimensional case to each term separately.
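The two ingredients of the argument above can be checked numerically. The following sketch (an assumed example, not part of the text) verifies the identity $|c|=c+2\max(-c,0)$ and then estimates $\E(|X^{(n)}-X|)$ by Monte Carlo for the hypothetical choice $X\sim\text{Exp}(1)$ and $X^{(n)}=(1+1/n)X$, which satisfies the hypotheses of part (a):

```python
import numpy as np

# First, the identity |c| = c + 2*max(-c, 0) used to open the proof,
# checked for positive, negative, and zero c.
for c in (3.5, -2.0, 0.0):
    assert abs(c) == c + 2 * max(-c, 0.0)

# Second, a Monte Carlo look at the conclusion of part (a): take
# X ~ Exp(1) >= 0 and X^(n) = (1 + 1/n) X, so X^(n) -> X a.s. and
# E(X^(n)) = 1 + 1/n -> E(X) = 1. The theorem predicts
# E|X^(n) - X| -> 0; here E|X^(n) - X| = E(X)/n exactly.
rng = np.random.default_rng(0)
x = rng.exponential(size=1_000_000)
errs = [np.mean(np.abs((1.0 + 1.0 / n) * x - x)) for n in (1, 10, 100)]
print(errs)  # roughly [1, 0.1, 0.01]
```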

The proof of part (b) is similar. We have \begin{align*} |f_{{\bb X}^{(n)}}(\bb x)-f_{{\bb X}}(\bb x)| &= f_{{\bb X}^{(n)}}(\bb x)-f_{{\bb X}}(\bb x) +2\max(f_{{\bb X}}(\bb x)-f_{{\bb X}^{(n)}}(\bb x),0)\\ \int |f_{{\bb X}^{(n)}}(\bb x)-f_{{\bb X}}(\bb x)|\, d\bb x &= \int \big(f_{{\bb X}^{(n)}}(\bb x)-f_{{\bb X}}(\bb x)\big)\, d\bb x \\ &\quad + 2\int \max(f_{{\bb X}}(\bb x)-f_{{\bb X}^{(n)}}(\bb x),0)\, d\bb x. \end{align*} The first integral on the right-hand side is 0 since both densities integrate to 1. The second integral converges to 0 by the dominated convergence theorem (with respect to the Lebesgue measure) applied to $\max(f_{{\bb X}}(\bb x)-f_{{\bb X}^{(n)}}(\bb x),0)$, which converges pointwise to 0 and is dominated by the integrable function $f_{{\bb X}}$ (note that $\max(f_{{\bb X}}(\bb x)-f_{{\bb X}^{(n)}}(\bb x),0)\leq f_{{\bb X}}(\bb x)$).
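The conclusion of part (b) can also be observed numerically. The sketch below (an assumed example, not from the text) takes $f_{{\bb X}^{(n)}}$ to be the $N(1/n,1)$ density and $f_{\bb X}$ the $N(0,1)$ density, so the pointwise convergence hypothesis holds, and approximates the $L_1$ distance on a grid:

```python
import numpy as np

# Hypothetical densities: f_{X^(n)} = N(1/n, 1), f_X = N(0, 1).
# Part (b) predicts int |f_{X^(n)} - f_X| dx -> 0 as n grows.

def normal_pdf(x, mu):
    # density of N(mu, 1)
    return np.exp(-0.5 * (x - mu) ** 2) / np.sqrt(2.0 * np.pi)

x = np.linspace(-10.0, 10.0, 200001)  # fine grid; mass outside is negligible
dx = x[1] - x[0]
# Riemann-sum approximation of the L1 distance for n = 1, 10, 100
l1 = [np.sum(np.abs(normal_pdf(x, 1.0 / n) - normal_pdf(x, 0.0))) * dx
      for n in (1, 10, 100)]
print(l1)  # decreasing toward 0
```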

Corollary 8.4.1. If $\lim_{n\to\infty} f_{{\bb X}^{(n)}}(\bb x)= f_{\bb X}(\bb x)$ for all $\bb x$, then $\lim_{n\to\infty} \sup_A | \P({\bb X}^{(n)}\in A) - \P(\bb X\in A)| = 0$ where the supremum ranges over all measurable sets.
Proof. \begin{align*} \sup_A| \P({\bb X}^{(n)}\in A) - \P(\bb X\in A)| &= \sup_A \Big| \int_A (f_{{\bb X}^{(n)}}(\bb x)- f_{\bb X}(\bb x))\, d\bb x \Big|\\ &\leq \sup_A \int_A |f_{{\bb X}^{(n)}}(\bb x)- f_{\bb X}(\bb x)| \, d\bb x \\ &\leq \int |f_{{\bb X}^{(n)}}(\bb x)- f_{\bb X}(\bb x)| \, d\bb x \to 0, \end{align*} where the convergence follows from part (b) of Scheffe's Theorem (Proposition 8.4.1).
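In the discrete case the quantity $\sup_A |\P({\bb X}^{(n)}\in A)-\P(\bb X\in A)|$ can be computed exactly by enumerating subsets. The sketch below (with assumed, hypothetical pmfs) checks that this supremum equals half the $L_1$ distance between the pmfs, the discrete analogue of the bound in the proof:

```python
import itertools

# Two hypothetical pmfs p and q on a 4-point support (assumed numbers).
support = range(4)
p = [0.1, 0.2, 0.3, 0.4]       # plays the role of the pmf of X^(n)
q = [0.25, 0.25, 0.25, 0.25]   # plays the role of the pmf of X

# Enumerate every subset A of the support and take the largest
# discrepancy |P(A) - Q(A)|.
sup_a = max(abs(sum(p[i] for i in A) - sum(q[i] for i in A))
            for r in range(len(p) + 1)
            for A in itertools.combinations(support, r))

# The supremum is attained at A = {x : p(x) > q(x)} and equals
# half of the L1 distance between the pmfs.
half_l1 = 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))
print(sup_a, half_l1)  # both 0.2
```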

The corollary above shows that pointwise convergence of the pdfs is stronger than convergence in distribution: the former implies $\sup_A| \P({\bb X}^{(n)}\in A) - \P(\bb X\in A)| \to 0$ over all measurable sets $A$, while the latter only requires $\P({\bb X}^{(n)}\in A)\to \P(\bb X\in A)$ for sets of the form $A=(-\infty,a]$ (and only at values $a$ where the cdf of $\bb X$ is continuous).
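That the implication is strict can be seen from a standard counterexample (a sketch, not from the text): the densities $f_n(x)=1+\cos(2\pi n x)$ on $[0,1]$ have cdfs $F_n(x)=x+\sin(2\pi n x)/(2\pi n)$ converging uniformly to the Uniform$(0,1)$ cdf, so convergence in distribution holds, yet $f_n(0)=2$ for every $n$, so the pdfs do not converge pointwise to the uniform density:

```python
import numpy as np

# cdf of the density f_n(x) = 1 + cos(2*pi*n*x) on [0, 1]
def cdf(n, x):
    return x + np.sin(2 * np.pi * n * x) / (2 * np.pi * n)

x = np.linspace(0.0, 1.0, 10001)
# Uniform gap between F_n and the Uniform(0,1) cdf F(x) = x:
# it shrinks like 1/(2*pi*n), so X^(n) converges in distribution.
sup_gap = [np.max(np.abs(cdf(n, x) - x)) for n in (1, 10, 100)]
print(sup_gap)

# But the density at x = 0 equals 2 for every n, never approaching 1.
pdf_at_zero = [1 + np.cos(2 * np.pi * n * 0.0) for n in (1, 10, 100)]
print(pdf_at_zero)
```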