Probability
The Analysis of Data, volume 1
Scheffe's Theorem
$
\def\P{\mathsf{\sf P}}
\def\E{\mathsf{\sf E}}
\def\Var{\mathsf{\sf Var}}
\def\Cov{\mathsf{\sf Cov}}
\def\std{\mathsf{\sf std}}
\def\Cor{\mathsf{\sf Cor}}
\def\R{\mathbb{R}}
\def\c{\,|\,}
\def\bb{\boldsymbol}
\def\diag{\mathsf{\sf diag}}
\def\defeq{\stackrel{\tiny\text{def}}{=}}
\newcommand{\toop}{\xrightarrow{\scriptsize{\text{p}}}}
\newcommand{\tooas}{\xrightarrow{\scriptsize{\text{as}}}}
\newcommand{\tood}{\rightsquigarrow}
\newcommand{\iid}{\mbox{$\;\stackrel{\mbox{\tiny iid}}{\sim}\;$}}$
8.4. Scheffe's Theorem
Proposition 8.4.1 (Scheffe's Theorem).
The following two statements hold.
(a) If $0\leq {\bb X}^{(n)} \tooas \bb X$ and $\E({\bb X}^{(n)})\to \E(\bb X) < \infty$, then
\[\E(\|{\bb X}^{(n)}-\bb X\|)\to 0.\]
(b) If $f_{{\bb X}^{(n)}}(\bb x)\to f_{\bb X}(\bb x)$ for all $\bb x$, then
\begin{align*}
\int |f_{{\bb X}^{(n)}}(\bb x)- f_{\bb X}(\bb x) |\,d\bb x & \to 0.
\end{align*}
Proof.
We first prove part (a) in the one-dimensional case. Note that for any real number $c$ we have $|c|=c+2\max(-c,0)$ (this can be verified separately for a positive and a negative $c$). Applying this identity with $c=X^{(n)}-X$ and taking expectations gives \[\E(|X^{(n)}-X|) =\E(X^{(n)}-X) + 2\E(\max(X-X^{(n)},0)).\]
The first term converges to 0 since $\E(X^{(n)})\to \E(X)$ by assumption. The second term converges to 0 by the dominated convergence theorem for random variables applied to the sequence of RVs $\max(X-X^{(n)},0)$, which converges to 0 almost surely since $X^{(n)}\tooas X$ (note that $0\leq \max(X-X^{(n)},0) \leq \max(X,0)$ and $\E(\max(X,0)) < \infty$). The proof for the multivariate case follows by applying the triangle inequality
$\|{\bb X}^{(n)}-\bb X\|\leq \sum_{i=1}^d |X^{(n)}_i-X_i|$
and then applying the one dimensional case for each term separately.
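The algebraic identity $|c|=c+2\max(-c,0)$ that drives the proof is easy to confirm numerically; the following sketch (not part of the text) checks it on a few values of both signs.

```python
# Numerical check of the identity |c| = c + 2*max(-c, 0) used in the
# proof of part (a): for c >= 0 the max term vanishes, and for c < 0
# it contributes -2c, turning c into -c = |c|.
def abs_via_identity(c):
    return c + 2 * max(-c, 0)

for c in [-3.5, -1.0, 0.0, 2.0, 7.25]:
    assert abs_via_identity(c) == abs(c)
```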
The proof of part (b) is similar. We have
\begin{align*}
|f_{{\bb X}^{(n)}}(\bb x)-f_{{\bb X}}(\bb x)| &= f_{{\bb X}^{(n)}}(\bb x)-f_{{\bb X}}(\bb x)
+2\max(f_{{\bb X}}(\bb x)-f_{{\bb X}^{(n)}}(\bb x),0)\\
\int |f_{{\bb X}^{(n)}}(\bb x)-f_{{\bb X}}(\bb x)|\, d\bb x &=
\int \big(f_{{\bb X}^{(n)}}(\bb x)-f_{{\bb X}}(\bb x)\big)\, d\bb x \\
&\quad + 2\int \max(f_{{\bb X}}(\bb x)-f_{{\bb X}^{(n)}}(\bb x),0)\, d\bb x.
\end{align*}
The first integral on the right-hand side is 0 since both densities integrate to 1. The second integral converges to 0 by the dominated convergence theorem (with respect to the Lebesgue measure) applied to the functions $\max(f_{{\bb X}}(\bb x)-f_{{\bb X}^{(n)}}(\bb x),0)$, which converge pointwise to 0 by assumption and satisfy $0\leq \max(f_{{\bb X}}(\bb x)-f_{{\bb X}^{(n)}}(\bb x),0)\leq f_{{\bb X}}(\bb x)$ with $\int f_{\bb X}(\bb x)\, d\bb x = 1 < \infty$.
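Part (b) can be illustrated numerically. The sketch below (a hypothetical example, not from the text) takes $f_{X^{(n)}}$ to be the $N(0,(1+1/n)^2)$ density, which converges pointwise to the $N(0,1)$ density, and approximates $\int |f_{X^{(n)}}(x)-f_X(x)|\,dx$ by a Riemann sum, confirming that the $L_1$ distance shrinks as $n$ grows.

```python
import math

def normal_pdf(x, sigma):
    # Density of N(0, sigma^2)
    return math.exp(-x * x / (2.0 * sigma * sigma)) / (sigma * math.sqrt(2.0 * math.pi))

def l1_distance(sigma_n, sigma=1.0, lo=-10.0, hi=10.0, steps=20000):
    # Midpoint Riemann-sum approximation of the integral of |f_n - f|
    h = (hi - lo) / steps
    return sum(abs(normal_pdf(lo + (i + 0.5) * h, sigma_n)
                   - normal_pdf(lo + (i + 0.5) * h, sigma)) * h
               for i in range(steps))

# sigma_n = 1 + 1/n -> 1, so f_n -> f pointwise; the L1 distances decrease
dists = [l1_distance(1.0 + 1.0 / n) for n in (1, 10, 100, 1000)]
assert all(a > b for a, b in zip(dists, dists[1:]))
```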
Corollary 8.4.1.
If $\lim_{n\to\infty} f_{{\bb X}^{(n)}}(\bb x)= f_{\bb X}(\bb x)$ for all $\bb x$, then
\[\lim_{n\to\infty} \sup_A | \P({\bb X}^{(n)}\in A) - \P(\bb X\in A)| = 0\]
where the supremum ranges over all measurable sets.
Proof.
\begin{align*}
\sup_A| \P({\bb X}^{(n)}\in A) - \P(\bb X\in A)| &=
\sup_A \Big| \int_A (f_{{\bb X}^{(n)}}(\bb x)- f_{\bb X}(\bb x))\, d\bb x \Big|\\
&\leq \sup_A \int_A |f_{{\bb X}^{(n)}}(\bb x)- f_{\bb X}(\bb x)| \, d\bb x \\
&\leq \int |f_{{\bb X}^{(n)}}(\bb x)- f_{\bb X}(\bb x)| \, d\bb x \to 0,
\end{align*}
where the convergence follows from part (b) of Scheffe's Theorem.
The above corollary shows that pointwise convergence of the pdfs is stronger than convergence in distribution: the former implies $\sup_A| \P({\bb X}^{(n)}\in A) - \P(\bb X\in A)| \to 0$ over all measurable sets $A$, while the latter requires only that $\P({\bb X}^{(n)}\in A)\to \P(\bb X\in A)$ for sets of the form $A=(-\infty,a]$ whose endpoint $a$ is a continuity point of the cdf of $\bb X$.
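The gap between the two notions can be seen numerically. In the sketch below (a hypothetical example using normal distributions, not from the text), $X^{(n)} \sim N(0, 4)$ and $X \sim N(0,1)$: the supremum of $|\P(X^{(n)}\in A)-\P(X\in A)|$ over half-lines $A=(-\infty,a]$ (the quantity relevant to convergence in distribution) is strictly smaller than the $L_1$ distance between the densities, which by the corollary bounds the supremum over all measurable sets.

```python
import math

def normal_pdf(x, sigma):
    return math.exp(-x * x / (2.0 * sigma * sigma)) / (sigma * math.sqrt(2.0 * math.pi))

def normal_cdf(x, sigma):
    # CDF of N(0, sigma^2) via the error function
    return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))

sigma_n = 2.0  # X^(n) ~ N(0, 4), compared against X ~ N(0, 1)

# Supremum over half-lines A = (-inf, a], approximated on a grid
sup_halfline = max(abs(normal_cdf(a / 100.0, sigma_n) - normal_cdf(a / 100.0, 1.0))
                   for a in range(-1000, 1001))

# L1 distance between the densities: the corollary's bound over *all* sets
h = 0.001
l1 = sum(abs(normal_pdf(-10.0 + (i + 0.5) * h, sigma_n)
             - normal_pdf(-10.0 + (i + 0.5) * h, 1.0)) * h
         for i in range(20000))

# The half-line supremum is strictly smaller than the all-sets bound
assert sup_halfline < l1
```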