Probability
The Analysis of Data, volume 1
Scheffe's Theorem
$
\def\P{\mathsf{\sf P}}
\def\E{\mathsf{\sf E}}
\def\Var{\mathsf{\sf Var}}
\def\Cov{\mathsf{\sf Cov}}
\def\std{\mathsf{\sf std}}
\def\Cor{\mathsf{\sf Cor}}
\def\R{\mathbb{R}}
\def\c{\,|\,}
\def\bb{\boldsymbol}
\def\diag{\mathsf{\sf diag}}
\def\defeq{\stackrel{\tiny\text{def}}{=}}
\newcommand{\toop}{\xrightarrow{\scriptsize{\text{p}}}}
\newcommand{\tooas}{\xrightarrow{\scriptsize{\text{as}}}}
\newcommand{\tood}{\rightsquigarrow}
\newcommand{\iid}{\mbox{$\;\stackrel{\mbox{\tiny iid}}{\sim}\;$}}$
8.4. Scheffe's Theorem
Proposition 8.4.1 (Scheffe's Theorem).
The following two statements hold.
(a) If $0\leq {\bb X}^{(n)} \tooas \bb X$ and $\E({\bb X}^{(n)})\to \E(\bb X) < \infty$, then
\[\E(\|{\bb X}^{(n)}-\bb X\|)\to 0.\]
(b) If $f_{{\bb X}^{(n)}}(\bb x)\to f_{\bb X}(\bb x)$ for all $\bb x$, then
\begin{align*}
\int |f_{{\bb X}^{(n)}}(\bb x)- f_{\bb X}(\bb x) |\,d\bb x & \to 0.
\end{align*}
Proof.
We first prove part (a) in the one-dimensional case. Note that for any real number $c$ we have $|c|=c+2\max(-c,0)$ (this can be verified separately for a positive and a negative $c$). Applying this identity with $c=X^{(n)}-X$ and taking expectations gives \[\E(|X^{(n)}-X|) =\E(X^{(n)}-X) + 2\E(\max(X-X^{(n)},0)).\]
The first term converges to 0 since $\E(X^{(n)})\to \E(X)$ by assumption. The second term converges to 0 by the dominated convergence theorem for random variables applied to the sequence of RVs $\max(X-X^{(n)},0)$, which converges to 0 almost surely since $X^{(n)}\tooas X$ (note that $0\leq \max(X-X^{(n)},0) \leq \max(X,0)$ and $\E(\max(X,0)) < \infty$). The proof for the multivariate case follows by applying the triangle inequality
$\|{\bb X}^{(n)}-\bb X\|\leq \sum_{i=1}^d |X^{(n)}_i-X_i|$
and then applying the one dimensional case for each term separately.
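The algebraic identity $|c|=c+2\max(-c,0)$ that drives the proof is easy to confirm numerically; the following sketch (not part of the text) checks it on a few values of both signs.

```python
# Numerical check of the identity |c| = c + 2*max(-c, 0) used in the
# proof of part (a): for c >= 0 the max term vanishes, and for c < 0
# it contributes -2c, turning c into -c = |c|.
def abs_via_identity(c):
    return c + 2 * max(-c, 0)

for c in [-3.5, -1.0, 0.0, 2.0, 7.25]:
    assert abs_via_identity(c) == abs(c)
```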
The proof of part (b) is similar. We have
\begin{align*}
|f_{{\bb X}^{(n)}}(\bb x)-f_{{\bb X}}(\bb x)| &= f_{{\bb X}^{(n)}}(\bb x)-f_{{\bb X}}(\bb x)
+2\max(f_{{\bb X}}(\bb x)-f_{{\bb X}^{(n)}}(\bb x),0)\\
\int |f_{{\bb X}^{(n)}}(\bb x)-f_{{\bb X}}(\bb x)|\, d\bb x &=
\int \big(f_{{\bb X}^{(n)}}(\bb x)-f_{{\bb X}}(\bb x)\big)\, d\bb x \\
&\quad + 2\int \max(f_{{\bb X}}(\bb x)-f_{{\bb X}^{(n)}}(\bb x),0)\, d\bb x.
\end{align*}
The first integral on the right-hand side is 0 since both densities integrate to 1. The second integral converges to 0 by the dominated convergence theorem (with respect to the Lebesgue measure) applied to the functions $\max(f_{{\bb X}}(\bb x)-f_{{\bb X}^{(n)}}(\bb x),0)$, which converge pointwise to 0 by assumption and satisfy $0\leq \max(f_{{\bb X}}(\bb x)-f_{{\bb X}^{(n)}}(\bb x),0)\leq f_{{\bb X}}(\bb x)$ with $\int f_{\bb X}(\bb x)\, d\bb x = 1 < \infty$.
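Part (b) can be illustrated numerically. The sketch below (a hypothetical example, not from the text) takes $f_{X^{(n)}}$ to be the $N(0,(1+1/n)^2)$ density, which converges pointwise to the $N(0,1)$ density, and approximates $\int |f_{X^{(n)}}(x)-f_X(x)|\,dx$ by a Riemann sum, confirming that the $L_1$ distance shrinks as $n$ grows.

```python
import math

def normal_pdf(x, sigma):
    # Density of N(0, sigma^2)
    return math.exp(-x * x / (2.0 * sigma * sigma)) / (sigma * math.sqrt(2.0 * math.pi))

def l1_distance(sigma_n, sigma=1.0, lo=-10.0, hi=10.0, steps=20000):
    # Midpoint Riemann-sum approximation of the integral of |f_n - f|
    h = (hi - lo) / steps
    return sum(abs(normal_pdf(lo + (i + 0.5) * h, sigma_n)
                   - normal_pdf(lo + (i + 0.5) * h, sigma)) * h
               for i in range(steps))

# sigma_n = 1 + 1/n -> 1, so f_n -> f pointwise; the L1 distances decrease
dists = [l1_distance(1.0 + 1.0 / n) for n in (1, 10, 100, 1000)]
assert all(a > b for a, b in zip(dists, dists[1:]))
```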
Corollary 8.4.1.
If $\lim_{n\to\infty} f_{{\bb X}^{(n)}}(\bb x)= f_{\bb X}(\bb x)$ for all $\bb x$, then
\[\lim_{n\to\infty} \sup_A | \P({\bb X}^{(n)}\in A) - \P(\bb X\in A)| = 0\]
where the supremum ranges over all measurable sets.
Proof.
\begin{align*}
\sup_A| \P({\bb X}^{(n)}\in A) - \P(\bb X\in A)| &=
\sup_A \Big| \int_A (f_{{\bb X}^{(n)}}(\bb x)- f_{\bb X}(\bb x))\, d\bb x \Big|\\
&\leq \sup_A \int_A |f_{{\bb X}^{(n)}}(\bb x)- f_{\bb X}(\bb x)| \, d\bb x \\
&\leq \int |f_{{\bb X}^{(n)}}(\bb x)- f_{\bb X}(\bb x)| \, d\bb x \to 0,
\end{align*}
where the convergence follows from part (b) of Scheffe's Theorem.
The above corollary shows that pointwise convergence of the pdfs is stronger than convergence in distribution: the former implies $\sup_A| \P({\bb X}^{(n)}\in A) - \P(\bb X\in A)| \to 0$ over all measurable sets $A$, while the latter requires only that $\P({\bb X}^{(n)}\in A)\to \P(\bb X\in A)$ for sets of the form $A=(-\infty,a]$ whose endpoint $a$ is a continuity point of the cdf of $\bb X$.
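The gap between the two notions can be seen numerically. In the sketch below (a hypothetical example using normal distributions, not from the text), $X^{(n)} \sim N(0, 4)$ and $X \sim N(0,1)$: the supremum of $|\P(X^{(n)}\in A)-\P(X\in A)|$ over half-lines $A=(-\infty,a]$ (the quantity relevant to convergence in distribution) is strictly smaller than the $L_1$ distance between the densities, which by the corollary bounds the supremum over all measurable sets.

```python
import math

def normal_pdf(x, sigma):
    return math.exp(-x * x / (2.0 * sigma * sigma)) / (sigma * math.sqrt(2.0 * math.pi))

def normal_cdf(x, sigma):
    # CDF of N(0, sigma^2) via the error function
    return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))

sigma_n = 2.0  # X^(n) ~ N(0, 4), compared against X ~ N(0, 1)

# Supremum over half-lines A = (-inf, a], approximated on a grid
sup_halfline = max(abs(normal_cdf(a / 100.0, sigma_n) - normal_cdf(a / 100.0, 1.0))
                   for a in range(-1000, 1001))

# L1 distance between the densities: the corollary's bound over *all* sets
h = 0.001
l1 = sum(abs(normal_pdf(-10.0 + (i + 0.5) * h, sigma_n)
             - normal_pdf(-10.0 + (i + 0.5) * h, 1.0)) * h
         for i in range(20000))

# The half-line supremum is strictly smaller than the all-sets bound
assert sup_halfline < l1
```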