Probability

The Analysis of Data, volume 1

Caratheodory's Extension Theorem

E.3. Caratheodory's Extension Theorem*

In many cases it is hard to define a probability function on all sets $A$ in a $\sigma$-algebra. Caratheodory's extension theorem shows that it is sufficient to define the probability measure on an algebra $\mathcal{C}$. The probability measure is then uniquely defined on $\mathcal{\sigma}(\mathcal{C})$, in a way consistent with its definition on $\mathcal{C}$. The proof of this important result is somewhat lengthy. The existence of the extension is based on Dynkin's theorem, an important result from set theory. The uniqueness of the extension is based on the concept of an outer measure.

E.3.1 Dynkin's Theorem*

Definition E.3.1. A set of sets $\mathcal{C}$ is a $\pi$-system if the following holds: \[A,B \in\mathcal{C}\quad\text{implies} \quad A\cap B\in\mathcal{C}.\]

It follows by induction that for a $\pi$-system $\mathcal{C}$, if $A_1,\ldots,A_k\in\mathcal{C}$, we have $A_1\cap \cdots \cap A_k\in\mathcal{C}$.

Definition E.3.2. A set of sets $\mathcal{C}$ is a $\lambda$-system if it satisfies: (i) $\Omega\in\mathcal{C}$, (ii) $A\in\mathcal{C}$ implies $A^c\in\mathcal{C}$, and (iii) $A_n\in\mathcal{C}, n\in\mathbb{N}$ and $A_n\cap A_m=\emptyset$ for $n\neq m$ implies $\cup_n A_n\in\mathcal{C}$.

Note that for a $\lambda$-system $\mathcal{C}$, we have $\emptyset=\Omega^c\in\mathcal{C}$, which together with property (iii) implies also that if $A_n\in\mathcal{C}, n=1,\ldots,k$ and $A_n\cap A_m=\emptyset$ for $n\neq m$ then $\cup_{n=1}^k A_n\in\mathcal{C}$ (take $A_n=\emptyset$ for all $n>k$).

Proposition E.3.1. Assume that $\mathcal{C}$ satisfies properties (i) and (iii) above. Then property (ii) holds if and only if the following property holds \begin{align*} \text(ii'): \quad A,B\in\mathcal{C}, \quad A\subset B \qquad \text{implies} \qquad B\setminus A\in\mathcal{C}. \end{align*}
Proof. We assume that $\mathcal{C}$ satisfies properties (i), (ii), and (iii). If $A,B\in\mathcal{C}$ and $A\subset B$, then $B^c\in \mathcal{C}$, $A\cup B^c \in \mathcal{C}$, and consequentially $(A\cup B^c)^c=A^c\cap B=B\setminus A \in \mathcal{C}$.

Conversely, we assume that $\mathcal{C}$ satisfies properties (i), and (iii), and that property (ii') holds. Then $A\in\mathcal{C}$ implies $A^c=\Omega\setminus A\in\mathcal{C}$ establishing property (ii).

Proposition E.3.2. A set of sets $\mathcal{C}$ that is both a $\pi$-system and a $\lambda$-system is a $\sigma$-algebra.
Proof. The property $\Omega\in\mathcal{C}$ holds since $\mathcal{C}$ is a $\lambda$-system. The set $\mathcal{C}$ is closed under finite union (as a $\pi$-system) and intersections (using the fact that it is closed under finite union and under complements and De-Morgan's theorem). If $A_n\in\mathcal{C}, n\in\mathbb{N}$ then the sets $B_n=A_n\cap A_1^c\cap A_2^c\cdots A_{n-1}^c$, $n\in\mathbb{N}$ are disjoint, and by property (iii) of a $\lambda$-system their union is also in $\mathcal{C}$. Noting that $\cup_{n\in\mathbb{N}} A_n=\cup_{n\in\mathbb{N}} B_n$ establishes that $\mathcal{C}$ is closed under countable unions.
Proposition E.3.3. The intersection of several $\lambda$-systems is a lambda system. Thus, given several $\lambda$ systems we have a unique minimal $\lambda$ system containing them.
Proof. The proof is very similar to the proof of Proposition E.1.2.
Proposition E.3.4 (Dynkin's Theorem). If $\mathcal{C}$ is a $\pi$-system and $\mathcal{D}$ is a $\lambda$-system. Then $\mathcal{C}\subset\mathcal{D}$ implies $\sigma(\mathcal{C})\subset\mathcal{D}$.
Proof. We denote by $\mathcal{D}'$ the smallest $\lambda$-system containing $\mathcal{C}$ (defined as the intersection of all $\lambda$-systems containing $\mathcal{C}$, much like the definition of the $\sigma$-algebra generated by a set of sets). We thus have $\mathcal{C}\subset\mathcal{D}'\subset\mathcal{D}$. If we show that $\mathcal{D}'$ is also a $\pi$-system, Proposition E.3.2 implies that it is a $\sigma$-algebra, which in turn implies $\mathcal{C}\subset \sigma(C)\subset \mathcal{D}'\subset\mathcal{D}$ and concludes the proof.

We denote by $\mathcal{D}_A$ to be the set of sets $B$ for which $A\cap B\in\mathcal{D}'$ and show that if $A\in\mathcal{D}'$ then $\mathcal{D}_A$ is a $\lambda$-system. Property (i) holds since $A\cap\Omega=\Omega\in\mathcal{D}'$ (since $\mathcal{D}'$ is a $\lambda$-system). If $B,B'\in\mathcal{D}_A$ with $B\subset B'$ then $A\cap B, A\cap B'\in\mathcal{D}'$ and using property (ii') $A\cap B'\setminus A\cap B=A\cap(B'\setminus B) \in\mathcal{D}'$. This shows that $B'\setminus B \in \mathcal{D}_A$ and that property (ii') in Proposition E.3.1 holds for $\mathcal{D}_A$. If the disjoint sets $B_n, n\in\mathbb{N}$ are in $\mathcal{D}_A$ then $A\cap B_n\in\mathcal{D}'$. Using property (iii) for the $\lambda$ system $\mathcal{D}'$, we have $A\cap (\cup_n B_n)=\cup_n (A\cap B_n)\in\mathcal{D}'$ implying that $\cup_n B_n\in\mathcal{D}_A$, showing that $\mathcal{D}_A$ is a $\lambda$-system.

If $A,B\in\mathcal{C}$ then $A\cap B\in\mathcal{C}\subset \mathcal{D}'$ (since $\mathcal{C}$ is a $\pi$-system) and so $B\in\mathcal{D}_A$. This implies that if $A\in\mathcal{C}$ then $\mathcal{C}\subset\mathcal{D}_A$. We thus have the following sequence of implications \begin{align*} A\in\mathcal{C} \quad &\Rightarrow \quad \mathcal{C} \subset \mathcal{D}' \subset \mathcal{D}_A\quad (\mathcal{D}'\text{ is the minimal } \lambda \text{ system})\\ A\in\mathcal{C}, B\in\mathcal{D}'\quad &\Rightarrow \quad B\in\mathcal{D}_A \quad\Rightarrow\quad A\in\mathcal{D}_B\\ B\in\mathcal{D}' \quad &\Rightarrow\quad \mathcal{C}\subset \mathcal{D}_B\\ B\in\mathcal{D}' \quad &\Rightarrow\quad \mathcal{D}'\subset\mathcal{D}_B \quad (\mathcal{D}' \text{ is the minimal } \lambda \text{-system containing } \mathcal{C}). \end{align*} Using the implications above, we conclude the proof by showing that $\mathcal{D}'$ is a $\pi$-system: \[ A,B\in\mathcal{D}' \quad\Rightarrow\quad A\in\mathcal{D}_B \quad\Rightarrow \quad A\cap B \in\mathcal{D}'.\]

The following corollary establishes the uniqueness part of Caratheodory's extension theorem.

Corollary E.3.1. Let $\P_1,\P_2$ be two probability measures on $\sigma(\mathcal{C})$, where $\mathcal{C}$ is a $\pi$-system. If $\P_1,\P_2$ agree on $\mathcal{C}$ then they also agree on $\sigma(\mathcal{C})$.
Proof. We denote the set $\mathcal{A}$ on which $\P_1$ and $\P_2$ agree and note that $\mathcal{C}\subset A\subset\sigma(\mathcal{C})$. Since $\P_1,\P_2$ are probability measures their agreement on $A$ implies their agreement on $A^c$, thus $A\in\mathcal{A}$ implies $A^c\in\mathcal{A}$. If $A_n, n\in\mathbb{N}$ is a sequence of disjoint sets in $\mathcal{A}$ then $\cup_n A_n\in\mathcal{A}$ (by countable additivity of $\P_1$ and $\P_2$). It follows that $\mathcal{A}$ is a $\lambda$-system. Dynkin's theorem (Proposition E.3.4) then states that $\sigma(\mathcal{C})\subset \mathcal{A}$.

E.3.2 Outer Measure*

Definition E.3.3. Let $\P$ be a probability measure on an algebra $\mathcal{C}$. For each set $A\subset\Omega$ we define its outer measure to be \[ \P^*(A)=\inf \sum_n \P(A_n)\] where the infimum ranges over all finite and infinite sequences of sets $A_n$ in $\mathcal{C}$ that cover $A$ ($A\subset \cup_n A_n$).
Definition E.3.4. Given a probability measure $\P$ on an algebra $\mathcal{C}$ we define a set $A$ to be $\P^*$-measurable if it satisfies the following equation \begin{align*} \P^*(A\cup E) + \P^*(A^c\cup E) = \P^*(E),\qquad \forall E\subset\Omega. \end{align*} We denote the set of all $\P^*$-measurable sets by $\mathcal{M}$.
Proposition E.3.5. The outer probability measure function $\P^*$ satisfies the following properties.
  • Empty Set: $\P^*(\emptyset)=0$.
  • Non-Negativity: $\P^*(A)\geq 0$ for all $A\subset\Omega$.
  • Monotonicity: $A\subset B$ implies $\P^*(A)\leq \P^*(B)$.
  • Countable Subadditivity: $\P^*(\cup_{n\in\mathbb{N}} A_n) \leq \sum_{n\in\mathbb{N}} \P^*(A_n)$.
  • Proof. The first property follows from the fact that $\emptyset$ is covered by itself, and has probability measure $\P(\emptyset)=1-\P(\Omega)=0$. The second property holds since the function $\P$ is non-negative. The third property holds since a covering of $A$ is also a covering of $B$.

    We prove the fourth property below. For any $\epsilon$ and any $n$ we construct a sequence of sets $B_{nk}, k\in\mathbb{N}$ such that $A_n\subset \cup_k B_{nk}$ and $\sum_k \P(B_{nk})\leq \P^*(A_n)+\epsilon 2^{-n})$. Such a construction is possible since $\P^*$ is defined as the infimum over all possible coverings. The fourth property holds since $\cup_n A_n\subset \cup_n\cup_k B_{nk}$ and \begin{align*} \P^*\left(\bigcup_n A_n\right) &= \P^*\left(\bigcup_n\bigcup_k B_{nk} \right) \leq \sum_{n}\sum_k \P(B_{nk}) < \sum_n \P^*(A_n)+\epsilon 2^{-n} \\ &= \sum_n \P^*(A_n)+\epsilon. \end{align*}

    Corollary E.3.2. A set $A$ is in $\mathcal{M}$ if and only if for all $E\subset\Omega$ \begin{align} \tag{*} \P^*(A\cap E) + \P^*(A^c\cap E) \leq \P^*(E). \end{align}
    Proof. By the countable sub-additivity property of the proposition above, $\P^*(A\cap E) + \P^*(A^c\cap E) \geq \P^*(E)$ always hold. Thus equality for a specific $A$ holds for all $E$ if and only if (*) holds for the same $A$ and all $E$.
    Proposition E.3.6. The set $\mathcal{M}$ is an algebra.
    Proof. We have $\Omega\in\mathcal{M}$ since $\P^*(\Omega\cap E)+\P^*(\Omega^c\cap E)=\P^*(E)+0$. If $A\in\mathcal{M}$, we also have $A^c\in\mathcal{M}$. If $A,B\in\mathcal{M}$, then for all $E\subset\Omega$ \begin{align*} \P^*(E) &= \P^*(B\cap E)+\P^*(B^c\cap E) \\ &=\P^*(A\cap B\cap E)+\P^*(A^c\cap B\cap E)+\P^*(A\cap B^c\cap E)+\P^*(A^c\cap B^c\cap E)\\ &\geq \P^*(A\cap B\cap E) + \P^*((A^c\cap B\cap E)\cup (A\cap B^c\cap E) \cup (A^c\cap B^c\cap E))\\ &= \P^*((A\cap B)\cap E) + \P^*((A\cap B)^c \cap E), \end{align*} implying that $A\cap B\in\mathcal{M}$. The inequality above follows from the fourth property of Proposition E.3.5.
    Proposition E.3.7. For a finite or infinite sequence $A_n$ of disjoint sets in $\mathcal{M}$, \begin{align} \P^*\left(E\cap \left(\bigcup_n A_n\right) \right) = \sum_n \P^*(E\cap A_n) \quad \forall E\subset\Omega. \tag{**} \end{align}
    Proof. We start with the case of a finite sequence of sets $A_n$ in $\mathcal{M}$. We prove by induction on the number of sets $n$. The case $n=1$ hold trivially. We consider the case $n=2$. If $A_1\cup A_2=\Omega$, (**) is a restatement of the the condition in Definition E.3.4. Otherwise, using the fact that $A_1\in\mathcal{M}$ and the condition in Definition E.3.4 we have \[ \P^*(A_1\cap E\cap(A_1\cup A_2))+\P^*(A_1^c\cap E\cap(A_1\cup A_2)) = \P^*(E\cap(A_1\cup A_2)).\] Since $A_1, A_2$ are disjoint, the left hand side above simplifies to the right side of (**). For a general $n$, we prove the induction argument using the $n=2$ and $n-1$ induction hypotheses \[ \P^*\left(E\cap \left(\bigcup_{k=1}^n A_k\right)\right) = \P^*\left(E \cap \left(\bigcup_{k=1}^{n-1} A_k\right)\right) + \P^*(E\cap A_k)=\sum_{k=1}^n \P^*(E\cap E_k).\]

    For the case of an infinite sequence of disjoint sets $A_n, n\in\mathbb{N}$, we observe that \[ \P^*\left(E\cap \left(\bigcup_{k\in\mathbb{N}} A_k \right)\right) \geq \P^*\left(E\cap \left(\bigcup_{k=1}^n A_k \right)\right) =\sum_{k=1}^n \P^*(E\cap E_k).\] Letting $n\to\infty$ in the equation above we conclude that \[ \P^*\left(E\cap \left(\bigcup_{k\in\mathbb{N}} A_k \right)\right) \geq \sum_{k\in\mathbb{N}} \P^*(E\cap E_k).\] The reverse inequality holds by property 4 of Proposition E.3.5.

    Proposition E.3.8. The set $\mathcal{M}$ is a $\sigma$-algebra, and $\P^*$ restricted to $\mathcal{M}$ is countable additive.
    Proof. Let $A_n, n\in\mathbb{N}$ be a sequence of disjoint sets in $\mathcal{M}$ with union $A$. Since $\mathcal{M}$ is an algebra, $F_n=\cup_{k=1}^n A_k\in\mathcal{M}$ and \begin{align*} \P^*(E) &= \P^*(E\cap F_n)+\P^*(E\cap F_n^c)\\ &= \sum_{k=1}^n \P^*(E\cap A_k) +\P^*(E\cap F_n^c) \\ &\geq \sum_{k=1}^n \P^*(E\cap A_k) +\P^*(E\cap A^c). \end{align*} Above, we used (**) to derive the second equality and the fact that $A^c\subset F_n^c$ to derive the inequality. Letting $n\to\infty$ in the equation above, and using (**) again, we have \begin{align*} \P^*(E) &\geq \sum_{k\in\mathbb{N}} \P^*(E\cap A_k) +\P^*(E\cap A_k^c)\\ &= \P^*(E\cap A) + \P^*(E \cap A^c). \end{align*} Together with (*), the equation above proves that $\mathcal{M}$ is closed under a countable union of disjoint sets.

    To show that $\mathcal{M}$ is a $\sigma$-algebra, it remains to show that it is closed under countable unions of arbitrary sets $B_n\in\mathcal{M}, n\in\mathbb{N}$ (not necessarily disjoint). Denoting $A_1=B_1$, $A_k=B_k\cap A_1^c\cap \cdots \cap A_{k-1}^c$, we have that $A_n$ is a sequence of disjoint sets in $\mathcal{M}$, and since $A=\cup_n A_n=\cup_n B_n=B$, we have \begin{align*} \P^*(E) &\geq \P^*(E\cap A) + \P^*(E \cap A^c)= \P^*(E\cap B) + \P^*(E \cap B^c). \end{align*} Together with (*), the equation above proves that $\mathcal{M}$ is closed under a countable union of arbitrary sets.

    The countable additivity of $\P^*$ on $\mathcal{M}$ follows from (**) with $\Omega$ substituting $E$.
    Proposition E.3.9. Let $\P$ be a probability measure on an algebra $\mathcal{C}$, $\P^*$ the associated outer measure, and $\mathcal{M}$ the set of all $\P^*$-measurable sets. Then (i) for all $A\in\mathcal{C}$, $\P^*(A)=\P(A)$, and (ii) $\mathcal{C} \subset \mathcal{M}$.
    Proof. By definition $\P^*(A)=\inf \sum_n \P(A_n)$ where the infimum ranges over all covering of $A$ within $\mathcal{C}$. If $A\in\mathcal{C}$, the covering of $A$ achieving the infimum is precisely $A_1=A$ implying that $\P^*=\P$ on $\mathcal{C}$ (for any covering $A_n$ of $A$, $\P(A)\leq \P(\cup_n A_n) \leq \sum_n \P(A_n)$). We now prove that $\mathcal{C} \subset \mathcal{M}$. Let $A\in\mathcal{C}$. For each set $E$ and $\epsilon>0$ select a sequence of sets $A_n, n\in\mathbb{N}$ in $\mathcal{C}$ such that \begin{align} \sum_n \P(A_n) \leq \P^*(E)+\epsilon. \end{align} This is possible since $\P^*(E)$ is defined as the infimum of such sums over all coverings of $E$. Since $\mathcal{C}$ is an algebra, $B_n=A_n\cap A\in\mathcal{C}$ and $C_n=A_n\cap A^c\in\mathcal{C}$. The following inequality \begin{align*} \P^*(E\cap A)+\P^*(E\cap A^c) &\leq \P^*\left(\bigcup_n B_n\right) + \P^*\left(\bigcup_n C_n\right) \leq \sum_n \P^*(B_n) + \P^*(C_n) \\ & = \sum_n \P(B_n) + \sum_n \P(C_n) = \sum_n \P(A_n)\leq \P^*(E)+\epsilon. \nonumber \end{align*} holds for all $\epsilon\geq 0$ indicating that $A\in\mathcal{M}$. Above, the first inequality in follows from $E\cap A\subset \cup_n B_n$ and $E\cap A^c \subset \cup_n C_n$ and monotonicity of $\P^*$. The second inequality follows from the countable subadditivity of $\P^*$. The first equality follows from $B_n, C_n\in\mathcal{C}$ and the first part of this proposition. The second equality follows from the equation $\sum_n \P(A_n) \leq \P^*(E)+\epsilon$ that was established above.

    E.3.3 The Extension Theorem*

    Proposition E.3.10 (Caratheodory's extension theorem). A probability measure $\P$ defined on an algebra $\mathcal{C}$ has a unique extension to a probability measure on $\sigma(\mathcal{C})$.
    Proof. We first prove the existence of the extension. Since $\mathcal{M}$ is a $\sigma$-algebra (Proposition E.3.8) and $\sigma(\mathcal{C})$ is the smallest $\sigma$-algebra, we have \[\mathcal{C} \subset \sigma(\mathcal{C}) \subset \mathcal{M}.\] Furthermore, Proposition E.3.8 states that $\P^*$ is countable additive on $\mathcal{M}$ implying that its restriction to $\mathcal{M}$ is a probability measure. It follows that $\P^*$ is also a probability measure on $\sigma(\mathcal{C})$ that agrees with $\P$ on $\mathcal{C}$ (Proposition E.3.9).

    The uniqueness of the extension follows from Proposition E.3.1 and the fact that the algebra $\mathcal{C}$ is also a $\pi$-system.

    Caratheodory's extension theorem holds also for measures that are not probability measures, including measures for which $\mu(\Omega)=+\infty$. A proof is available in (Billingsley, 1995).