Probability and Measure Theory

Probability

The Analysis of Data, volume 1

$ \def\P{\mathsf{P}} \def\R{\mathbb{R}} $

1.7. Probability and Measure Theory*

Definition 1.2.1 appears formal, yet it is not completely rigorous. It states that a probability function $\P$ assigns real values to events $E\subset \Omega$ in a manner consistent with the three axioms, but it does not specify the domain of $\P$. In other words, if $\P$ is a function $\P:\mathcal{F}\to\R$ from a set $\mathcal{F}$ of subsets of $\Omega$ to $\R$, the set $\mathcal{F}$ is left unspecified. This matters because the three axioms must hold for all sets in $\mathcal{F}$.

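At first glance this appears to be a minor issue that can be solved by choosing $\mathcal{F}$ to be the power set of $\Omega$: $2^{\Omega}$. This works nicely whenever $\Omega$ is finite or countably infinite. But selecting $\mathcal{F}=2^{\Omega}$ does not work well for uncountably infinite $\Omega$ such as continuous spaces: it is hard to come up with useful functions $\P:2^{\Omega}\to\R$ that satisfy the three axioms for all subsets of $\Omega$. For example, assuming the axiom of choice, no function defined on all subsets of $[0,1)$ can satisfy the three axioms, assign every interval its length, and be invariant under shifts modulo 1 (the classical Vitali construction).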
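The finite case can be checked by brute force. The following is a minimal sketch, not taken from the text: it assumes the three axioms are the usual nonnegativity, normalization, and additivity requirements, and it uses a hypothetical fair die as $\Omega$. The helper names `power_set`, `make_probability`, and `satisfies_axioms` are illustrative only.

```python
from fractions import Fraction
from itertools import combinations


def power_set(omega):
    """Return every subset of a finite omega as a frozenset."""
    items = list(omega)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]


def make_probability(weights):
    """Build P(E) = sum of the weights of the outcomes in E."""
    assert sum(weights.values()) == 1, "outcome weights must sum to 1"
    return lambda event: sum((weights[w] for w in event), Fraction(0))


def satisfies_axioms(P, omega, events):
    """Check the three axioms on every event of a finite space."""
    nonnegative = all(P(E) >= 0 for E in events)
    normalized = P(omega) == 1
    # On a finite space, countable additivity reduces to additivity
    # over pairs of disjoint events.
    additive = all(P(A | B) == P(A) + P(B)
                   for A in events for B in events if not (A & B))
    return nonnegative and normalized and additive


# Hypothetical example: a fair six-sided die with F = 2^Omega.
omega = frozenset(range(1, 7))
P = make_probability({w: Fraction(1, 6) for w in omega})
events = power_set(omega)

print(len(events))                         # number of events in 2^Omega
print(satisfies_axioms(P, omega, events))  # axioms hold on all of them
```

Exact rational arithmetic (`Fraction`) is used so that the additivity check is not disturbed by floating-point rounding. The same enumeration is impossible for uncountable $\Omega$, which is where the choice of $\mathcal{F}$ becomes delicate.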

A satisfactory solution for uncountably infinite $\Omega$ is to define $\mathcal{F}$ to be a $\sigma$-algebra of subsets of $\Omega$ that is smaller than $2^{\Omega}$. In particular, when $\Omega\subset\R^d$, the Borel $\sigma$-algebra is large enough to include the "interesting" subsets of $\Omega$, yet small enough that it does not restrict $\P$ too much (see Section E.1 for the definitions of a $\sigma$-algebra and of the Borel $\sigma$-algebra).
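To make "a $\sigma$-algebra smaller than $2^{\Omega}$" concrete, here is a small sketch on a finite $\Omega$, where closure under complements and pairwise unions is enough. It is only an analogy for the Borel $\sigma$-algebra, which is generated in a similar spirit from the open sets but cannot be enumerated. The function name `generate_sigma_algebra` and the choice of generators are assumptions made for illustration.

```python
def generate_sigma_algebra(omega, generators):
    """Smallest family of subsets of a finite omega that contains the
    generators and is closed under complement and union."""
    omega = frozenset(omega)
    family = {frozenset(), omega} | {frozenset(g) for g in generators}
    while True:
        # Add all complements and all pairwise unions of current members.
        new = {omega - A for A in family} | {A | B for A in family for B in family}
        if new <= family:   # nothing new: the family is closed
            return family
        family |= new


# Hypothetical example: Omega = {1, 2, 3, 4}, generated by {1} and {2}.
omega = {1, 2, 3, 4}
F = generate_sigma_algebra(omega, [{1}, {2}])

print(len(F))                 # fewer sets than the 16 in the power set
print(frozenset({3, 4}) in F) # the complement of {1, 2} is an event
print(frozenset({3}) in F)    # outcomes 3 and 4 cannot be separated
```

Here $\mathcal{F}$ treats $\{3,4\}$ as an indivisible block, so a probability function on $\mathcal{F}$ needs to assign a value to $\{3,4\}$ but not to $\{3\}$ or $\{4\}$ separately. A smaller $\mathcal{F}$ thus imposes fewer constraints on $\P$.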

We also note that a probability function $\P$ is nothing but a measure $\mu$ on a measurable space $(\Omega,\mathcal{F})$ satisfying $\mu(\Omega)=1$. In other words, the triplet $(\Omega,\mathcal{F},\P)$ is a measure space where $\mathcal{F}$ is the $\sigma$-algebra of measurable sets and $\P$ is a measure satisfying $\P(\Omega)=1$. Thus, the wide array of mathematical results from measure theory (Chapter E) and Lebesgue integration (Chapter F) is directly applicable to probability theory.
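Spelled out, and assuming the three axioms of Definition 1.2.1 are the usual nonnegativity, normalization, and countable additivity requirements, the statement that $(\Omega,\mathcal{F},\P)$ is a measure space with $\P(\Omega)=1$ amounts to the following conditions on $\P:\mathcal{F}\to\R$:

$$ \P(E)\geq 0 \ \text{ for all } E\in\mathcal{F}, \qquad \P(\Omega)=1, \qquad \P\Big(\bigcup_{i=1}^{\infty} E_i\Big)=\sum_{i=1}^{\infty}\P(E_i) \ \text{ for pairwise disjoint } E_1,E_2,\ldots\in\mathcal{F}. $$

Because $\mathcal{F}$ is a $\sigma$-algebra, it contains $\Omega$ and every countable union $\bigcup_{i} E_i$ of its members, so each expression above refers only to sets in the domain of $\P$.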