The Analysis of Data, volume 1

Random Variables: RVs and Measure Theory

2.5. Random Variables and Measurable Functions

As described in Section 1.7, a rigorous definition of $\P$ requires it to be defined on a $\sigma$-algebra $\mathcal{F}$ of events in $\Omega$. Similarly, a rigorous definition of a random variable requires it to be a measurable function from the measurable space $(\Omega,\mathcal{F})$ to the measurable space $(\R,\mathcal{B}(\R))$ (see Chapter E). Measurability means that $X^{-1}(A)=\{\omega\in\Omega : X(\omega)\in A\}\in\mathcal{F}$ for every Borel set $A$, which ensures that we can compute $\P(X\in A)$ for all Borel sets $A\in\mathcal{B}(\R)$.
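The measurability requirement can be checked by hand on a finite sample space. The sketch below is an illustration only, not part of the text's development: the helper names `generated_sigma_algebra` and `is_measurable` are hypothetical, and the check relies on the assumption that $\Omega$ is finite, so that $X$ is measurable exactly when every level set $\{X=v\}$ belongs to $\mathcal{F}$.

```python
from itertools import combinations


def generated_sigma_algebra(partition):
    """Return all unions of blocks of a finite partition.

    For a finite partition of Omega this collection is the sigma-algebra
    generated by the partition (it includes the empty set and Omega).
    """
    blocks = [frozenset(b) for b in partition]
    events = set()
    for r in range(len(blocks) + 1):
        for combo in combinations(blocks, r):
            events.add(frozenset().union(*combo))
    return events


def is_measurable(X, omega, sigma_algebra):
    """Check measurability of X on a finite Omega.

    Every preimage X^{-1}(A) is a finite union of level sets {X = v},
    so it suffices to check that each level set is an event in F.
    """
    for v in {X(w) for w in omega}:
        level_set = frozenset(w for w in omega if X(w) == v)
        if level_set not in sigma_algebra:
            return False
    return True


# One roll of a die, but F only records whether the outcome is odd or even.
omega = range(1, 7)
F = generated_sigma_algebra([{1, 3, 5}, {2, 4, 6}])


def parity(w):
    return w % 2  # depends only on odd/even


def face_value(w):
    return w  # needs finer information than F contains


parity_ok = is_measurable(parity, omega, F)
face_ok = is_measurable(face_value, omega, F)
```

Under this coarse $\mathcal{F}$, `parity` is a random variable while `face_value` is not: the set $\{\omega : \omega = 3\}$ is not an event, so $\P(\text{face value} = 3)$ is undefined.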

Random variables may be discrete, continuous, or neither. The description in this chapter is simplified in that we consider only discrete or continuous random variables. This simplification enables us to develop the theory of random variables almost without reference to measure theory.

As mentioned earlier in the chapter, a random variable $X:\Omega\to\R$ defines a new probability function $\P'$ on $\R$ such that $\P'(A)=\P(X\in A)$. The more precise statement is that $X$ defines a probability measure space $(\R,\mathcal{B}(\R),\P')$ where $\P'=\P X^{-1}$, defined by $\P X^{-1}(A)=\P(X^{-1}(A))$, is the transformed measure (see Section F.3).
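As a small illustration of the transformed measure, the following sketch builds $\P'=\P X^{-1}$ on a finite sample space. The two-dice setup and the names `pushforward`, `law_X`, and `prob_X_in` are assumptions made for this example and do not appear in the text.

```python
from fractions import Fraction
from itertools import product

# Sample space: ordered outcomes of two fair dice, with the uniform measure P.
omega = list(product(range(1, 7), repeat=2))
P = {w: Fraction(1, 36) for w in omega}


def X(w):
    """Random variable: the sum of the two dice."""
    return w[0] + w[1]


def pushforward(P, X):
    """Return P' = P X^{-1} as a dict mapping each value x to P(X = x)."""
    law = {}
    for w, p in P.items():
        law[X(w)] = law.get(X(w), Fraction(0)) + p
    return law


law_X = pushforward(P, X)


def prob_X_in(A):
    """P'(A), computed on the real line from the transformed measure."""
    return sum((p for x, p in law_X.items() if x in A), Fraction(0))


A = {7, 11}
# The same probability computed on Omega as P(X^{-1}(A)).
direct = sum((P[w] for w in omega if X(w) in A), Fraction(0))
via_law = prob_X_in(A)
```

Both `direct` and `via_law` evaluate the same quantity, once on $\Omega$ and once on $\R$, which is the content of $\P'(A)=\P(X\in A)$.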

We have the following correspondence between RV notations and Lebesgue integrals.

| RV Notation | Integral Notation |
|---|---|
| $\P(A)$ | $\int_A \,d\P$ |
| $\E(X)$ | $\int x \, \P X^{-1}(dx)$ |
| $\P(X\in A)$ | $\int_A \,d\P X^{-1}$ |
| $\E(g(X))$ | $\int g \, d\P X^{-1}$ |
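The expectation $\E(g(X))$ can be computed in two equivalent ways: by integrating $g(X)$ over $\Omega$ against $\P$, or by integrating $g$ over $\R$ against the transformed measure $\P X^{-1}$. The sketch below checks this on a finite sample space, where both integrals reduce to finite sums. The three-coin setup and the choice $g(x)=x^2$ are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product

# Sample space: three fair coin tosses, with the uniform measure P.
omega = list(product("HT", repeat=3))
P = {w: Fraction(1, 8) for w in omega}


def X(w):
    """Random variable: number of heads."""
    return w.count("H")


def g(x):
    return x * x


# Integral over Omega: sum of g(X(w)) P({w}).
over_omega = sum(g(X(w)) * P[w] for w in omega)

# Transformed measure P X^{-1} on the values of X.
law_X = {}
for w, p in P.items():
    law_X[X(w)] = law_X.get(X(w), Fraction(0)) + p

# Integral over R: sum of g(x) P X^{-1}({x}).
over_reals = sum(g(x) * p for x, p in law_X.items())
```

Both sums equal $3$, the second moment of a binomial random variable with $n=3$ and $p=1/2$.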

For example, the Lebesgue integral $\int_A\,d\P X^{-1}$ becomes $\int_A f_X(x)\,dx$ if $X$ is a continuous RV and $\sum_{x\in A} p_X(x)$ if $X$ is discrete. Thus, a single notation covers discrete RVs, continuous RVs, and RVs that are neither discrete nor continuous, which is a significant convenience. Doing so, however, requires the more complex mathematics of measure theory and Lebesgue integration.
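To make the three cases concrete, the following sketch evaluates $\P(X\in[a,b])$ for a distribution described by point masses, a density, or both. It is a numerical illustration under stated assumptions, not a general Lebesgue integrator: the helper names `midpoint_integral` and `prob_interval` are hypothetical, the density part is integrated with a simple midpoint rule, and the mixed example is $X=\min(T,1)$ with $T$ exponential with rate 1.

```python
import math


def midpoint_integral(f, a, b, n=10_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


def prob_interval(a, b, atoms=None, density=None):
    """P(X in [a, b]) for a law with point masses and/or a density part.

    atoms:   dict mapping x to P(X = x) (the discrete part)
    density: function f with P(X in dx) = f(x) dx (the continuous part)
    """
    total = 0.0
    if atoms:
        total += sum(p for x, p in atoms.items() if a <= x <= b)
    if density:
        total += midpoint_integral(density, a, b)
    return total


# Discrete: a fair die, so the integral is a sum of the pmf.
die = {k: 1 / 6 for k in range(1, 7)}
p_discrete = prob_interval(2, 4, atoms=die)

# Continuous: exponential with rate 1, so the integral uses the density.
p_continuous = prob_interval(
    0.5, 1.0, density=lambda x: math.exp(-x) if x >= 0 else 0.0
)

# Neither: X = min(T, 1) has density exp(-x) on [0, 1) and an atom
# of mass exp(-1) at x = 1.
p_mixed = prob_interval(
    0.5,
    1.0,
    atoms={1.0: math.exp(-1)},
    density=lambda x: math.exp(-x) if 0 <= x < 1 else 0.0,
)
```

In the mixed case the result should agree with $\P(T\ge 0.5)=e^{-0.5}$, since $\min(T,1)\in[0.5,1]$ exactly when $T\ge 0.5$. Neither a pmf nor a density alone describes this distribution, while the integral with respect to $\P X^{-1}$ handles it uniformly.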