The Analysis of Data, volume 1

Random Vectors

4.5. Conditional Probabilities and Random Vectors

Conditional probabilities for random vectors are defined similarly to the scalar case. Given a joint distribution over the random vector $\bb{Z}=(\bb{X},\bb{Y})$, the conditional probability $\P(\bb X\in A \c \bb Y=\bb y)$ reflects the updated probability of the event $\bb X\in A$ after observing that $\bb Y=\bb y$.

The conditional cdf, pdf, and pmf are defined as follows, where $\bb X=(X_1,\ldots,X_n)$: \begin{align} F_{\bb{X} \c \bb{Y}=\bb{y}}(\bb{x}) &= \begin{cases} \P(\bb{X} \leq \bb{x}, \bb{Y}=\bb{y}) / p_{\bb{Y}}(\bb{y}) & \bb{Y} \text{ is discrete} \\ \P(\bb{X} \leq \bb{x}, \bb{Y}=\bb{y}) / f_{\bb{Y}}(\bb{y}) & \bb{Y} \text{ is continuous} \end{cases} \\ f_{\bb{X} \c \bb{Y}=\bb{y}}(\bb{x}) &= \frac{ \partial^n}{\partial x_1\cdots\partial x_n} F_{\bb{X} \c \bb{Y}=\bb{y}}(\bb{x}) \\ p_{\bb{X} \c \bb{Y}=\bb{y}}(\bb{x}) &= \frac{\P(\bb{X} = \bb{x}, \bb{Y}=\bb{y})}{\P(\bb{Y}=\bb{y})} = \frac{p_{\bb{X},\bb{Y}}((\bb{x},\bb{y}))}{p_{\bb{Y}}(\bb{y})}. \end{align} Note that we assume above that $f_{\bb Y}(\bb y)$ and $p_{\bb Y}(\bb y)$ are non-zero. When $\bb Y$ is continuous, the event $\bb Y=\bb y$ has probability zero, and the ratio $\P(\bb{X} \leq \bb{x}, \bb{Y}=\bb{y}) / f_{\bb{Y}}(\bb{y})$ is interpreted as the limit of $\P(\bb X\leq \bb x, \bb Y\in B)/\P(\bb Y\in B)$ as the set $B$ shrinks to $\{\bb y\}$, as in the scalar case (see Chapter 2).
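As a concrete numerical illustration of the pmf formula above, the following Python sketch (using a made-up joint pmf over two scalar variables) computes $p_{X \c Y=y}$ by dividing the joint pmf by the marginal pmf of $Y$:

```python
import numpy as np

# Hypothetical joint pmf p_{X,Y}(x, y): rows index x in {0, 1, 2},
# columns index y in {0, 1}. Entries sum to one.
joint = np.array([[0.10, 0.20],
                  [0.30, 0.15],
                  [0.15, 0.10]])

# Marginal pmf of Y: p_Y(y) = sum_x p_{X,Y}(x, y).
p_Y = joint.sum(axis=0)

# Conditional pmf p_{X|Y=y}(x) = p_{X,Y}(x, y) / p_Y(y),
# defined whenever p_Y(y) > 0.
cond = joint / p_Y

# Each column of `cond` is a pmf over x, so each column sums to one.
print(cond.sum(axis=0))
```

Dividing the joint table by the marginal row renormalizes each column, which is exactly the definition of the conditional pmf.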

When both $\bb X$ and $\bb Y$ are continuous, their joint cdf is differentiable and \begin{align*} f_{\bb{X} \c \bb{Y}=\bb{y}}(\bb{x}) &= \frac{f_{\bb{X},\bb{Y}}(\bb x,\bb{y})}{f_{\bb Y}(\bb y)}. \end{align*}

Computing conditional probabilities from the conditional pdf or pmf proceeds as in the unconditional case: we integrate the conditional pdf or sum the conditional pmf. The proof is similar to the scalar case (see Chapter 2).

\begin{align*} \P(\bb{X}\in A \c \bb{Y}=\bb{y})= \begin{cases} \int_A f_{\bb{X} \c \bb{Y}=\bb{y}}(\bb{x})d\bb{x} & \bb{X} \text{ is continuous}\\ \sum_{\bb{x}\in A} p_{\bb{X} \c \bb{Y}=\bb{y}}(\bb{x}) & \bb{X} \text{ is discrete} \end{cases}. \end{align*}
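In the continuous case, the integral above can be evaluated numerically. The sketch below uses a hypothetical joint pdf $f_{X,Y}(x,y)=x+y$ on the unit square and approximates $\P(X\leq 1/2 \c Y=1/4)$ with a midpoint Riemann sum; the exact value is $1/3$.

```python
import numpy as np

# Hypothetical joint pdf f_{X,Y}(x, y) = x + y on the unit square [0,1]^2.
def f_XY(x, y):
    return x + y

y = 0.25
dx = 1e-5
xs = np.arange(0.0, 1.0, dx) + dx / 2  # midpoints of a fine grid on [0, 1]

# Marginal pdf at y: f_Y(y) = \int_0^1 (x + y) dx = 1/2 + y.
f_Y = np.sum(f_XY(xs, y)) * dx

# P(X <= 1/2 | Y = 1/4) = \int_0^{1/2} f_{X|Y=y}(x) dx
#                       = \int_0^{1/2} f_{X,Y}(x, y) dx / f_Y(y).
mask = xs <= 0.5
prob = np.sum(f_XY(xs[mask], y)) * dx / f_Y
print(prob)  # approximately 1/3
```

Here $f_Y(1/4)=3/4$ and $\int_0^{1/2}(x+1/4)\,dx=1/4$, so the ratio is $1/3$, matching the numerical result.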

Example 4.5.1. For a three-dimensional random vector $\bb X=(X_1,X_2,X_3)$, the conditional cdf of $X_2$ given $X_1=x_1$ and $X_3=x_3$ is \begin{align*} F_{X_2 \c X_1=x_1,X_3=x_3}(x_2) = \begin{cases}\frac{\P( X_1=x_1, X_2\leq x_2, X_3=x_3)}{\sum_{x_2'} p_{\bb{X}}(x_1, x_2', x_3)}& \bb{X} \text{ is discrete}\\ \frac{\P( X_1=x_1, X_2\leq x_2, X_3=x_3)}{\int f_{\bb{X}}(x_1, x_2',x_3)\,dx_2'} & \bb{X} \text{ is continuous} \end{cases} \end{align*}
Example 4.5.2. \begin{align*} f_{X_i \c \{X_j=x_j:j\neq i\}}(x_i) &= \frac{\frac{d}{dx_i} \int_{-\infty}^{x_i} f_{X_1,\ldots,X_n}(x_1,\ldots,x_{i-1},t,x_{i+1},\ldots,x_n)\,dt}{ f_{X_1,\ldots,X_{i-1},X_{i+1},\ldots,X_n}(x_1,\ldots,x_{i-1},x_{i+1},\ldots,x_n)}\\ &=\frac{f_{X_1,\ldots,X_n}(x_1,\ldots,x_n)}{\int_{-\infty}^{\infty} f_{X_1,\ldots,X_n}(x_1,\ldots,x_n)\,dx_i}. \end{align*} In the case of $n=3$, we have \begin{align*}f_{X_2 \c X_1=x_1,X_3=x_3}(x_2) &= \frac{d}{d x_2} F_{X_2 \c X_1=x_1,X_3=x_3}(x_2)\\ &=\frac{f_{X_1,X_2,X_3}(x_1,x_2,x_3)}{ \int_{-\infty}^{\infty}f_{X_1,X_2,X_3}(x_1,x_2,x_3) dx_2}.\end{align*}
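The ratio formula in Example 4.5.2 can be checked numerically. The sketch below uses a hypothetical joint pdf $f_{X_1,X_2,X_3}(x_1,x_2,x_3)=(x_1+x_2+x_3)/1.5$ on the unit cube and verifies that the resulting conditional pdf $f_{X_2\c X_1=x_1,X_3=x_3}$ is non-negative and integrates to one:

```python
import numpy as np

# Hypothetical joint pdf on the unit cube [0,1]^3. It integrates to one
# since \int_{[0,1]^3} (x1 + x2 + x3) dx1 dx2 dx3 = 3/2.
def f_joint(x1, x2, x3):
    return (x1 + x2 + x3) / 1.5

x1, x3 = 0.3, 0.8
dx = 1e-5
x2 = np.arange(0.0, 1.0, dx) + dx / 2  # midpoints of a fine grid on [0, 1]

# Denominator: \int f(x1, t, x3) dt, the marginal of (X1, X3) at (x1, x3).
denom = np.sum(f_joint(x1, x2, x3)) * dx

# Conditional pdf f_{X2 | X1=x1, X3=x3}(x2): ratio of joint to marginal.
cond = f_joint(x1, x2, x3) / denom

# A valid pdf: non-negative and integrating to one.
print(np.sum(cond) * dx)  # approximately 1.0
```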

The above formulas lead to the following generalization of Bayes' rule for events, $\P(A \c B)=\P(B \c A)\P(A) / \P(B)$ (Proposition 1.5.2).

Proposition 4.5.1 (Bayes Rule). \begin{align*} f_{\bb X}(\bb x) &= f_{X_i \c \{X_j=x_j:j\neq i\}}(x_i) f_{X_1,\ldots,X_{i-1},X_{i+1},\ldots,X_n}(x_1,\ldots,x_{i-1}, x_{i+1}, \ldots, x_n)\\ p_{\bb X}(\bb x) &= p_{X_i \c \{X_j=x_j:j\neq i\}}(x_i) p_{X_1,\ldots,X_{i-1},X_{i+1},\ldots,X_n}(x_1,\ldots,x_{i-1}, x_{i+1}, \ldots, x_n) \end{align*}
Proof. The pdf formula follows from Example 4.5.2. The derivation of the pmf formula is similar.
Corollary 4.5.1. \begin{align*} f_{X_1,\ldots,X_n}(x_1,\ldots,x_n) &= f_{X_1}(x_1) f_{X_2 \c X_1=x_1}(x_2)f_{X_3 \c X_1=x_1,X_2=x_2}(x_3) \cdots f_{X_n \c X_1=x_1,\ldots,X_{n-1}=x_{n-1}}(x_n).\\ p_{X_1,\ldots,X_n}(x_1,\ldots,x_n) &= p_{X_1}(x_1) p_{X_2 \c X_1=x_1}(x_2)p_{X_3 \c X_1=x_1,X_2=x_2}(x_3) \cdots p_{X_n \c X_1=x_1,\ldots,X_{n-1}=x_{n-1}}(x_n). \end{align*}
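The chain-rule decomposition in Corollary 4.5.1 can be verified numerically in the discrete case. The sketch below builds an arbitrary (randomly generated) joint pmf over $(X_1,X_2,X_3)$ and reconstructs it from the factors $p_{X_1}$, $p_{X_2\c X_1}$, and $p_{X_3\c X_1,X_2}$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary joint pmf over (X1, X2, X3), each variable taking values
# in {0, 1, 2}; a random positive tensor normalized to sum to one.
p = rng.random((3, 3, 3))
p /= p.sum()

# Marginals needed for the chain-rule factors.
p1 = p.sum(axis=(1, 2))   # p_{X1}(x1)
p12 = p.sum(axis=2)       # p_{X1,X2}(x1, x2)

# Reconstruct p(x1,x2,x3) = p(x1) * p(x2|x1) * p(x3|x1,x2).
recon = (p1[:, None, None]
         * (p12 / p1[:, None])[:, :, None]
         * (p / p12[:, :, None]))

print(np.allclose(recon, p))  # True
```

Each factor is a conditional pmf obtained by dividing a joint pmf by the marginal of the conditioning variables, and the product telescopes back to the joint pmf.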
Proof. The proof follows by repeated application of Proposition 4.5.1.

The ordering of $X_1,\ldots,X_n$ in the decomposition above is arbitrary, and similar formulas hold when the variables are relabeled: for example, replace $X_1$ with $X_2$, $X_2$ with $X_3$, and $X_3$ with $X_1$, or apply any other relabeling. (Formally, given a permutation function $\pi:\{1,\ldots,n\}\to\{1,\ldots,n\}$, which is a one-to-one and onto function, a relabeling of the vector $(X_1,\ldots,X_n)$ is the vector $(X_{\pi(1)},\ldots,X_{\pi(n)})$.) For example, the following two equations hold. \begin{align*} f_{X_1,X_2,X_3}(\bb x) &= f_{X_1} (x_1) f_{X_2 \c X_1=x_1}(x_2) f_{X_3 \c X_1=x_1,X_2=x_2}(x_3)\\ f_{X_1,X_2,X_3}(\bb x) &= f_{X_2} (x_2) f_{X_3 \c X_2=x_2}(x_3) f_{X_1 \c X_2=x_2,X_3=x_3}(x_1). \end{align*}

Example 4.5.3. Suppose that a point $X$ is chosen from a uniform distribution on the interval $[0,1]$, and that after $X=x$ is observed, a point $Y$ is drawn from a uniform distribution on the interval $[x,1]$. We have \begin{align*} f_{X,Y}(x,y) &= f_X(x)f_{Y \c X=x}(y)= \begin{cases} 1\cdot \frac{1}{1-x} & 0 < x < y < 1 \\ 0 & \text{otherwise} \end{cases}\\ f_Y(y)&= \begin{cases} \int_{-\infty}^{\infty}f_{X,Y}(x,y) dx=\int_0^y\frac{1}{1-x}dx=-\log(1-y) & 0 < y < 1 \\ 0 & \text{otherwise} \end{cases}\\ f_{X \c Y=y}(x)&=f_{X,Y}(x,y)/f_{Y}(y)= \begin{cases} \frac{-1}{(1-x)\log(1-y)} & 0 < x < y < 1\\ 0 &\text{otherwise} \end{cases}. \end{align*}
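A quick Monte Carlo simulation (a sanity check, not part of the derivation) agrees with the marginal density derived above: $f_Y(y)=-\log(1-y)$ implies $\P(Y\leq 1/2)=\int_0^{1/2}-\log(1-y)\,dy=(1-\log 2)/2\approx 0.153$, which matches the empirical frequency from sampling the two-stage experiment.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000

# Two-stage sampling: X ~ Uniform(0, 1), then Y | X = x ~ Uniform(x, 1).
x = rng.uniform(0.0, 1.0, size=n)
y = rng.uniform(x, 1.0)  # per-sample lower bound x, upper bound 1

# From f_Y(y) = -log(1 - y):
# P(Y <= 1/2) = \int_0^{1/2} -log(1 - y) dy = (1 - log 2) / 2.
exact = (1 - np.log(2)) / 2
mc = (y <= 0.5).mean()
print(mc, exact)  # the two values agree to about three decimal places
```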