Probability
The Analysis of Data, volume 1
Random Vectors: Conditional Probabilities and Random Vectors
$
\def\P{\mathsf{\sf P}}
\def\E{\mathsf{\sf E}}
\def\Var{\mathsf{\sf Var}}
\def\Cov{\mathsf{\sf Cov}}
\def\std{\mathsf{\sf std}}
\def\Cor{\mathsf{\sf Cor}}
\def\R{\mathbb{R}}
\def\c{\,|\,}
\def\bb{\boldsymbol}
\def\diag{\mathsf{\sf diag}}
\def\defeq{\stackrel{\tiny\text{def}}{=}}
$
4.5. Conditional Probabilities and Random Vectors
Conditional probabilities for random vectors are defined similarly to the scalar case. Considering a joint distribution over the random vector $\bb{Z}=(\bb{X},\bb{Y})$, the conditional probability $\P(\bb X\in A \c \bb Y=\bb y)$ reflects an updated likelihood for the event $\bb X\in A$ given that $\bb Y=\bb y$.
The conditional cdf, pdf, and pmf are defined as follows
\begin{align}
F_{\bb{X} \c \bb{Y}=\bb{y}}(\bb{x}) &= \begin{cases}
\P(\bb{X} \leq \bb{x}, \bb{Y}=\bb{y}) / p_{\bb{Y}}(\bb{y}) & \bb{Y} \text{ is discrete} \\
\P(\bb{X} \leq \bb{x}, \bb{Y}=\bb{y}) / f_{\bb{Y}}(\bb{y}) & \bb{Y} \text{ is continuous} \end{cases} \\
f_{\bb{X} \c \bb{Y}=\bb{y}}(\bb{x}) &= \frac{ \partial^n}{\partial x_1\cdots\partial x_n} F_{\bb{X} \c \bb{Y}=\bb{y}}(\bb{x}) \\
p_{\bb{X} \c \bb{Y}=\bb{y}}(\bb{x}) &= \frac{\P(\bb{X} = \bb{x}, \bb{Y}=\bb{y})}{\P(\bb{Y}=\bb{y})} = \frac{p_{\bb{X},\bb{Y}}((\bb{x},\bb{y}))}{p_{\bb{Y}}(\bb{y})}.
\end{align}
Note that we assume above that $f_{\bb Y}(\bb y)$ and $p_{\bb Y}(\bb y)$ are not zero.
When both $\bb X$ and $\bb Y$ are continuous, their joint cdf is differentiable and
\begin{align*}
f_{\bb{X} \c \bb{Y}=\bb{y}}(\bb{x}) &= \frac{f_{\bb{X},\bb{Y}}(\bb x,\bb{y})}{f_{\bb Y}(\bb y)}.
\end{align*}
Computing conditional probabilities from the conditional pdf or pmf proceeds as in the unconditional case: integrate the conditional pdf, or sum the conditional pmf, over the event $A$. The proof is similar to the scalar case (see Chapter 2).
\begin{align*}
\P(\bb{X}\in A \c \bb{Y}=\bb{y})=
\begin{cases}
\int_A f_{\bb{X} \c \bb{Y}=\bb{y}}(\bb{x})d\bb{x} & \bb{X} \text{ is continuous}\\
\sum_{\bb{x}\in A} p_{\bb{X} \c \bb{Y}=\bb{y}}(\bb{x}) & \bb{X} \text{ is discrete}
\end{cases}.
\end{align*}
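As a quick numerical illustration (not part of the text), the summation formula can be checked on a small joint pmf; the table values below are made up for the example:

```python
import numpy as np

# Hypothetical joint pmf of (X, Y) with X in {0,1,2} (rows) and Y in {0,1} (columns).
p_XY = np.array([[0.10, 0.15],
                 [0.20, 0.25],
                 [0.05, 0.25]])

# Marginal pmf of Y: sum the joint pmf over x.
p_Y = p_XY.sum(axis=0)

# Conditional pmf p_{X|Y=1}(x) = p_{X,Y}(x,1) / p_Y(1).
p_X_given_Y1 = p_XY[:, 1] / p_Y[1]

# P(X in {0,2} | Y=1) is obtained by summing the conditional pmf over the event.
prob = p_X_given_Y1[[0, 2]].sum()
print(p_X_given_Y1, prob)
```

Note that the conditional pmf sums to one, as any pmf must.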
Example 4.5.1. For a random vector $\bb X=(X_1,X_2,X_3)$, we have
\begin{align*}
F_{X_2 \c X_1=x_1,X_3=x_3}(x_2) = \begin{cases}\frac{\P( X_1=x_1, X_2\leq x_2, X_3=x_3)}{\sum_{x_2'} p_{\bb{X}}(x_1, x_2', x_3)}& \bb{X} \text{ is discrete}\\
\frac{\P( X_1=x_1, X_2\leq x_2, X_3=x_3)}{\int f_{\bb{X}}(x_1, x_2',x_3)\,dx_2'} & \bb{X} \text{ is continuous} \end{cases}
\end{align*}
Example 4.5.2. For a continuous random vector $\bb X=(X_1,\ldots,X_n)$, the conditional pdf of $X_i$ given the remaining components is
\begin{align*}
f_{X_i \c \{X_j=x_j:j\neq i\}}(x_i) &= \frac{\frac{d}{dx_i}
\int_{-\infty}^{x_i} f_{X_1,\ldots,X_n}(x_1,\ldots,x_{i-1},t,x_{i+1},\ldots,x_n)\,dt}{
f_{X_1,\ldots,X_{i-1},X_{i+1},\ldots,X_n}(x_1,\ldots,x_{i-1},x_{i+1},\ldots,x_n)}\\
&=\frac{f_{X_1,\ldots,X_n}(x_1,\ldots,x_n)}{\int_{-\infty}^{\infty}
f_{X_1,\ldots,X_n}(x_1,\ldots,x_n)\,dx_i}.
\end{align*}
In the case of $n=3$, we have
\begin{align*}f_{X_2 \c X_1=x_1,X_3=x_3}(x_2) &=
\frac{d}{d x_2}
F_{X_2 \c X_1=x_1,X_3=x_3}(x_2)\\ &=\frac{f_{X_1,X_2,X_3}(x_1,x_2,x_3)}{
\int_{-\infty}^{\infty}f_{X_1,X_2,X_3}(x_1,x_2,x_3) dx_2}.\end{align*}
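The $n=3$ formula can be sketched numerically. The joint pdf below is a made-up example on the unit cube, and the marginal in the denominator is computed by numerical integration over $x_2$:

```python
import numpy as np

# Hypothetical joint pdf on the unit cube: f(x1,x2,x3) = (2/3)(x1 + x2 + x3).
def f_joint(x1, x2, x3):
    return (2.0 / 3.0) * (x1 + x2 + x3)

x1, x3 = 0.2, 0.7                      # conditioning values
grid = np.linspace(0.0, 1.0, 100001)   # integration grid for x2

# Denominator: f_{X1,X3}(x1,x3) = integral of f(x1,x2,x3) dx2 over [0,1];
# since the interval has length 1, the integral is approximated by the grid average.
marg = f_joint(x1, grid, x3).mean()

# Conditional pdf at a point: joint / marginal.
x2 = 0.3
cond = f_joint(x1, x2, x3) / marg

# Closed form for this particular pdf: (x1+x2+x3) / (x1+x3+1/2).
exact = (x1 + x2 + x3) / (x1 + x3 + 0.5)
print(cond, exact)
```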
The above formulas lead to the following generalization of Bayes' rule for events, $\P(A \c B)=\P(B \c A)\P(A) / \P(B)$ (Proposition 1.5.2).
Proposition 4.5.1 (Bayes Rule).
\begin{align*}
f_{\bb X}(\bb x) &= f_{X_i \c \{X_j=x_j:j\neq
i\}}(x_i)\,
f_{X_1,\ldots,X_{i-1},X_{i+1},\ldots,X_n}(x_1,\ldots,x_{i-1},
x_{i+1}, \ldots, x_n)\\
p_{\bb X}(\bb x) &= p_{X_i \c \{X_j=x_j:j\neq
i\}}(x_i)\,
p_{X_1,\ldots,X_{i-1},X_{i+1},\ldots,X_n}(x_1,\ldots,x_{i-1},
x_{i+1}, \ldots, x_n)
\end{align*}
Proof.
The pdf formula follows from Example 4.5.2. The derivation of the pmf formula is similar.
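The pmf version of the proposition can be verified numerically for an arbitrary discrete joint distribution; the array below is randomly generated purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical joint pmf of (X1, X2, X3), each component taking values in {0,1,2}.
p = rng.random((3, 3, 3))
p /= p.sum()

# Marginal of (X1, X3): sum out X2 (axis 1).
p_13 = p.sum(axis=1)

# Conditional pmf p_{X2 | X1=a, X3=c}(b) = p(a,b,c) / p_13(a,c).
a, b, c = 1, 2, 0
cond = p[a, b, c] / p_13[a, c]

# Bayes rule (Proposition 4.5.1): joint = conditional * marginal of the rest.
print(cond * p_13[a, c], p[a, b, c])
```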
Corollary 4.5.1.
\begin{align*}
f_{X_1,\ldots,X_n}(x_1,\ldots,x_n) &= f_{X_1}(x_1)
f_{X_2 \c X_1=x_1}(x_2)f_{X_3 \c X_1=x_1,X_2=x_2}(x_3) \cdots f_{X_n \c X_1=x_1,\ldots,X_{n-1}=x_{n-1}}(x_n).\\
p_{X_1,\ldots,X_n}(x_1,\ldots,x_n) &= p_{X_1}(x_1)
p_{X_2 \c X_1=x_1}(x_2)p_{X_3 \c X_1=x_1,X_2=x_2}(x_3) \cdots p_{X_n \c X_1=x_1,\ldots,X_{n-1}=x_{n-1}}(x_n).
\end{align*}
Proof.
The proof follows by repeated use of Proposition 4.5.1.
The ordering of $X_1,\ldots,X_n$ in the decomposition above is arbitrary, and similar formulas hold when the variables are relabeled: for example, replace $X_1$ with $X_2$, $X_2$ with $X_3$, and $X_3$ with $X_1$, or apply any other relabeling. (Formally, given a permutation $\pi:\{1,\ldots,n\}\to\{1,\ldots,n\}$, that is, a one-to-one and onto function, a relabeling of the vector $(X_1,\ldots,X_n)$ is the vector $(X_{\pi(1)},\ldots,X_{\pi(n)})$.) For example, the following two equations hold.
\begin{align*}
f_{X_1,X_2,X_3}(\bb x) &= f_{X_1} (x_1) f_{X_2 \c X_1=x_1}(x_2) f_{X_3 \c X_1=x_1,X_2=x_2}(x_3)\\
f_{X_1,X_2,X_3}(\bb x) &= f_{X_2} (x_2) f_{X_3 \c X_2=x_2}(x_3) f_{X_1 \c X_2=x_2,X_3=x_3}(x_1).
\end{align*}
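That both orderings yield the same joint distribution can be checked numerically in the discrete case; the joint pmf below is randomly generated for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical joint pmf of (X1, X2, X3) on {0,1} x {0,1} x {0,1}.
p = rng.random((2, 2, 2))
p /= p.sum()
a, b, c = 0, 1, 1   # the point (x1, x2, x3)

# Ordering X1, X2, X3: p(x1) p(x2|x1) p(x3|x1,x2).
d1 = p.sum(axis=(1, 2))[a] \
     * (p.sum(axis=2)[a, b] / p.sum(axis=(1, 2))[a]) \
     * (p[a, b, c] / p.sum(axis=2)[a, b])

# Ordering X2, X3, X1: p(x2) p(x3|x2) p(x1|x2,x3).
d2 = p.sum(axis=(0, 2))[b] \
     * (p.sum(axis=0)[b, c] / p.sum(axis=(0, 2))[b]) \
     * (p[a, b, c] / p.sum(axis=0)[b, c])

print(d1, d2, p[a, b, c])
```

Both products telescope back to the joint pmf value, which is why the ordering does not matter.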
Example 4.5.3.
Suppose that a point $X$ is chosen from a uniform
distribution in the interval $[0,1]$ and that after $X=x$ is
observed a point $Y$ is drawn from a uniform distribution on the
interval $[x,1]$. We have
\begin{align*}
f_{X,Y}(x,y) &= f_X(x)f_{Y \c X=x}(y)=
\begin{cases}
1\cdot \frac{1}{1-x} & 0 < x < y < 1 \\ 0 & \text{otherwise}
\end{cases}\\
f_Y(y)&=
\begin{cases} \int_{-\infty}^{\infty}f_{X,Y}(x,y)
dx=\int_0^y\frac{1}{1-x}dx=-\log(1-y) & 0 < y < 1 \\
0 & \text{otherwise}
\end{cases}\\
f_{X \c Y=y}(x)&=f_{X,Y}(x,y)/f_{Y}(y)=
\begin{cases} \frac{-1}{(1-x)\log(1-y)} & 0 < x < y < 1\\ 0 &\text{otherwise}
\end{cases}.
\end{align*}
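The marginal $f_Y$ above can be checked by simulation. Integrating $f_Y(y)=-\log(1-y)$ gives the closed form $\P(Y\leq t)=t+(1-t)\log(1-t)$, and a Monte Carlo estimate (an illustration, not from the text) agrees:

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulate the example: X ~ U(0,1), then Y | X=x ~ U(x,1).
n = 1_000_000
x = rng.uniform(0.0, 1.0, size=n)
y = rng.uniform(x, 1.0)          # lower bound varies per sample (broadcasts)

# Monte Carlo estimate of P(Y <= t) versus the closed form
# F_Y(t) = integral_0^t -log(1-y) dy = t + (1-t) log(1-t).
t = 0.5
mc = (y <= t).mean()
exact = t + (1 - t) * np.log(1 - t)
print(mc, exact)
```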