Probability

The Analysis of Data, volume 1

Random Vectors: Conditional Probabilities and Random Vectors

4.5. Conditional Probabilities and Random Vectors

Conditional probabilities for random vectors are defined similarly to the scalar case. Given a joint distribution over the random vector $Z=(X,Y)$, the conditional probability $P(X\in A \mid Y=y)$ reflects an updated likelihood for the event $X\in A$ given that $Y=y$.

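The conditional cdf, pdf, and pmf are defined as follows:
\begin{align*}
F_{X|Y=y}(x) &= \begin{cases} P(X\leq x, Y=y)/p_Y(y) & Y \text{ is discrete}\\ P(X\leq x, Y=y)/f_Y(y) & Y \text{ is continuous}\end{cases}\\
f_{X|Y=y}(x) &= \frac{\partial^n}{\partial x_1\cdots\partial x_n} F_{X|Y=y}(x)\\
p_{X|Y=y}(x) &= \frac{P(X=x, Y=y)}{P(Y=y)} = \frac{p_{X,Y}((x,y))}{p_Y(y)}.
\end{align*}
Note that we assume above that $f_Y(y)$ and $p_Y(y)$ are not zero. When $Y$ is continuous the event $Y=y$ has probability zero, so the numerator $P(X\leq x, Y=y)$ in the second case should be read as a density in $y$, that is, the derivative of $P(X\leq x, Y\leq y)$ with respect to $y$ (a mixed partial derivative over the components of $y$ when $Y$ is a vector), rather than as a literal probability.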

When both $X$ and $Y$ are continuous their joint cdf is differentiable and
$$f_{X|Y=y}(x) = \frac{f_{X,Y}(x,y)}{f_Y(y)}.$$

Computing conditional probabilities from the conditional pdf and pmf proceeds as in the non-conditional case, by integrating over the corresponding pdf or summing over the corresponding pmf. The proof is similar to the scalar case (see Chapter 2).

$$P(X\in A \mid Y=y) = \begin{cases} \int_A f_{X|Y=y}(x)\,dx & X \text{ is continuous}\\ \sum_{x\in A} p_{X|Y=y}(x) & X \text{ is discrete}.\end{cases}$$
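The discrete case of the formula above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the joint pmf `JOINT` and the helper names `marginal_y`, `conditional_pmf`, and `conditional_prob` are hypothetical and do not come from the text, and exact fractions are used so that the results are not blurred by rounding.

```python
from fractions import Fraction

# Hypothetical joint pmf p_{X,Y}(x, y) on a small grid; the values are
# illustrative only and sum to 1.
JOINT = {
    (0, 0): Fraction(1, 8), (1, 0): Fraction(1, 8), (2, 0): Fraction(1, 4),
    (0, 1): Fraction(1, 4), (1, 1): Fraction(1, 8), (2, 1): Fraction(1, 8),
}

def marginal_y(joint, y):
    """p_Y(y), obtained by summing the joint pmf over x."""
    return sum(p for (_, yy), p in joint.items() if yy == y)

def conditional_pmf(joint, y):
    """p_{X|Y=y}(x) = p_{X,Y}(x, y) / p_Y(y); requires p_Y(y) > 0."""
    p_y = marginal_y(joint, y)
    if p_y == 0:
        raise ValueError("conditioning on a value with p_Y(y) = 0")
    return {x: p / p_y for (x, yy), p in joint.items() if yy == y}

def conditional_prob(joint, A, y):
    """P(X in A | Y=y), summing the conditional pmf over x in A."""
    cond = conditional_pmf(joint, y)
    return sum(p for x, p in cond.items() if x in A)

print(conditional_prob(JOINT, {1, 2}, 0))  # 3/4
print(conditional_prob(JOINT, {1, 2}, 1))  # 1/2
```

With this particular table $p_Y(0)=p_Y(1)=1/2$, so conditioning on $Y=0$ rather than $Y=1$ shifts the probability of the event $X\in\{1,2\}$ from $1/2$ to $3/4$.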

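Example 4.5.1. For a random vector $X=(X_1,X_2,X_3)$,
$$F_{X_2|X_1=x_1,X_3=x_3}(x_2) = \begin{cases} \dfrac{P(X_1=x_1, X_2\leq x_2, X_3=x_3)}{\sum_{x_2} p_X(X_1=x_1, X_2=x_2, X_3=x_3)} & X \text{ is discrete}\\[2ex] \dfrac{P(X_1=x_1, X_2\leq x_2, X_3=x_3)}{\int f_X(X_1=x_1, X_2=x_2, X_3=x_3)\,dx_2} & X \text{ is continuous}.\end{cases}$$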
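Example 4.5.2.
$$f_{X_i|\{X_j=x_j:\, j\neq i\}}(x_i) = \frac{d}{dx_i}\,\frac{\int_{-\infty}^{x_i} f_{X_1,\ldots,X_n}(x_1,\ldots,x_n)\,dx_i}{f_{X_1,\ldots,X_{i-1},X_{i+1},\ldots,X_n}(x_1,\ldots,x_{i-1},x_{i+1},\ldots,x_n)} = \frac{f_{X_1,\ldots,X_n}(x_1,\ldots,x_n)}{\int_{-\infty}^{\infty} f_{X_1,\ldots,X_n}(x_1,\ldots,x_n)\,dx_i}.$$
In the case of $n=3$, we have
$$f_{X_2|X_1=x_1,X_3=x_3}(x_2) = \frac{d}{dx_2} F_{X_2|X_1=x_1,X_3=x_3}(x_2) = \frac{f_{X_1,X_2,X_3}(x_1,x_2,x_3)}{\int_{-\infty}^{\infty} f_{X_1,X_2,X_3}(x_1,x_2,x_3)\,dx_2}.$$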

The above formulas lead to the following generalization of the Bayes rule for events, $P(A|B)=P(B|A)P(A)/P(B)$ (Proposition 1.5.2).

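Proposition 4.5.1 (Bayes Rule).
\begin{align*}
f_X(x) &= f_{X_i|\{X_j=x_j:\, j\neq i\}}(x_i)\, f_{X_1,\ldots,X_{i-1},X_{i+1},\ldots,X_n}(x_1,\ldots,x_{i-1},x_{i+1},\ldots,x_n)\\
p_X(x) &= p_{X_i|\{X_j=x_j:\, j\neq i\}}(x_i)\, p_{X_1,\ldots,X_{i-1},X_{i+1},\ldots,X_n}(x_1,\ldots,x_{i-1},x_{i+1},\ldots,x_n)
\end{align*}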
Proof. The pdf formula follows from Example 4.5.2. The derivation of the pmf formula is similar.
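Corollary 4.5.1.
\begin{align*}
f_{X_1,\ldots,X_n}(x_1,\ldots,x_n) &= f_{X_1}(x_1)\, f_{X_2|X_1=x_1}(x_2)\, f_{X_3|X_1=x_1,X_2=x_2}(x_3) \cdots f_{X_n|X_1=x_1,\ldots,X_{n-1}=x_{n-1}}(x_n)\\
p_{X_1,\ldots,X_n}(x_1,\ldots,x_n) &= p_{X_1}(x_1)\, p_{X_2|X_1=x_1}(x_2)\, p_{X_3|X_1=x_1,X_2=x_2}(x_3) \cdots p_{X_n|X_1=x_1,\ldots,X_{n-1}=x_{n-1}}(x_n).
\end{align*}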
Proof. The proof follows from repeated use of Proposition 4.5.1.

The ordering of $X_1,\ldots,X_n$ in the decomposition above is arbitrary, and similar formulas hold when the variables are relabeled, for example by replacing $X_1$ with $X_2$, $X_2$ with $X_3$, and $X_3$ with $X_1$, or by any other relabeling. Formally, given a permutation $\pi:\{1,\ldots,n\}\to\{1,\ldots,n\}$ (a one-to-one and onto function), a relabeling of the vector $(X_1,\ldots,X_n)$ is the vector $(X_{\pi(1)},\ldots,X_{\pi(n)})$. For example, the following two equations hold:
\begin{align*}
f_{X_1,X_2,X_3}(x) &= f_{X_1}(x_1)\, f_{X_2|X_1=x_1}(x_2)\, f_{X_3|X_1=x_1,X_2=x_2}(x_3)\\
f_{X_1,X_2,X_3}(x) &= f_{X_2}(x_2)\, f_{X_3|X_2=x_2}(x_3)\, f_{X_1|X_2=x_2,X_3=x_3}(x_1).
\end{align*}
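The pmf version of this decomposition, and its invariance under relabeling, can be checked on a small discrete example. The following is a minimal sketch, assuming a hypothetical joint pmf `JOINT3` over three binary variables; the helper names `marginal` and `chain_product` are likewise illustrative and not taken from the text. Each conditional factor is computed as a ratio of two marginals, and the factors are multiplied along a chosen ordering of the coordinates.

```python
from fractions import Fraction
from itertools import product

# Hypothetical joint pmf over (X1, X2, X3) in {0,1}^3, with weights
# proportional to 1, 2, ..., 8 (they sum to 36).
JOINT3 = {x: Fraction(k + 1, 36)
          for k, x in enumerate(product((0, 1), repeat=3))}

def marginal(joint, idx):
    """Marginal pmf of the coordinates listed in idx (a tuple of positions)."""
    out = {}
    for x, p in joint.items():
        key = tuple(x[i] for i in idx)
        out[key] = out.get(key, 0) + p
    return out

def chain_product(joint, x, order):
    """Product of the conditional pmfs along `order` (0-based positions).

    The k-th factor is p(x_{order[k]} | x_{order[0]}, ..., x_{order[k-1]}),
    computed as a ratio of two marginals. Assumes every conditioning
    event has positive probability.
    """
    result = Fraction(1)
    for k in range(len(order)):
        num_idx = tuple(order[: k + 1])
        den_idx = tuple(order[:k])  # empty for k = 0, whose marginal is 1
        num = marginal(joint, num_idx)[tuple(x[i] for i in num_idx)]
        den = marginal(joint, den_idx)[tuple(x[i] for i in den_idx)]
        result *= num / den
    return result

# Both orderings recover the joint pmf at every point.
for x, p in JOINT3.items():
    assert chain_product(JOINT3, x, (0, 1, 2)) == p  # X1, then X2, then X3
    assert chain_product(JOINT3, x, (1, 2, 0)) == p  # X2, then X3, then X1
```

The ordering `(1, 2, 0)` corresponds to the relabeled decomposition that starts from $X_2$, conditions $X_3$ on $X_2$, and finally conditions $X_1$ on both.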

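Example 4.5.3. Suppose that a point $X$ is chosen from a uniform distribution on the interval $[0,1]$ and that, after $X=x$ is observed, a point $Y$ is drawn from a uniform distribution on the interval $[x,1]$. We have
\begin{align*}
f_{X,Y}(x,y) &= f_X(x)\, f_{Y|X=x}(y) = \begin{cases} 1\cdot\frac{1}{1-x} & 0<x<y<1\\ 0 & \text{otherwise}\end{cases}\\
f_Y(y) &= \begin{cases} \int_{-\infty}^{\infty} f_{X,Y}(x,y)\,dx = \int_0^y \frac{1}{1-x}\,dx = -\log(1-y) & 0<y<1\\ 0 & \text{otherwise}\end{cases}\\
f_{X|Y=y}(x) &= f_{X,Y}(x,y)/f_Y(y) = \begin{cases} -\frac{1}{(1-x)\log(1-y)} & 0<x<y<1\\ 0 & \text{otherwise}.\end{cases}
\end{align*}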
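The closed forms in the uniform example above, $f_Y(y) = -\log(1-y)$ and $f_{X|Y=y}(x) = -1/((1-x)\log(1-y))$, can be sanity-checked numerically. The sketch below is an illustration only: the function names and the choice of a midpoint rule with a fixed number of subintervals are assumptions made here, not part of the text.

```python
import math

def joint_pdf(x, y):
    """f_{X,Y}(x, y) = 1/(1-x) on 0 < x < y < 1, and 0 elsewhere."""
    return 1.0 / (1.0 - x) if 0.0 < x < y < 1.0 else 0.0

def midpoint_integral(g, a, b, n=20000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))

def marginal_y_numeric(y):
    """f_Y(y), integrating the joint pdf over x in (0, y)."""
    return midpoint_integral(lambda x: joint_pdf(x, y), 0.0, y)

def conditional_pdf(x, y):
    """f_{X|Y=y}(x) from the closed form -1 / ((1-x) log(1-y))."""
    if not (0.0 < x < y < 1.0):
        return 0.0
    return -1.0 / ((1.0 - x) * math.log(1.0 - y))

y = 0.7
# The numerical marginal should agree with -log(1 - y).
print(marginal_y_numeric(y), -math.log(1.0 - y))
# The conditional pdf should integrate to (approximately) 1 over (0, y).
print(midpoint_integral(lambda x: conditional_pdf(x, y), 0.0, y))
```

Note that $\log(1-y)$ is negative for $0<y<1$, which is why the leading minus sign is needed to make both $f_Y$ and the conditional pdf positive.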