Random Vectors: Functions of a Random Vector

Probability

The Analysis of Data, volume 1


$ \def\P{\mathsf{\sf P}} \def\E{\mathsf{\sf E}} \def\Var{\mathsf{\sf Var}} \def\Cov{\mathsf{\sf Cov}} \def\std{\mathsf{\sf std}} \def\Cor{\mathsf{\sf Cor}} \def\R{\mathbb{R}} \def\c{\,|\,} \def\bb{\boldsymbol} \def\diag{\mathsf{\sf diag}} \def\defeq{\stackrel{\tiny\text{def}}{=}} $

4.4. Functions of a Random Vector

Recall that when $X$ is a random variable and $g:\mathbb{R}\to\mathbb{R}$ is a real valued function then $g(X)$ is also a random variable and its cdf, pdf or pmf are directly related to the corresponding functions of $X$ (Chapter 2). The same holds for a random vector. Specifically, for a random vector $\bb{X}=(X_1, \ldots, X_n)$ and \[g=(g_1, \ldots, g_k) : \R^n\to\R^k, \qquad g_i:\mathbb{R}^n \to \mathbb{R}, \quad i=1,\ldots,k,\] we have a new $k$-dimensional random vector $\bb{Y}=g(\bb{X})$ with \[Y_i=g_i(X_1,\ldots,X_n),\quad i=1,\ldots,k.\] Figure 4.4.1 illustrates this concept.

Figure 4.4.1: A random vector $\bb{X}=(X_1,X_2)$ and $g:\mathbb{R}^2\to\mathbb{R}^2$ define a new random vector $\bb Y=g(\bb X)$ that is a mapping from $\Omega$ to $\mathbb{R}^2$.

As in the case of random variables, we consider several techniques for relating the cdf, pmf, or pdf of $g(\bb X)$ to that of $\bb X$.

The first technique is to compute the cdf $F_{\bb{Y}}(\bb{y})$ for all $\bb{y}\in\mathbb{R}^k$. When $k=1$, $\bb{Y}=Y=g(X_1,\ldots,X_n)$ and \begin{align*} F_Y(y)=\P(g(X_1,\ldots,X_n)\leq y) =\begin{cases} \int_{\bb{x}:g(\bb{x})\leq y} f_{\bb{X}}(\bb{x})d\bb{x} & \bb{X} \text{ is continuous }\\ \sum_{\bb{x}:g(\bb{x})\leq y} p_{\bb{X}}(\bb{x}) & \bb{X} \text{ is discrete }\\ \end{cases}. \end{align*} When $k \geq 2$, \begin{align*} F_{Y_1,\ldots,Y_k}(y_1,\ldots, y_k) &= \P(g_1(\bb{X})\leq y_1,\ldots,g_k(\bb{X})\leq y_k) \\ &=\begin{cases}\int_A f_{\bb{X}}(\bb{x}) \, d\bb{x} & \bb{X} \text{ is continuous}\\ \sum_{\bb{x}\in A} p_{\bb{X}}(\bb{x}) & \bb{X} \text{ is discrete} \end{cases} \end{align*} where \[A=\{\bb{x}\in\mathbb{R}^n: g_j(\bb{x})\leq y_j \text{ for } j=1,\ldots,k\}.\]

If $\bb{Y}$ is discrete we can find its pmf by \begin{align*} p_{\bb{Y}}(y_1,\ldots,y_k)&=\P(g_1(\bb{X})= y_1,\ldots, g_k(\bb{X})= y_k) \\&=\begin{cases}\int_A f_{\bb{X}}(\bb{x}) \, d\bb{x} & \bb{X} \text{ is continuous}\\ \sum_{\bb{x}\in A} p_{\bb{X}}(\bb{x}) & \bb{X} \text{ is discrete} \end{cases} \end{align*} where \[A=\{\bb{x}\in\mathbb{R}^n: g_j(\bb{x})= y_j \text{ for all } j=1,\ldots,k\}.\] Note that if $g$ is one-to-one the set $A$ consists of at most a single element: the unique $\bb{x}$ with $g(\bb{x})=\bb{y}$ if $\bb{y}$ is in the range of $g$, and no element otherwise.

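
The two summation formulas for a discrete $\bb{X}$ can be checked on a small case. The sketch below is an illustration only: the choice of two independent fair dice for $\bb{X}$ and the mapping $g(x_1,x_2)=(\min(x_1,x_2),\,x_1+x_2)$ are assumptions made for this example and do not come from the text.

```python
from fractions import Fraction
from itertools import product

# Assumed example: X = (X1, X2) are two independent fair six-sided dice.
faces = range(1, 7)
pmf_X = {(x1, x2): Fraction(1, 36) for x1, x2 in product(faces, faces)}

def g(x):
    """Hypothetical g: R^2 -> R^2 with Y1 = min(X1, X2), Y2 = X1 + X2."""
    return (min(x), x[0] + x[1])

def pmf_Y(y):
    """p_Y(y): sum of p_X over A = {x : g_j(x) = y_j for all j}."""
    return sum((p for x, p in pmf_X.items() if g(x) == y), Fraction(0))

def cdf_Y(y):
    """F_Y(y): sum of p_X over A = {x : g_j(x) <= y_j for all j}."""
    return sum(
        (p for x, p in pmf_X.items()
         if all(gj <= yj for gj, yj in zip(g(x), y))),
        Fraction(0),
    )

print(pmf_Y((2, 5)))  # 1/18: the outcomes (2, 3) and (3, 2)
print(cdf_Y((2, 5)))  # 5/18: the ten outcomes with x1 + x2 <= 5
```

Exact rational arithmetic is used so that the sums can be compared with hand computations without rounding error.
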
If $\bb{Y}$ is continuous we can obtain the pdf $f_{\bb{Y}}$ by differentiating the joint cdf (if it is available) \[f_{\bb{Y}}(\bb{y})=\frac{\partial^k}{\partial y_1\cdots \partial y_k } F_{\bb{Y}}(\bb{y}).\] The second technique, which applies when $k=n$, is the change of variables formula in Proposition 4.4.1.

Proposition 4.4.1. For an $n$-dimensional continuous random vector $\bb{X}$ and $\bb{Y}=g(\bb{X})$ with an invertible and differentiable $g:\R^n\to\R^n$ (in the range of $\bb{X}$), \[ f_{\bb{Y}}(\bb{y}) \cdot |\det J(g^{-1}(\bb{y}))| = f_{\bb{X}}(g^{-1}(\bb{y}))\] where $J(g^{-1}(\bb{y}))$ is the Jacobian matrix of $g$ evaluated at $\bb{x}=g^{-1}(\bb{y})$: \begin{align*} J(\bb{x})=\begin{pmatrix} \frac{\partial g_1}{\partial x_1}(\bb{x})& \cdots & \frac{\partial g_1}{\partial x_n}(\bb{x})\\ \vdots & \ddots & \vdots\\ \frac{\partial g_n}{\partial x_1}(\bb{x})& \cdots & \frac{\partial g_n}{\partial x_n}(\bb{x})\\ \end{pmatrix}. \end{align*}

Proof. See Proposition F.6.3.

The third technique uses the moment generating function. See the corresponding technique in Chapter 2 and the generalization of the moment generating function to random vectors at the end of this chapter.

The following example illustrates the second method above in the important special case of an invertible linear transformation (see Chapter C).

Example 4.4.1. If $g:\R^n\to\R^n$ is a linear transformation expressed by an invertible matrix $T$ of size $n\times n$, then $g(\bb{x})=T\bb{x}$ and $g^{-1}(\bb{y})=T^{-1}\bb{y}$. Since the Jacobian $J(g^{-1}(\bb{y}))=T$, \[ f_{\bb{Y}}(\bb{y}) = |\det T|^{-1} f_{\bb{X}}(T^{-1}\bb{y}).\]
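
As a rough numerical sanity check of the formula in Example 4.4.1, the sketch below evaluates $|\det T|^{-1} f_{\bb{X}}(T^{-1}\bb{y})$ on a grid and confirms that it integrates to approximately 1. The matrix `T`, the choice of two independent standard normal components for $\bb{X}$, and the grid bounds are all assumptions made for illustration.

```python
import math

def normal_pdf_2d(x1, x2):
    """Joint pdf of two independent standard normal RVs (assumed f_X)."""
    return math.exp(-0.5 * (x1 * x1 + x2 * x2)) / (2.0 * math.pi)

# Hypothetical invertible 2x2 matrix T, chosen only for illustration.
T = ((2.0, 1.0), (0.0, 1.0))
det_T = T[0][0] * T[1][1] - T[0][1] * T[1][0]

def apply_T_inverse(y1, y2):
    """Return T^{-1} y using the closed-form inverse of a 2x2 matrix."""
    (a, b), (c, d) = T
    return ((d * y1 - b * y2) / det_T, (-c * y1 + a * y2) / det_T)

def pdf_Y(y1, y2):
    """f_Y(y) = |det T|^{-1} f_X(T^{-1} y)."""
    x1, x2 = apply_T_inverse(y1, y2)
    return normal_pdf_2d(x1, x2) / abs(det_T)

def riemann_total(pdf, lo=-15.0, hi=15.0, step=0.1):
    """Midpoint Riemann sum of pdf over the square [lo, hi]^2."""
    n = int(round((hi - lo) / step))
    total = 0.0
    for i in range(n):
        y1 = lo + (i + 0.5) * step
        for j in range(n):
            y2 = lo + (j + 0.5) * step
            total += pdf(y1, y2)
    return total * step * step

total_mass = riemann_total(pdf_Y)
print(total_mass)  # should be close to 1
```

Dropping the factor $|\det T|^{-1}$ in `pdf_Y` would make the total mass come out near $|\det T|$ rather than 1, which is one way to see why the factor is needed.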

The example above leads to a general technique for finding the pdf or pmf of a sum of independent random variables. This is explored in the example below. Proposition 4.4.3 shows how to obtain the same result using the convolution operator.

Example 4.4.2. Consider a continuous independent random vector $\bb X=(X_1, X_2)$ and the mapping $\bb X\mapsto \bb Y$ defined by $Y_1=X_1$, $Y_2=X_1+X_2$ or in matrix notation $g(\bb x)=A\bb x$ for $A=\begin{pmatrix} 1&0\\1&1\end{pmatrix}$. Note that the determinant of $A$ is 1 and that the inverse mapping is $X_1=Y_1$ and $X_2=Y_2-Y_1$. We therefore have \[f_{Y_1,Y_2}(y_1,y_2)=\frac{1}{|1|} f_{X_1,X_2}(y_1,y_2-y_1) = f_{X_1}(y_1) f_{X_2}(y_2-y_1).\] Integrating both sides of the equation above with respect to $y_1$ gives the marginal pdf of $Y_2=X_1+X_2$. Writing $z$ for $y_2$ and $t$ for $y_1$, \begin{align*} f_{X_1+X_2}(z) &= \int f_{X_1}(t) \,f_{X_2}(z-t) \, dt. \end{align*}
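
The integral formula for $f_{X_1+X_2}$ can be evaluated numerically. The sketch below assumes, purely for illustration, that $X_1$ and $X_2$ are independent exponential RVs with the hypothetical rates `A_RATE` and `B_RATE`, and compares a midpoint-rule approximation of the integral with the known closed form $\frac{ab}{b-a}(e^{-az}-e^{-bz})$ for the sum of two exponentials with distinct rates $a \neq b$.

```python
import math

# Hypothetical rate parameters, chosen only for illustration.
A_RATE = 1.0
B_RATE = 3.0

def exp_pdf(x, rate):
    """pdf of an exponential RV with the given rate (zero for x < 0)."""
    return rate * math.exp(-rate * x) if x >= 0 else 0.0

def sum_pdf(z, n=2000):
    """Midpoint-rule approximation of int f_{X1}(t) f_{X2}(z - t) dt.

    Both pdfs vanish on negative numbers, so the integrand is zero
    outside [0, z] and the integration range can be restricted to it.
    """
    if z <= 0:
        return 0.0
    h = z / n
    return h * sum(
        exp_pdf((i + 0.5) * h, A_RATE) * exp_pdf(z - (i + 0.5) * h, B_RATE)
        for i in range(n)
    )

def sum_pdf_closed_form(z):
    """Closed-form pdf of X1 + X2 for distinct exponential rates."""
    if z <= 0:
        return 0.0
    scale = A_RATE * B_RATE / (B_RATE - A_RATE)
    return scale * (math.exp(-A_RATE * z) - math.exp(-B_RATE * z))

for z in (0.5, 1.2, 3.0):
    print(z, sum_pdf(z), sum_pdf_closed_form(z))
```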

The example below illustrates the second method of finding the distribution of $g(\bb X)$ in the case of a non-linear transformation $g$.

Example 4.4.3. For two independent exponential RVs $X_1,X_2$ (with parameter $\lambda$) and $\bb Y=g(\bb X)$ with $g_1(x_1,x_2)=x_1/(x_1+x_2)$, $g_2(x_1,x_2)=x_1+x_2$ (the inverse transformation is $g^{-1}_1(y_1,y_2)=y_1y_2$, $g^{-1}_2(y_1,y_2)=(1-y_1)y_2$) the Jacobian is \begin{align*} J(x_1,x_2) &= \begin{pmatrix} \frac{x_2}{(x_1+x_2)^2} & - \frac{x_1}{(x_1+x_2)^2}\\ 1 & 1 \end{pmatrix}\\ \det J (x_1,x_2) &= \frac{x_2}{(x_1+x_2)^2}+\frac{x_1}{(x_1+x_2)^2}=\frac{1}{x_1+x_2}, \end{align*} and, by Proposition 4.4.1 with $x_1+x_2=y_1y_2+(1-y_1)y_2$, \begin{align*} f_{Y_1,Y_2}(y_1,y_2) = |(y_1y_2+(1-y_1)y_2)| \,\,\lambda e^{-\lambda y_1y_2} \lambda e^{-\lambda (1-y_1)y_2} =y_2\lambda^2 e^{-\lambda y_2} \end{align*} for $0 < y_1 < 1$, $0 < y_2$ and 0 otherwise.
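
The algebra in Example 4.4.3 can be cross-checked numerically. The sketch below applies the change of variables formula with a finite-difference approximation of $\det J$ and compares the result with the closed form $y_2\lambda^2 e^{-\lambda y_2}$. The value of `LAM`, the step size `h`, and the evaluation points are assumptions chosen for illustration.

```python
import math

LAM = 1.5  # hypothetical value of the parameter lambda

def pdf_X(x1, x2):
    """Joint pdf of two independent exponential RVs with parameter LAM."""
    if x1 <= 0 or x2 <= 0:
        return 0.0
    return LAM * math.exp(-LAM * x1) * LAM * math.exp(-LAM * x2)

def g(x1, x2):
    """The transformation of Example 4.4.3."""
    return (x1 / (x1 + x2), x1 + x2)

def g_inverse(y1, y2):
    """The inverse transformation of Example 4.4.3."""
    return (y1 * y2, (1.0 - y1) * y2)

def jacobian_det(x1, x2, h=1e-6):
    """Central finite-difference approximation of det J(x) for g."""
    d11 = (g(x1 + h, x2)[0] - g(x1 - h, x2)[0]) / (2 * h)
    d12 = (g(x1, x2 + h)[0] - g(x1, x2 - h)[0]) / (2 * h)
    d21 = (g(x1 + h, x2)[1] - g(x1 - h, x2)[1]) / (2 * h)
    d22 = (g(x1, x2 + h)[1] - g(x1, x2 - h)[1]) / (2 * h)
    return d11 * d22 - d12 * d21

def pdf_Y(y1, y2):
    """f_Y(y) = f_X(g^{-1}(y)) / |det J(g^{-1}(y))| on the range of g."""
    if not (0 < y1 < 1 and y2 > 0):
        return 0.0
    x1, x2 = g_inverse(y1, y2)
    return pdf_X(x1, x2) / abs(jacobian_det(x1, x2))

def pdf_Y_closed_form(y1, y2):
    """The closed form derived in the example."""
    if not (0 < y1 < 1 and y2 > 0):
        return 0.0
    return y2 * LAM ** 2 * math.exp(-LAM * y2)

for point in ((0.3, 2.0), (0.9, 0.5)):
    print(point, pdf_Y(*point), pdf_Y_closed_form(*point))
```

The closed form does not depend on $y_1$, so the joint pdf factors into a function of $y_1$ alone (constant on $(0,1)$) times a function of $y_2$ alone.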

Definition 4.4.1. The convolution of two functions $f,g:\mathbb{R}\to\mathbb{R}$ is the function \[(f*g)(z)=\begin{cases} \sum_t f(t) g(z-t) & f,g \text{ are discrete functions}\\ \int_{-\infty}^{\infty} f(t)g(z-t)\,dt & \text{otherwise} \end{cases}.\] (A function $f$ is discrete if $f(x)=0$ except on a finite or countable set of $x$ values.)

The parentheses are often omitted, resulting in the notation $f*g$.

Proposition 4.4.2. The convolution is commutative: $(f*g)(t)=(g*f)(t)$ and associative: $((f*g)*h)(t)=(f*(g*h))(t)$.

Proof. In the continuous case, the change of variables $t'=z-t$ (so that $dt=-dt'$ and the limits of integration are reversed) gives \begin{align*} (f*g)(z) &=\int_{-\infty}^{\infty} f(t) g(z-t)\,dt = -\int_{\infty}^{-\infty} f(z-t')g(t') \,dt' \\ &=\int_{-\infty}^{\infty} g(t')f(z-t')\,dt' = (g*f)(z). \end{align*} The proof in the discrete case is similar. The proof of associativity is along similar lines but more tedious.

The convolution's associativity justifies leaving out the parentheses; for example, we write $(f*g*h)(t)$ instead of $(f*(g*h))(t)$.
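
The discrete case of Definition 4.4.1 and the two properties in Proposition 4.4.2 can be illustrated on small examples. In the sketch below, the representation of a discrete function as a dictionary and the three functions `f`, `g`, `h` are assumptions made for illustration; checking a few cases is of course not a proof.

```python
from fractions import Fraction

def convolve(f, g):
    """(f*g)(z) = sum_t f(t) g(z - t) for discrete functions.

    Each function is stored as a {point: value} dict; points that are
    absent from the dict are taken to have value 0.
    """
    out = {}
    for t, ft in f.items():
        for s, gs in g.items():
            # The pair (t, s) contributes f(t) g(z - t) at z = t + s.
            out[t + s] = out.get(t + s, 0) + ft * gs
    return out

# Hypothetical discrete functions, chosen only for illustration.
f = {0: Fraction(1, 2), 1: Fraction(1, 2)}
g = {0: Fraction(1, 3), 2: Fraction(2, 3)}
h = {1: Fraction(1, 4), 3: Fraction(3, 4)}

fg = convolve(f, g)
print(fg)

# Commutativity and associativity on this example.
print(convolve(f, g) == convolve(g, f))
print(convolve(convolve(f, g), h) == convolve(f, convolve(g, h)))
```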

Proposition 4.4.3. If $\bb X=(X_1,\ldots,X_n)$ is an independent random vector, then \begin{align*} f_{\sum_{i=1}^n X_i}(y) &=f_{X_1}*\cdots * f_{X_n}(y) \qquad \bb X \text{ is continuous}\\ p_{\sum_{i=1}^n X_i}(y) &=p_{X_1}*\cdots * p_{X_n}(y) \qquad \bb X \text{ is discrete}. \end{align*}

Proof. When $n=2$ and $\bb{X}$ is continuous, the cdf of $Y=X_1+X_2$ is \[F_Y(y)=\P(X_1+X_2\leq y)=\int_{-\infty}^{\infty}\int_{-\infty}^{y-w} f_{X_1,X_2}(w,z)\,dz\,dw\] (the region of integration is $\{(w,z): w+z\leq y\}$) and by the fundamental theorem of calculus (Section F.2) \[ f_Y(y) = \frac{d}{dy}F_Y(y) = \int_{-\infty}^{\infty} f_{X_1,X_2}(w,y-w)\, dw = f_{X_1} * f_{X_2}(y),\] where the last equality follows from the independence of $X_1$ and $X_2$. Since $X_1+\cdots+X_{n-1}$ is independent of $X_n$, induction on $n$ gives $f_{X_1+\cdots+X_n}(y)=(f_{X_1}*\cdots * f_{X_n})(y)$. The proof in the case of a discrete $\bb{X}$ is similar.
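
The discrete case of Proposition 4.4.3 can be illustrated with a standard fact: the sum of $n$ independent Bernoulli RVs with success probability $\theta$ has a binomial pmf. The sketch below convolves the Bernoulli pmf with itself and compares the result with the binomial formula. The values of `theta` and `n` are hypothetical, and the dictionary representation of a pmf is an assumption of this example.

```python
from fractions import Fraction
from functools import reduce
from math import comb

def convolve(p, q):
    """Discrete convolution of two pmfs stored as {value: probability}."""
    out = {}
    for a, pa in p.items():
        for b, qb in q.items():
            out[a + b] = out.get(a + b, 0) + pa * qb
    return out

theta = Fraction(1, 3)  # hypothetical success probability
n = 5                   # hypothetical number of summands

bernoulli = {0: 1 - theta, 1: theta}

# p_{X_1 + ... + X_n} = p_{X_1} * ... * p_{X_n}
pmf_sum = reduce(convolve, [bernoulli] * n)

# Binomial pmf with parameters n and theta, for comparison.
binomial = {
    k: comb(n, k) * theta ** k * (1 - theta) ** (n - k)
    for k in range(n + 1)
}

print(pmf_sum == binomial)
```

Because the convolution is associative, the order in which `reduce` groups the pairwise convolutions does not affect the result.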