Important Random Vectors: The Exponential Family Random Vector

Probability

The Analysis of Data, volume 1


$ \def\P{\mathsf{\sf P}} \def\E{\mathsf{\sf E}} \def\Var{\mathsf{\sf Var}} \def\Cov{\mathsf{\sf Cov}} \def\std{\mathsf{\sf std}} \def\Cor{\mathsf{\sf Cor}} \def\R{\mathbb{R}} \def\c{\,|\,} \def\bb{\boldsymbol} \def\diag{\mathsf{\sf diag}} \def\defeq{\stackrel{\tiny\text{def}}{=}} $

5.5. The Exponential Family Random Vector

Definition 5.5.1. The continuous exponential family random vector is a random vector whose pdf is \begin{align*} f_{\bb X}(\bb x) &= \frac{\alpha(\bb x)}{Z(\bb\theta)} \exp\left(\sum_{i=1}^k h_i(\bb \theta) g_i(\bb x) \right), \quad \bb x \in \mathcal{X}\subset\R^n, \bb \theta\in\Theta\subset \R^d, \end{align*} where \begin{align*} g_i &: \mathcal{X}\to \R, \qquad i=1,\ldots,k\\ h_i &: \Theta \to \R, \qquad i=1,\ldots,k\\ \alpha &:\mathcal{X}\to[0,\infty) \end{align*} and $Z$ ensures that the pdf integrates to one: \begin{align*} Z(\bb\theta) = \int_{\mathcal{X}} \alpha(\bb x) \exp\left(\sum_{i=1}^k h_i(\bb \theta) g_i(\bb x) \right)\, d\bb x. \end{align*} The discrete analog is defined similarly by replacing the pdf with a pmf and the integral with a sum. If $h_i(\bb\theta)=\theta_i$ for all $i$, we say that the exponential family is in canonical form.
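To make the discrete version of the definition concrete, here is a minimal Python sketch of a one-parameter family in canonical form. It assumes a finite support so that the sum defining $Z$ can be evaluated directly; the helper names `partition` and `exp_family_pmf` are hypothetical. The binomial with fixed $n$ is written with $\alpha(x)=\binom{n}{x}$, $g(x)=x$, and $\theta=\log(p/(1-p))$, for which $Z(\theta)=(1+e^\theta)^n$.

```python
import math

def partition(theta, alpha, g, support):
    # Z(theta): sum over the (finite) support of alpha(x) * exp(theta * g(x))
    return sum(alpha(x) * math.exp(theta * g(x)) for x in support)

def exp_family_pmf(x, theta, alpha, g, support):
    # canonical one-parameter form: p(x) = alpha(x) * exp(theta * g(x)) / Z(theta)
    return alpha(x) * math.exp(theta * g(x)) / partition(theta, alpha, g, support)

# Binomial(n, p) written in canonical form:
# alpha(x) = C(n, x), g(x) = x, theta = log(p / (1 - p))
n, p = 10, 0.3
theta = math.log(p / (1 - p))
support = range(n + 1)

def alpha(x):
    return math.comb(n, x)

def g(x):
    return x

# compare against the usual binomial pmf formula
for x in (0, 3, 10):
    direct = math.comb(n, x) * p**x * (1 - p) ** (n - x)
    print(x, exp_family_pmf(x, theta, alpha, g, support), direct)

# the closed form Z(theta) = (1 + e^theta)^n agrees with the direct sum
print(partition(theta, alpha, g, support), (1 + math.exp(theta)) ** n)
```

The same two helpers work for any finite-support family; only `alpha`, `g`, and `support` change.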

We make the following observations.

For the definition above to make sense, the integral (or sum) defining $Z(\bb\theta)$ must converge to a finite value; this restricts the parameter space $\Theta$.
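As a small worked illustration of this convergence requirement, take $\mathcal X=[0,\infty)$, $\alpha(x)=1$, $k=1$, and $g_1(x)=x$ in canonical form. Then \begin{align*} Z(\theta) = \int_0^\infty e^{\theta x}\,dx = \begin{cases} -1/\theta & \theta<0\\ \infty & \theta\geq 0, \end{cases} \end{align*} so the definition only makes sense for $\Theta\subset(-\infty,0)$. Writing $\lambda=-\theta>0$, the resulting pdf is $f_X(x)=\lambda e^{-\lambda x}$, $x\geq 0$, which is the exponential density.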
The function $\alpha$ needs to be discrete ($\alpha(\bb x)>0$ on a countable set of $\bb x$ values) for $\bb X$ to be a discrete random vector, and continuous ($\alpha(\bb x)>0$ on an uncountable set of $\bb x$ values) for $\bb X$ to be a continuous random vector.
When the exponential family is in canonical form, there is a simple connection between $\log Z$ and the expectation and variance of $\bb g(\bb X)=(g_1(\bb X),\ldots,g_k(\bb X))$. Differentiating $Z$ under the integral sign (assuming this interchange is valid) gives $\partial Z/\partial\theta_j = \int g_j(\bb x)\alpha(\bb x)\exp\left(\sum_{i=1}^k \theta_i g_i(\bb x)\right) d\bb x$ and $\partial^2 Z/\partial\theta_j\partial\theta_l = \int g_j(\bb x)g_l(\bb x)\alpha(\bb x)\exp\left(\sum_{i=1}^k \theta_i g_i(\bb x)\right) d\bb x$, and therefore \begin{align*} \E(g_j(\bb X)) &= \int g_j(\bb x) \frac{\alpha(\bb x)}{Z(\bb\theta)} \exp\left(\sum_{i=1}^k \theta_i g_i(\bb x) \right) \, d\bb x\\ &= \frac{\int g_j(\bb x) \alpha(\bb x) \exp\left(\sum_{i=1}^k \theta_i g_i(\bb x) \right) \, d\bb x}{Z(\bb\theta)}\\ &= \frac{\partial Z(\bb\theta)/\partial\theta_j}{Z(\bb\theta)} = \frac{\partial \log Z(\bb\theta)}{\partial\theta_j}\\ \Cov(g_j(\bb X), g_l(\bb X)) &= \E(g_j(\bb X) g_l(\bb X)) - \E(g_j(\bb X))\E(g_l(\bb X)) \\ &= \frac{\partial^2 Z}{\partial\theta_j\partial\theta_l}/Z - \left(\frac{\partial Z}{\partial \theta_j} / Z\right) \left(\frac{\partial Z}{\partial \theta_l}/Z\right) \\ &= \left(\frac{\partial^2 Z}{\partial\theta_j\partial\theta_l} Z - \frac{\partial Z}{\partial \theta_j}\frac{\partial Z}{\partial \theta_l}\right) / Z^2\\ &= \frac{\partial^2 \log Z(\bb\theta)}{\partial\theta_j\partial\theta_l}. \end{align*} (Replace integrals with sums in the discrete case.) Or in vector notation \begin{align*} \E(\bb g(\bb X)) &= \nabla \log Z(\bb\theta)\\ \Var(\bb g(\bb X)) &= \nabla^2 \log Z(\bb\theta). \end{align*}

Since the variance matrix is non-negative definite, the last equation implies that $\log Z$ is a convex function of $\bb\theta$. Consequently, for any fixed $\bb x$ with $\alpha(\bb x)>0$, the function $\log f_{\bb X}(\bb x) = \log\alpha(\bb x) + \sum_{i=1}^k \theta_i g_i(\bb x) - \log Z(\bb\theta)$ (or the analogous $\log p_{\bb X}(\bb x)$), viewed as a function of $\bb\theta$, is concave: it is the sum of a linear function of $\bb\theta$ and the concave function $-\log Z(\bb\theta)$, and its Hessian is $-\Var(\bb g(\bb X))$. This motivates the use of exponential families in statistical estimation by maximum likelihood.
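A quick numerical check of the relations between $\log Z$ and the moments of $g(\bb X)$ can be done in Python for the Poisson pmf in canonical form, with $\alpha(x)=1/x!$, $g(x)=x$, and $\theta=\log\lambda$, so that $\log Z(\theta)=e^\theta=\lambda$. This is a minimal sketch under stated assumptions: the infinite support is truncated at a hypothetical cutoff `N_MAX`, and derivatives are approximated by central finite differences rather than computed exactly.

```python
import math

# Poisson in canonical form: alpha(x) = 1/x!, g(x) = x, theta = log(lambda).
# Assumption: the support {0, 1, 2, ...} is truncated at N_MAX; the omitted
# tail mass is negligible for the small lambda used below.
N_MAX = 100

def log_Z(theta):
    # log Z(theta) = log sum_x exp(theta * x) / x!, evaluated with log-sum-exp for stability
    terms = [theta * x - math.lgamma(x + 1) for x in range(N_MAX + 1)]
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))

def moments_from_pmf(theta):
    # mean and variance of g(X) = X computed directly from the pmf
    lz = log_Z(theta)
    pmf = [math.exp(theta * x - math.lgamma(x + 1) - lz) for x in range(N_MAX + 1)]
    mean = sum(x * q for x, q in enumerate(pmf))
    var = sum((x - mean) ** 2 * q for x, q in enumerate(pmf))
    return mean, var

def derivatives_of_log_Z(theta, h=1e-4):
    # central finite differences for d/dtheta and d^2/dtheta^2 of log Z
    d1 = (log_Z(theta + h) - log_Z(theta - h)) / (2 * h)
    d2 = (log_Z(theta + h) - 2 * log_Z(theta) + log_Z(theta - h)) / h**2
    return d1, d2

lam = 3.0
theta = math.log(lam)
print(moments_from_pmf(theta))      # mean and variance, both close to lambda = 3
print(derivatives_of_log_Z(theta))  # first and second derivatives of log Z, both close to 3
```

Because the second derivative of $\log Z$ equals $\Var(X)=\lambda>0$, the log-likelihood $\theta x-\log x!-\log Z(\theta)$ has second derivative $-\lambda<0$ in $\theta$, in line with the concavity remark.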

Many of the important random variables and random vectors that we have seen so far can be re-expressed in a way that complies with Definition 5.5.1 and thus are exponential family random vectors (or random variables). Specific cases include the binomial, Gaussian, exponential, Poisson, beta, and gamma random variables and the multinomial, multivariate normal, and Dirichlet random vectors. Notable exceptions are uniform, $t$, and mixture random variables and the corresponding random vectors.
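As a worked instance of one of the cases listed above, the univariate Gaussian $N(\mu,\sigma^2)$ can be put in canonical form with $k=2$. Expanding the square in the exponent, \begin{align*} f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) = \frac{1/\sqrt{2\pi}}{\sigma\exp\left(\mu^2/(2\sigma^2)\right)} \exp\left(\frac{\mu}{\sigma^2}\, x - \frac{1}{2\sigma^2}\, x^2\right), \end{align*} so that $\alpha(x)=1/\sqrt{2\pi}$, $g_1(x)=x$, $g_2(x)=x^2$, $\theta_1=\mu/\sigma^2$, $\theta_2=-1/(2\sigma^2)<0$, and \begin{align*} \log Z(\bb\theta) &= -\tfrac12\log(-2\theta_2) - \frac{\theta_1^2}{4\theta_2}. \end{align*} Differentiating gives \begin{align*} \frac{\partial \log Z(\bb\theta)}{\partial\theta_1} &= -\frac{\theta_1}{2\theta_2} = \mu = \E(X)\\ \frac{\partial \log Z(\bb\theta)}{\partial\theta_2} &= -\frac{1}{2\theta_2} + \frac{\theta_1^2}{4\theta_2^2} = \sigma^2+\mu^2 = \E(X^2). \end{align*}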