## Probability

### The Analysis of Data, volume 1


## 5.5. The Exponential Family Random Vector

Definition 5.5.1. The continuous exponential family random vector is a random vector whose pdf is \begin{align*} f_{\bb X}(\bb x) &= \frac{\alpha(\bb x)}{Z(\bb\theta)} \exp\left(\sum_{i=1}^k h_i(\bb \theta) g_i(\bb x) \right), \quad \bb x \in \mathcal{X}\subset\R^n, \bb \theta\in\Theta\subset \R^d, \end{align*} where \begin{align*} g_i &: \mathcal{X}\to \R, \qquad i=1,\ldots,k\\ h_i &: \Theta \to \R, \qquad i=1,\ldots,k\\ \alpha &:\mathcal{X}\to\R \end{align*} and $Z$ ensures that the pdf normalizes to one \begin{align*} Z(\bb\theta) = \int_{\mathcal{X}} \alpha(\bb x) \exp\left(\sum_{i=1}^k h_i(\bb \theta) g_i(\bb x) \right)\, d\bb x. \end{align*} The discrete analog is defined similarly by replacing the pdf by a pmf and the integral by a sum. If $h_i(\bb\theta)=\theta_i$ for all $i$ we say that the exponential family is in canonical form.
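The discrete analog of the definition can be sketched in a few lines of Python. The example below is a minimal illustration, not a general-purpose implementation: the function name `exp_family_pmf` and its arguments are hypothetical, and the countable set $\mathcal{X}$ is approximated by a finite truncation. It instantiates the Poisson random variable with $k=1$, $\alpha(x)=1/x!$, $g_1(x)=x$, and $h_1(\lambda)=\log\lambda$, for which $Z(\lambda)=\sum_x \lambda^x/x! = e^\lambda$.

```python
import math

def exp_family_pmf(x, theta, alpha, g_list, h_list, support):
    """Evaluate a discrete exponential family pmf at x.

    Z(theta) is computed by summing over `support`, a finite iterable that
    stands in for (a truncation of) the countable set X.
    """
    def unnormalized(y):
        exponent = sum(h(theta) * g(y) for h, g in zip(h_list, g_list))
        return alpha(y) * math.exp(exponent)

    Z = sum(unnormalized(y) for y in support)
    return unnormalized(x) / Z

# Poisson(lam) with k = 1: alpha(x) = 1/x!, g_1(x) = x, h_1(lam) = log(lam).
lam = 3.0
support = range(0, 60)  # truncation of {0, 1, 2, ...}; the tail is negligible for lam = 3
alpha = lambda x: 1.0 / math.factorial(x)
g_list = [lambda x: x]
h_list = [lambda lam: math.log(lam)]

for x in (0, 2, 5):
    via_family = exp_family_pmf(x, lam, alpha, g_list, h_list, support)
    direct = math.exp(-lam) * lam**x / math.factorial(x)
    print(x, via_family, direct)  # the two columns should agree closely
```

The two printed values for each $x$ should agree up to floating-point error, since the truncated sum for $Z$ is very close to $e^\lambda$ when $\lambda=3$.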

We make the following observations.

1. In order for the definition above to make sense we need the integral or sum in $Z$ to converge.
2. The function $\alpha$ needs to be discrete ($\alpha(\bb x)>0$ on a countable set of $\bb x$ values) in order for $\bb X$ to be a discrete random vector and continuous ($\alpha(\bb x)>0$ on non-countable sets of $\bb x$ values) in order for $\bb X$ to be a continuous random vector.
3. When the exponential family is in canonical form ($h_i(\bb\theta)=\theta_i$), we have the following simple connection between $\log Z$ and the expectation and variance of $\bb g(\bb X)=(g_1(\bb X),\ldots,g_k(\bb X))$. Assuming that differentiation and integration can be interchanged, $\partial Z(\bb\theta)/\partial\theta_j = \int g_j(\bb x) \alpha(\bb x) \exp\left(\sum_{i=1}^k \theta_i g_i(\bb x) \right) d\bb x$, and therefore \begin{align*} \E(g_j(\bb X)) &= \int g_j(\bb x) \frac{\alpha(\bb x)}{Z(\bb\theta)} \exp\left(\sum_{i=1}^k \theta_i g_i(\bb x) \right) \, d\bb x\\ &= \frac{\int g_j(\bb x) \alpha(\bb x) \exp\left(\sum_{i=1}^k \theta_i g_i(\bb x) \right) \, d\bb x}{Z(\bb\theta)}\\ &= \frac{\partial Z(\bb\theta)}{\partial\theta_j} / Z(\bb\theta)\\ &= \frac{\partial \log Z(\bb\theta)}{\partial\theta_j}\\ \Cov(g_j(\bb X), g_l(\bb X)) &= \E(g_j(\bb X) g_l(\bb X)) - \E(g_j(\bb X))\E(g_l(\bb X)) \\ &= \frac{\partial^2 Z}{\partial\theta_j\partial\theta_l}/Z - \left(\frac{\partial Z}{\partial \theta_j} / Z\right) \left(\frac{\partial Z}{\partial \theta_l}/Z\right) \\ &= \left(\frac{\partial^2 Z}{\partial\theta_j\partial\theta_l } Z - \frac{\partial Z}{\partial \theta_j}\frac{\partial Z}{\partial \theta_l}\right) / Z^2\\ &= \frac{\partial^2 \log Z(\bb\theta)}{\partial\theta_j\partial\theta_l}. \end{align*} (Replace integrals with sums in the discrete case.) Or in vector notation \begin{align*} \E(\bb g(\bb X)) &= \nabla \log Z(\bb\theta)\\ \Var(\bb g(\bb X)) &= \nabla^2 \log Z(\bb\theta). \end{align*} Note that since the variance matrix is non-negative definite, the last equation implies that $\log Z$ is a convex function of $\bb\theta$. Since $\log f_{\bb X}(\bb x) = \log\alpha(\bb x) + \sum_{i=1}^k \theta_i g_i(\bb x) - \log Z(\bb\theta)$ is a linear function of $\bb\theta$ minus a convex one, the functions $\log f_{\bb X}(\bb x)$ or $\log p_{\bb X}(\bb x)$, when viewed as functions of $\bb\theta$ for an arbitrary fixed $\bb x$, are concave functions. This motivates the use of exponential families in statistical estimation using the maximum likelihood procedure.
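The relation between the derivatives of $\log Z$ and the moments of $g(X)$ can be checked numerically. The sketch below is only an illustration under stated assumptions: it uses the binomial random variable in canonical form, with $\alpha(x)=\binom{n}{x}$, $g(x)=x$, and $\theta=\log(p/(1-p))$, so that $Z(\theta)=\sum_{x=0}^n \binom{n}{x} e^{\theta x}$ is the normalizer in the denominator of the pmf. The helper names `log_Z`, `d1`, and `d2` are hypothetical, and the derivatives are approximated by central finite differences rather than computed exactly.

```python
import math

def log_Z(theta, n):
    """log Z(theta) for the binomial in canonical form:
    alpha(x) = C(n, x), g(x) = x, so Z(theta) = sum_x C(n, x) exp(theta * x)."""
    return math.log(sum(math.comb(n, x) * math.exp(theta * x) for x in range(n + 1)))

def d1(f, t, eps=1e-4):
    """Central finite-difference approximation of f'(t)."""
    return (f(t + eps) - f(t - eps)) / (2 * eps)

def d2(f, t, eps=1e-4):
    """Central finite-difference approximation of f''(t)."""
    return (f(t + eps) - 2 * f(t) + f(t - eps)) / eps**2

n, p = 10, 0.3
theta = math.log(p / (1 - p))  # canonical parameter (log-odds)
f = lambda t: log_Z(t, n)

mean_from_log_Z = d1(f, theta)  # first derivative of log Z
var_from_log_Z = d2(f, theta)   # second derivative of log Z

print(mean_from_log_Z, n * p)            # both approximate E(X) = n p
print(var_from_log_Z, n * p * (1 - p))   # both approximate Var(X) = n p (1 - p)
```

With $Z$ defined this way, the first derivative of $\log Z$ approximates the mean $np$ and the second derivative approximates the variance $np(1-p)$, both with a positive sign. The positive second derivative is also a numerical indication that $\log Z$ is convex in $\theta$.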

Many of the important random variables and random vectors that we have seen so far can be re-expressed in a way that complies with Definition 5.5.1 and thus are exponential family random vectors (or random variables). Specific cases include the binomial, Gaussian, exponential, Poisson, beta, and gamma random variables and the multinomial, multivariate normal, and Dirichlet random vectors. Notable exceptions are uniform, $t$, and mixture random variables and the corresponding random vectors.
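
As an illustration of such a re-expression, consider the Gaussian random variable $X\sim N(\mu,\sigma^2)$ with $\bb\theta=(\mu,\sigma^2)$. Expanding the square in the exponent of its pdf gives \begin{align*} f_X(x) &= \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)\\ &= \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{\mu^2}{2\sigma^2}\right) \exp\left(\frac{\mu}{\sigma^2}\, x - \frac{1}{2\sigma^2}\, x^2\right)\\ &= \frac{1}{Z(\bb\theta)} \exp\left(h_1(\bb\theta) g_1(x) + h_2(\bb\theta) g_2(x)\right), \end{align*} which complies with Definition 5.5.1 with $n=1$, $k=2$, and \begin{align*} \alpha(x) &= 1, & g_1(x) &= x, & g_2(x) &= x^2,\\ h_1(\bb\theta) &= \frac{\mu}{\sigma^2}, & h_2(\bb\theta) &= -\frac{1}{2\sigma^2}, & Z(\bb\theta) &= \sqrt{2\pi\sigma^2}\, \exp\left(\frac{\mu^2}{2\sigma^2}\right). \end{align*} This is one valid choice of $\alpha$, $g_i$, and $h_i$ rather than the only one; for example, constant factors can be moved between $\alpha$ and $Z$. The family is not in canonical form in the parameters $(\mu,\sigma^2)$, but it becomes canonical after reparameterizing by $\theta_1=\mu/\sigma^2$ and $\theta_2=-1/(2\sigma^2)$.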