Probability

The Analysis of Data, volume 1

Important Random Variables: Notes

3.16. Notes

In modeling situations, it may seem difficult to select an appropriate distribution from the large variety of possible candidates. The candidates may be narrowed down by focusing on the set of values that the distribution can generate (its support).

The Bernoulli distribution is appropriate for modeling $\{0,1\}$ values. The binomial and hypergeometric distributions are appropriate for modeling $\{0,1,\ldots,n\}$ values. The Poisson and geometric distributions are appropriate for modeling non-negative integers. For continuous data, the beta distribution is appropriate for modeling continuous values in $[0,1]$ (or some other bounded range if appropriate shifting and scaling are performed). The gamma and exponential distributions are appropriate for modeling continuous non-negative values (exponential for strictly decreasing densities and gamma for potentially unimodal densities). The Gaussian and $t$ distributions are appropriate for modeling continuous real-valued data (the Gaussian distribution for exponentially decaying tails and the $t$ distribution for polynomially decaying tails).
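The rules of thumb above can be sketched as a small lookup driven by the observed range of a sample. The function name `candidate_distributions` and its decision order are assumptions made for illustration; it only narrows the list of candidates and does not replace a goodness-of-fit check.

```python
from typing import Iterable, List


def candidate_distributions(values: Iterable[float]) -> List[str]:
    """Suggest candidate families from the observed range of the data.

    Hypothetical helper: it encodes only the rules of thumb in the text
    and is a starting point, not a model-selection procedure.
    """
    data = list(values)
    if not data:
        raise ValueError("need at least one observation")

    all_integers = all(float(v).is_integer() for v in data)
    lo, hi = min(data), max(data)

    if all_integers and lo >= 0:
        if hi <= 1:
            return ["Bernoulli"]
        # Non-negative integers: whether an upper bound n exists
        # (binomial, hypergeometric) or not (Poisson, geometric) has to
        # come from the problem, not from the sample.
        return ["binomial", "hypergeometric", "Poisson", "geometric"]
    if lo >= 0 and hi <= 1:
        return ["beta"]
    if lo >= 0:
        return ["exponential", "gamma"]
    # Everything else is treated as real-valued data.
    return ["Gaussian", "t"]


print(candidate_distributions([0, 1, 1, 0]))
print(candidate_distributions([0.5, 3.2, 1.7]))
```

A finite sample cannot reveal the true support, so the result should be read together with knowledge of how the data were generated. For instance, values that all happen to fall in $[0,1]$ may still come from an unbounded non-negative quantity.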

The canonical distributions above have pdfs with simple shapes, for example monotonically increasing, monotonically decreasing, or unimodal. In cases where the required distribution exhibits a more complex shape, we can select an appropriate mixture distribution. As the number of mixture components increases, more flexibility is obtained (at the price of an increase in the number of parameters).
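As a minimal sketch of this trade-off, the code below evaluates a Gaussian mixture pdf $\sum_{k} w_k\, N(x\,;\mu_k,\sigma_k^2)$ and counts its free parameters. The function names and the particular weights, means, and standard deviations are illustrative assumptions, chosen so that two well-separated components produce a bimodal shape that no single Gaussian can match.

```python
import math
from typing import Sequence


def gaussian_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of a Gaussian with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def mixture_pdf(x: float, weights: Sequence[float],
                mus: Sequence[float], sigmas: Sequence[float]) -> float:
    """Density of a Gaussian mixture: weighted sum of component densities."""
    return sum(w * gaussian_pdf(x, m, s)
               for w, m, s in zip(weights, mus, sigmas))


def num_free_parameters(k: int) -> int:
    """k means, k standard deviations, and k - 1 free weights
    (the weights are constrained to sum to one)."""
    return 3 * k - 1


# Illustrative two-component mixture (values chosen for this example).
weights, mus, sigmas = [0.5, 0.5], [-2.0, 2.0], [1.0, 1.0]

# The density is higher near each component mean than at the midpoint,
# which is the bimodal shape a single Gaussian cannot produce.
for x in (-2.0, 0.0, 2.0):
    print(x, round(mixture_pdf(x, weights, mus, sigmas), 4))

# The parameter count grows linearly with the number of components.
print([num_free_parameters(k) for k in (1, 2, 3)])
```

Each additional Gaussian component adds three parameters under this parameterization, so richer shapes come with more quantities to estimate from the same data.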

A comprehensive description of these and other distributions is available in references that specialize in distributions, for example (Forbes, 2010).