
Probability

The Analysis of Data, volume 1

Important Random Vectors: The Dirichlet Random Vector

5.3. The Dirichlet Random Vector

Definition 5.3.1. The Dirichlet random vector $X \sim \text{Dir}(\alpha)$, where $\alpha \in \mathbb{R}^k$ with $\alpha_i > 0$ for all $i = 1, \ldots, k$, is a continuous random vector defined by
\[ X_i = \frac{W_i}{\sum_{j=1}^k W_j}, \qquad i = 1, \ldots, k, \]
where $W_i \sim \text{Gam}(\alpha_i, 1)$, $i = 1, \ldots, k$, are independent Gamma RVs (see Section 3.10).
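The Gamma-normalization construction in Definition 5.3.1 translates directly into code. The following sketch (in Python rather than the R used later in this section; the helper name `rdirichlet` is ours) draws one Dirichlet sample using only the standard library's `random.gammavariate`:

```python
import random

def rdirichlet(alpha, rng=random):
    """Draw one Dirichlet(alpha) sample by normalizing
    independent Gamma(alpha_i, 1) random variables."""
    w = [rng.gammavariate(a, 1.0) for a in alpha]
    s = sum(w)
    return [wi / s for wi in w]

random.seed(0)
x = rdirichlet([3.0, 3.0, 3.0])
```

By construction the components of `x` are positive and sum to one, so each sample lies in the simplex.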
Proposition 5.3.1. The Dirichlet random vector $X \sim \text{Dir}(\alpha)$ has the following pdf:
\[ f_X(x) = \frac{\Gamma\left(\sum_{i=1}^k \alpha_i\right)}{\prod_{i=1}^k \Gamma(\alpha_i)} \prod_{i=1}^k x_i^{\alpha_i - 1} \]
if $x \in P_k$, and $f_X(x) = 0$ otherwise.
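As a quick sanity check on Proposition 5.3.1, the pdf can be evaluated with the standard library's `math.gamma`. For $\alpha = (1,1,1)$ the formula reduces to the constant $\Gamma(3) = 2$, the uniform density on the two-dimensional simplex, and for $k = 2$ it reduces to the beta pdf. (A Python sketch; the helper name `ddirichlet_py` is ours.)

```python
import math

def ddirichlet_py(x, alpha):
    """Dirichlet pdf of Proposition 5.3.1, assuming x lies in the simplex."""
    norm = math.gamma(sum(alpha)) / math.prod(math.gamma(a) for a in alpha)
    return norm * math.prod(xi ** (a - 1.0) for xi, a in zip(x, alpha))

# Dir(1,1,1) is uniform on the simplex with constant density Gamma(3) = 2
p_uniform = ddirichlet_py([0.2, 0.3, 0.5], [1.0, 1.0, 1.0])

# For k = 2, Dir(a, b) evaluated at (x, 1-x) is the Beta(a, b) pdf at x
p_beta = ddirichlet_py([0.3, 0.7], [2.0, 5.0])
```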
Proof. We can obtain the pdf by applying the change of variables technique to the pdf of the random vector $W$,
\[ f_W(w) = \prod_{i=1}^k e^{-w_i} w_i^{\alpha_i - 1} / \Gamma(\alpha_i), \]
and the invertible transformation $W \mapsto (X_1, \ldots, X_{k-1}, Z)$, where $X_i = W_i / \sum_{j=1}^k W_j$ and $Z = \sum_{j=1}^k W_j$. We can express the inverse transformation as $W_i = X_i Z$ for $i = 1, \ldots, k-1$ and $W_k = Z X_k = Z(1 - X_1 - \cdots - X_{k-1})$. It follows that the Jacobian of the transformation is the matrix
\[ J = \begin{pmatrix} Z & 0 & \cdots & 0 & X_1 \\ 0 & Z & \cdots & 0 & X_2 \\ \vdots & & \ddots & & \vdots \\ -Z & -Z & \cdots & -Z & X_k \end{pmatrix}. \]

To use the change of variables technique, we need to compute the determinant of the matrix above. Recalling that adding one row of a matrix to another row does not change the determinant, we can add the first $k-1$ rows to the last row and then compute the determinant of the resulting matrix. After the row additions the last row becomes $(0, \ldots, 0, X_1 + \cdots + X_k) = (0, \ldots, 0, 1)$, so the resulting matrix is upper triangular (all elements below the diagonal are zero) with diagonal elements $Z, Z, \ldots, Z, 1$. Since the determinant of an upper triangular matrix is the product of its diagonal elements (all other terms in the sum defining the determinant vanish), the determinant is $Z^{k-1}$. It follows that the pdf of the transformed variables $(X_1, \ldots, X_{k-1}, Z)$ is
\[ f_{X_1,\ldots,X_{k-1},Z}(x_1,\ldots,x_{k-1},z) = z^{k-1} \prod_{i=1}^k e^{-x_i z} (x_i z)^{\alpha_i - 1} / \Gamma(\alpha_i) = z^{\left(\sum_{i=1}^k \alpha_i\right) - 1} e^{-z} \prod_{i=1}^k x_i^{\alpha_i - 1} / \Gamma(\alpha_i), \]
where the second equality uses $\sum_{i=1}^k x_i = 1$.
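The determinant claim can be verified numerically in the case $k = 3$, where the Jacobian of $(x_1, x_2, z) \mapsto (w_1, w_2, w_3)$ is a $3 \times 3$ matrix whose determinant should equal $z^2$. A small Python check, using the explicit cofactor expansion of a $3 \times 3$ determinant:

```python
def det3(m):
    """Determinant of a 3x3 matrix via cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

x1, x2, z = 0.2, 0.3, 1.7
# Jacobian of (w1, w2, w3) = (x1*z, x2*z, z*(1 - x1 - x2))
# with respect to (x1, x2, z):
J = [[z, 0.0, x1],
     [0.0, z, x2],
     [-z, -z, 1.0 - x1 - x2]]
detJ = det3(J)
```

Here the cancellation of the $x_1, x_2$ terms reproduces the row-addition argument in the proof.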

We integrate the above expression with respect to $z$ in order to obtain the pdf of $X_1, \ldots, X_{k-1}$:
\[ f_{X_1,\ldots,X_{k-1}}(x_1,\ldots,x_{k-1}) = \prod_{i=1}^k \frac{x_i^{\alpha_i - 1}}{\Gamma(\alpha_i)} \int_0^{\infty} z^{\left(\sum_{i=1}^k \alpha_i\right) - 1} e^{-z} \, dz = \Gamma\!\left(\sum_{i=1}^k \alpha_i\right) \prod_{i=1}^k \frac{x_i^{\alpha_i - 1}}{\Gamma(\alpha_i)}, \]
where the last equality follows from the fact that the Gamma pdf integrates to one (see Section 3.10). Since $\sum_{i=1}^k X_i = 1$, the RV $X_k$ must equal $1 - \sum_{i=1}^{k-1} X_i$. The pdf above therefore remains the same if we add the RV $X_k$ taking the value $1 - \sum_{i=1}^{k-1} X_i$ (the pdf becomes zero for all other values of $X_k$).

Recall that the simplex $P_k$ represents the set of all probability functions on a sample space $\Omega$ of size $k$. The Dirichlet distribution may thus be viewed as a distribution over distributions on a finite sample space.

Proposition 5.3.2. Let $W_i \sim \text{Gam}(\alpha_i, 1)$, $i = 1, \ldots, k$, be independent Gamma RVs. The random vector $(X_1, \ldots, X_{k-1})$, where $X_i = W_i / \sum_{j=1}^k W_j$ for $i = 1, \ldots, k-2$ and $X_{k-1} = (W_{k-1} + W_k) / \sum_{j=1}^k W_j$, is a Dirichlet random vector with parameter $(\alpha_1, \ldots, \alpha_{k-2}, \alpha_{k-1} + \alpha_k)$.
Proof. Recall from Section 3.10 that a sum of independent Gamma RVs with the same second parameter is a Gamma RV whose parameter is the sum of the corresponding parameters. The rest of the proof follows from the definition of the Dirichlet distribution above.
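At the level of individual samples, the aggregation in Proposition 5.3.2 is an exact algebraic identity: summing two components of the normalized vector gives the same result as normalizing after summing the corresponding Gamma variables. A Python sketch of this identity (sample values are arbitrary):

```python
import random

random.seed(1)
# Independent Gamma(alpha_i, 1) draws underlying the Dirichlet construction
alpha = [1.0, 3.0, 6.0]
w = [random.gammavariate(a, 1.0) for a in alpha]
s = sum(w)

x = [wi / s for wi in w]                # Dirichlet construction on (W1, W2, W3)
aggregated = (x[0], x[1] + x[2])        # aggregate the last two components
direct = (w[0] / s, (w[1] + w[2]) / s)  # construction on (W1, W2 + W3)
```

The two pairs coincide; the distributional statement then follows from the Gamma summation property cited in the proof.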

It is straightforward to generalize the proposition above and show that whenever we aggregate some of the Gamma RVs $W_i$ into categories, and then form the normalization construction that defines the Dirichlet random vector, we obtain a Dirichlet random vector whose parameter vector is aggregated in the same way.

Proposition 5.3.3. Let $X$ be a Dirichlet random vector with parameter vector $\alpha$. The marginal distribution of $X_i$ is $\text{Beta}(\alpha_i, \sum_{j=1}^k \alpha_j - \alpha_i)$, and we have
\[ E(X_i) = \frac{\alpha_i}{\sum_{j=1}^k \alpha_j}, \qquad \text{Var}(X_i) = \frac{\alpha_i \left(\sum_{j=1}^k \alpha_j - \alpha_i\right)}{\left(\sum_{j=1}^k \alpha_j\right)^3 + \left(\sum_{j=1}^k \alpha_j\right)^2}. \]
Proof. Let $W_i \sim \text{Gam}(\alpha_i, 1)$, $i = 1, \ldots, k$, be the Gamma RVs underlying the Dirichlet construction of $X$. Proposition 5.3.2 implies that the Dirichlet construction corresponding to the Gamma RVs $W_i$ and $\sum_{j : j \neq i} W_j$ results in a random vector $(X_i, 1 - X_i)$ that has a Dirichlet distribution with parameter vector $(\alpha_i, \sum_{j : j \neq i} \alpha_j)$. Comparing the pdf of $\text{Dir}(\alpha_i, \sum_{j : j \neq i} \alpha_j)$ to the pdf of the beta distribution, and noting that the distribution of $X_i$ is directly obtained from the distribution of $(X_i, 1 - X_i)$ (the second component is a deterministic function of the first), concludes the first part of the proof. The second part of the proof follows from the derivations of the expectation and variance of the beta distribution (see Section 3.12).
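The moment formulas in Proposition 5.3.3 can be checked by Monte Carlo simulation of the Gamma construction. A Python sketch (the parameter vector, sample size, and tolerances are arbitrary choices):

```python
import random

random.seed(2)
alpha = [1.0, 3.0, 6.0]
a0 = sum(alpha)
n = 100_000

# Simulate X_1 via the Gamma-normalization construction
samples = []
for _ in range(n):
    w = [random.gammavariate(a, 1.0) for a in alpha]
    samples.append(w[0] / sum(w))

mean = sum(samples) / n
var = sum((s - mean) ** 2 for s in samples) / n

# Theoretical values from Proposition 5.3.3
mean_theory = alpha[0] / a0
var_theory = alpha[0] * (a0 - alpha[0]) / (a0 ** 3 + a0 ** 2)
```

With these parameters the theoretical mean is $1/10$ and the theoretical variance is $9/1100$; the empirical moments should agree up to Monte Carlo error.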

The R code below graphs the pdf of several Dirichlet distributions. Dark shades correspond to high probabilities and light shades to low probabilities. Note that the pdf assigns non-zero values only on the simplex $P_k$ (in this case $k = 3$).

library(MCMCpack)
trellis.par.set(list(fontsize = list(text = 18)))
R = expand.grid(seq(0.001, 0.999, length.out = 80),
    seq(0.001, 0.999, length.out = 80))
R[, 3] = 1 - R[, 1] - R[, 2]
names(R) = c("x1", "x2", "x3")
R = R[R[, 3] > 0, ]
for (i in 1:nrow(R)) R$p[i] = ddirichlet(R[i, 1:3],
    alpha = c(3, 3, 3))
cloud(x3 ~ x1 * x2, data = R, col = gray(1 - R$p/(max(R$p) +
    3) - 0.1), pch = 16, cex = 1, main = "Dirichlet (3,3,3) pdf ",
    scales = list(arrows = FALSE), xlab = "$x_1$",
    ylab = "$x_2$", zlab = "$x_3$")
[Figure: the Dirichlet(3,3,3) pdf over the simplex.]
for (i in 1:nrow(R)) R$p[i] = ddirichlet(R[i, 1:3],
    alpha = c(10, 10, 10))
cloud(x3 ~ x1 * x2, data = R, col = gray(1 - R$p/(max(R$p) +
    3) - 0.1), pch = 16, cex = 0.8, main = "Dirichlet (10,10,10) pdf ",
    scales = list(arrows = FALSE), xlab = "$x_1$",
    ylab = "$x_2$", zlab = "$x_3$")
[Figure: the Dirichlet(10,10,10) pdf over the simplex.]
for (i in 1:nrow(R)) R$p[i] = ddirichlet(R[i, 1:3],
    alpha = c(1, 3, 6))
cloud(x3 ~ x1 * x2, data = R, col = gray(1 - R$p/(max(R$p) +
    3) - 0.1), pch = 16, cex = 0.8, main = "Dirichlet (1,3,6) pdf",
    scales = list(arrows = FALSE), xlab = "$x_1$",
    ylab = "$x_2$", zlab = "$x_3$")
[Figure: the Dirichlet(1,3,6) pdf over the simplex.]