Sampling Dirichlet Distributions (chapter 2-a)

In a previous post on sampling from Dirichlet distributions that related to a comment by Andrew Gelman on the same topic I talked a bit about how to sample the parameters of a Dirichlet (as opposed to sampling from a Dirichlet distribution with known parameters).

In responding to some questions and comments on my post, I went back and reread Andrew's thoughts and think that they should be amplified a bit.

Basically, what Andrew suggested is that when sampling from a Dirichlet, it is much easier to not sample from Dirichlet at all, but rather to sample from some unbounded distribution and then reduce that sample back to the unit simplex. The transformation that Andrew suggested is the same as the so-called soft-max basis that David Mackay also advocates for several purposes. This method is not well know, however, so it deserves a few more pixels.

The idea is that to sample $(\pi_1 \ldots \pi_n)$ from some distribution you instead sample

$(x_1 \ldots x_n) \sim \mathrm{Dir} (\beta_1 \ldots \beta_n)$

and then reduce to the sample you want

$\pi_i = \frac {e^{x_i}} {\sum_j e^{x_j}}$

Clearly this gives you values on the unit simplex. What is not clear is that this distribution is anything that you want. In fact, if your original sample is from a normal distribution

$(x_1 \ldots x_n) \sim \mathcal N (\vec \mu ,\Sigma)$

then you can come pretty close to whatever desired Dirichlet distribution you like. More importantly in practice, this idea of normally sampling $(x_1 \ldots x_n)$ and then transforming gives a very nice prior serves as well as a real Dirichlet distribution in many applications.

Where this comes in wonderfully handy is in MCMC sampling where you don't really want to sample from the prior, but instead want to sample from the posterior. It isn't all that hard to use the Dirichlet distribution directly in these cases, but the results are likely to surprise you. For instance, if you take the trivial case of picking a uniformly distributed point on the simplex and then jumping around using Metropolis updates based on a Dirichlet distribution, you will definitely have the right distribution for your samples after just a bit of sampling. But if you plot those points, they won't look at all right if you are sampling from a distribution that has any significant density next to the boundaries of the simplex. What is happening is that the high probability of points near the boundaries is accounted for in the Metropolis algorithm by having a high probability of rejecting any jump if you are near the boundary. Thus, the samples near the boundary are actually repeated many times leading to the visual impression of low density where there should be high density.

On the other hand, if you were to do the Metropolis jumps in the soft-max basis, the problem is entirely avoided because there are no boundaries in soft-max space. You can even use the soft-max basis for the sole purpose of generating candidate points and do all probability calculations in terms on the simplex.

In my next post, I will provide a concrete example and even include some pictures.

Surprise and Coincidence - musings from the long tail

Tuesday, April 6, 2010