The Dirichlet distribution is a fundamental probability distribution in statistics, particularly in Bayesian inference where it serves as the conjugate prior for the multinomial distribution.
In this post, we explore its definition, geometry, and behavior using interactive simulations.
What is a "Random Distribution"?
To understand the Dirichlet, it helps to start with an analogy.
Imagine a standard six-sided die. Rolling it produces a number from 1 to 6. But physical dice are never perfectly fair; tiny manufacturing imperfections mean the probabilities of each side aren't exactly $1/6$.
Now, imagine a bag of 100 dice.
- A bag of crude, handmade dice from 100 years ago might have wildly different biases.
- A bag of precision casino dice will have probabilities very close to uniform.
If you reach into the bag and pull out a die, you are drawing a random probability mass function (pmf). The Dirichlet distribution is a way to model this randomness—it is a distribution over pmfs.
Definition and Geometry
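A probability mass function $p = (p_1, \dots, p_k)$ over $k$ outcomes must satisfy two constraints:
- All probabilities are non-negative: $p_i \ge 0$ for every $i$.
- They sum to one: $\sum_{i=1}^{k} p_i = 1$.
Geometrically, these constraints restrict $p$ to a shape called the $(k-1)$-dimensional probability simplex, denoted $\Delta_k$.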
For $k = 3$ (e.g., a 3-sided die), the simplex is an equilateral triangle living in 3D space. Every point in this triangle represents a valid set of probabilities $(p_1, p_2, p_3)$.
- The vertices represent deterministic outcomes (e.g., $(1, 0, 0)$).
- The center represents the uniform distribution $(1/3, 1/3, 1/3)$.
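To make the geometry concrete, here is a minimal Python sketch that checks the two pmf constraints and maps a 3-component pmf to 2D coordinates for plotting. The function names and the placement of the triangle's vertices are assumptions made for illustration; they are not taken from the simulations in this post.

```python
import math

def is_on_simplex(p, tol=1e-9):
    """Return True if p is a valid pmf: non-negative entries that sum to one."""
    return all(x >= -tol for x in p) and abs(sum(p) - 1.0) <= tol

def to_triangle_xy(p):
    """Map a 3-component pmf to 2D coordinates inside an equilateral triangle.

    Assumed layout (a plotting convention chosen here, not from the post):
    the vertex for p1 = 1 sits at (0, 0), p2 = 1 at (1, 0),
    and p3 = 1 at (0.5, sqrt(3)/2).
    """
    p1, p2, p3 = p
    x = p2 + 0.5 * p3
    y = (math.sqrt(3) / 2.0) * p3
    return x, y

print(is_on_simplex((0.2, 0.3, 0.5)))    # valid pmf
print(is_on_simplex((0.5, 0.6, -0.1)))   # sums to one, but has a negative entry
print(to_triangle_xy((1/3, 1/3, 1/3)))   # the uniform pmf lands at the centroid
```

The mapping is just a weighted average of the three vertex positions, with the probabilities as weights, so deterministic pmfs land on the corners and the uniform pmf lands at the centroid.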
The Density Function
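A random vector $p = (p_1, \dots, p_k)$ follows a Dirichlet distribution with parameter vector $\alpha = (\alpha_1, \dots, \alpha_k)$ (where $\alpha_i > 0$ for all $i$) if its probability density on the simplex is given by:

$$f(p; \alpha) = \frac{\Gamma\left(\sum_{i=1}^{k} \alpha_i\right)}{\prod_{i=1}^{k} \Gamma(\alpha_i)} \prod_{i=1}^{k} p_i^{\alpha_i - 1}$$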
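The Dirichlet density is the product of the terms $p_i^{\alpha_i - 1}$, scaled by a ratio of Gamma functions so that it integrates to one. A minimal sketch of evaluating it with only the Python standard library follows. The function names and the parameter values in the loop are illustrative assumptions, and working in log space is a numerical-stability choice made here rather than something prescribed by the definition.

```python
import math

def dirichlet_log_pdf(p, alpha):
    """Log-density of Dir(alpha) at a point p strictly inside the simplex."""
    if len(p) != len(alpha):
        raise ValueError("p and alpha must have the same length")
    if any(a <= 0 for a in alpha):
        raise ValueError("all alpha_i must be positive")
    if any(x <= 0 for x in p) or abs(sum(p) - 1.0) > 1e-9:
        raise ValueError("p must lie strictly inside the simplex")
    # log of Gamma(sum alpha_i) / prod Gamma(alpha_i)
    log_norm = math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
    # log of prod p_i^(alpha_i - 1)
    return log_norm + sum((a - 1.0) * math.log(x) for a, x in zip(alpha, p))

def dirichlet_pdf(p, alpha):
    """Density of Dir(alpha) at p, via the log-density."""
    return math.exp(dirichlet_log_pdf(p, alpha))

center = (1/3, 1/3, 1/3)
near_vertex = (0.98, 0.01, 0.01)
# Illustrative parameter vectors (assumed values, chosen for this sketch).
for alpha in [(1.0, 1.0, 1.0), (5.0, 5.0, 5.0), (0.5, 0.5, 0.5)]:
    print(alpha, dirichlet_pdf(center, alpha), dirichlet_pdf(near_vertex, alpha))
```

With all $\alpha_i = 1$ and $k = 3$ the density is the constant $\Gamma(3) = 2$, the reciprocal of the area $1/2$ of the region $\{p_1, p_2 \ge 0,\ p_1 + p_2 \le 1\}$ over which it is integrated. Equal parameters above 1 give a higher value at the center than near a vertex, and equal parameters below 1 give the opposite.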
Visualizing the Density (Figure 1)
The parameter $\alpha$ controls the shape of the distribution. The simulation below replicates Figure 1 from our reference paper. It shows the "heat" (probability density) on the simplex triangle for different values of $\alpha$.
- Red indicates high density (high probability of drawing such a pmf).
- Blue indicates low density.
Density Heatmap (Red = High, Blue = Low) on the Simplex
Try the different presets to see the behavior described in the text:
- $\alpha = (1, 1, 1)$: The density is perfectly flat (uniform). Every valid pmf is equally likely.
- All $\alpha_i$ equal and greater than 1: The density concentrates in the center. This models the "precision casino dice": most dice drawn from this bag will be nearly fair.
- All $\alpha_i$ equal and less than 1: The density becomes unbounded toward the boundary and piles up at the corners (vertices). This models a bag of "trick dice": most dice will almost always roll a 1, or almost always a 2, etc., but rarely a mix.
- Unequal $\alpha_i$, with $\alpha_3$ the largest: The distribution is asymmetric, pulling the mass toward the third component ($p_3$), which corresponds to the corner for $p_3 = 1$.
Generating Samples (Figure 2)
We can also visualize the distribution by drawing random samples from it. Each blue dot in the simulation below represents one random pmf drawn from $\mathrm{Dir}(\alpha)$. This replicates Figure 2 from the text.
2000 Random Samples on the Simplex
How to Sample?
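A robust way to generate these samples is to use Gamma random variables. If we draw independent values $y_i \sim \mathrm{Gamma}(\alpha_i, 1)$ for $i = 1, \dots, k$, then the normalized vector

$$p = \frac{(y_1, \dots, y_k)}{\sum_{j=1}^{k} y_j}$$

follows a $\mathrm{Dir}(\alpha)$ distribution.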
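The Gamma-normalization recipe can be sketched in a few lines with Python's standard `random` module. The function name `sample_dirichlet`, the parameter vector, and the seed are assumptions chosen for illustration; this is not the code behind the simulation in this post.

```python
import random

def sample_dirichlet(alpha, rng=random):
    """Draw one pmf from Dir(alpha) by normalizing independent Gamma draws."""
    # gammavariate(shape, scale): shape alpha_i, scale 1 for every component.
    y = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(y)
    return [v / total for v in y]

rng = random.Random(0)        # fixed seed, only so the run is repeatable
alpha = (2.0, 3.0, 5.0)       # illustrative parameters (assumed values)
samples = [sample_dirichlet(alpha, rng) for _ in range(2000)]

print(samples[0])                                    # one random pmf
print(sum(s[0] for s in samples) / len(samples))     # sample mean of p_1
```

One caveat for this sketch: with very small $\alpha_i$ the Gamma draws can underflow to zero, so a production sampler would need to guard against a zero total before dividing.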
Properties
- Expectation: The mean of the distribution is the normalized parameter vector: $\mathbb{E}[p_i] = \alpha_i / \sum_{j=1}^{k} \alpha_j$.
- Aggregation: If you sum two components (e.g., $p_1 + p_2$), the resulting vector $(p_1 + p_2, p_3, \dots, p_k)$ is still Dirichlet, with parameters $(\alpha_1 + \alpha_2, \alpha_3, \dots, \alpha_k)$. This "fractal-like" property makes the Dirichlet distribution useful for consistent modeling across different levels of granularity.
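As a small worked example of both properties, take the illustrative parameter vector $\alpha = (1, 2, 3)$ (values chosen here for convenience, not taken from the simulations above):

```latex
\begin{aligned}
\sum_{j} \alpha_j &= 6, \qquad
\mathbb{E}[p] = \left(\tfrac{1}{6}, \tfrac{2}{6}, \tfrac{3}{6}\right)
             = \left(\tfrac{1}{6}, \tfrac{1}{3}, \tfrac{1}{2}\right), \\
(p_1 + p_2,\; p_3) &\sim \mathrm{Dir}(1 + 2,\; 3) = \mathrm{Dir}(3, 3), \qquad
\mathbb{E}[p_1 + p_2] = \tfrac{3}{6} = \tfrac{1}{6} + \tfrac{1}{3}.
\end{aligned}
```

The mean of the merged component, computed from the aggregated parameters, agrees with the sum of the two original means, as it must.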