Perplexity: a more intuitive measure of uncertainty than entropy
Like entropy, perplexity is an information-theoretic quantity that describes the uncertainty of a random variable. In fact, perplexity is simply a monotonic function of entropy, so in some sense the two can be used interchangeably. So why do we need it? In this post, I’ll discuss why perplexity is a more intuitive measure of uncertainty than entropy.
Introduction
Perplexity is an information-theoretic quantity that crops up in a number of contexts, such as natural language processing, and serves as a parameter for the popular t-SNE algorithm used for dimensionality reduction.
Like entropy, perplexity provides a measure of the amount of uncertainty of a random variable. In fact, perplexity is simply a monotonic function of entropy. Given a discrete random variable, $X$, perplexity is defined as:
\[\text{Perplexity}(X) := 2^{H(X)}\]where $H(X)$ is the entropy of $X$ in bits (i.e., computed with a base-2 logarithm).
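To make the definition concrete, here is a minimal sketch in Python (the `entropy` and `perplexity` helpers are my own, for illustration) that computes the entropy of a discrete distribution in bits and exponentiates it to obtain the perplexity:

```python
import math

def entropy(p):
    """Entropy, in bits, of a discrete distribution given as probabilities."""
    return -sum(p_i * math.log2(p_i) for p_i in p if p_i > 0)

def perplexity(p):
    """Perplexity of a discrete distribution: 2 raised to its entropy."""
    return 2 ** entropy(p)

# A biased coin carries less than 1 bit of entropy, so its perplexity sits below 2.
print(entropy([0.9, 0.1]))     # ~0.469 bits
print(perplexity([0.9, 0.1]))  # ~1.38
```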
When I first saw this definition, I did not understand its purpose. That is, if perplexity is simply exponentiated entropy, why do we need it? After all, we have a good intuition for entropy already: it describes the number of bits needed to encode random samples from $X$’s probability distribution. So why perplexity?
An intuitive measure of uncertainty
Perplexity is often used instead of entropy because it is arguably more intuitive to our human minds. Of course, as we’ve discussed in a previous blog post, entropy describes the number of bits needed to encode random samples from a distribution, which one may argue is already intuitive; however, I would argue the contrary. If I tell you that a given random variable has an entropy of 7 bits, how should you feel about that at a gut level?
Arguably, perplexity provides a more human way of thinking about a random variable’s uncertainty, and that is because the perplexity of a uniform, discrete random variable with $K$ outcomes is $K$ (see the Appendix to this post)! For example, the perplexity of a fair coin is two and the perplexity of a fair six-sided die is six. This provides a frame of reference for interpreting a perplexity value: if the perplexity of some random variable $X$ is 20, our uncertainty towards the outcome of $X$ is equal to the uncertainty we would feel towards a roll of a fair 20-sided die. That grounds the number at a gut level!
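As a quick check of this frame of reference, the sketch below (reusing the hypothetical `perplexity` helper from earlier) confirms that uniform distributions recover their number of outcomes, up to floating-point rounding:

```python
# Uniform distributions: perplexity equals the number of outcomes K.
print(perplexity([0.5, 0.5]))    # ~2.0  (fair coin)
print(perplexity([1/6] * 6))     # ~6.0  (fair six-sided die)
print(perplexity([1/20] * 20))   # ~20.0 (fair 20-sided die)
```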
Appendix
Theorem: Given a discrete uniform random variable $X \sim \text{Cat}(p_1, p_2, \dots, p_K)$ where $\forall i,j \in [K], p_i = p_j = 1/K$, it holds that the perplexity of $X$ is $K$.
Proof:
\[\begin{align*} \text{Perplexity}(X) &:= 2^{H(X)} \\ &= 2^{-\sum_{i=1}^K \frac{1}{K} \log_2 \frac{1}{K}} \\ &= 2^{-K \cdot \frac{1}{K} \log_2 \frac{1}{K}} \\ &= 2^{-\log_2 \frac{1}{K}} \\ &= \frac{1}{2^{\log_2 \frac{1}{K}}} \\ &= K \end{align*}\]$\square$