Math and science::INF ML AI
Entropy of an ensemble
The entropy of an ensemble, \( X = (x, A_x, P_x) \), is defined to be the average Shannon information content over all outcomes:
\[ H(X) = \sum_{x \in A_x}P(x) \log \frac{1}{P(x)} \]
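As a quick illustration (not part of the original note), here is a minimal Python sketch that computes \( H(X) \) directly from the definition, assuming the ensemble is given as a list of outcome probabilities; the helper name `entropy` is my own.

```python
import math

def entropy(probs, base=2):
    """Shannon entropy H(X) = sum_x P(x) * log(1/P(x)).

    `probs` is a list of outcome probabilities summing to 1.
    Outcomes with probability 0 contribute nothing.
    """
    return sum(p * math.log(1.0 / p, base) for p in probs if p > 0)

# Example: a biased coin vs. a fair coin
print(entropy([0.9, 0.1]))  # ~0.469 bits
print(entropy([0.5, 0.5]))  # exactly 1 bit
```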
Properties of entropy:
- \( H(X) \geq 0 \) with equality iff some outcome has probability 1.
- Entropy is maximized when the distribution over outcomes is uniform.
- The entropy is at most the log of the number of outcomes.
The last two points can be expressed as:
\[H(X) \leq \log(|A_x|) \text{, with equality iff all outcomes have probability } \frac{1}{|A_x|} \]
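A small numerical check of this bound (my own sketch, reusing the same `entropy` helper defined above): for several distributions over the same four outcomes, the entropy never exceeds \( \log|A_x| \), and only the uniform distribution attains it.

```python
import math

def entropy(probs, base=2):
    return sum(p * math.log(1.0 / p, base) for p in probs if p > 0)

n = 4
uniform = [1.0 / n] * n
skewed  = [0.7, 0.1, 0.1, 0.1]
peaked  = [1.0, 0.0, 0.0, 0.0]

bound = math.log(n, 2)            # log2 |A_x| = 2 bits
for p in (uniform, skewed, peaked):
    h = entropy(p)
    assert h <= bound + 1e-12     # H(X) <= log |A_x|
    print(f"H = {h:.4f} bits (bound = {bound:.4f})")
# Only the uniform distribution reaches the bound exactly.
```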
Proof on the back side.
Proof that \( H(X) \leq \log(|A_x|) \):
First note that:
\[
\begin{align*}
E\left[\frac{1}{P(X)}\right] &= \sum_{i \in A_x} P(i) \frac{1}{P(i)} \\
&= \sum_{i \in A_x} 1 \\
&= |A_x|
\end{align*}
\]
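A quick sanity check of this identity (a sketch of my own, not part of the original proof): for any distribution with full support on \( A_x \), the expectation of \( \frac{1}{P(X)} \) under \( P \) equals the number of outcomes.

```python
# E[1/P(X)] = sum_i P(i) * (1/P(i)) = |A_x| for any P with full support
probs = [0.5, 0.25, 0.15, 0.1]
expectation = sum(p * (1.0 / p) for p in probs)
print(expectation)  # 4.0 == |A_x|
```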
Now then:
\[\begin{align*}
H(X) &= \sum_{i \in A_x} P(i) \log\frac{1}{P(i)} \\
&= \sum_{i \in A_x} P(i) \, f\!\left(\frac{1}{P(i)}\right)
\end{align*}\]
Here \( f(u) = \log(u) \) is a concave function, so Jensen's inequality gives \( E[f(U)] \leq f(E[U]) \). Then we can say:
\[\begin{align*}
H(X) &= E[f(\frac{1}{P(X)})] \\
&\leq f(E[\frac{1}{P(X)}]) \\
&= \log(|A_x|)
\end{align*}\]
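The Jensen step can also be checked numerically (again a sketch with my own names): compare \( E[\log \frac{1}{P(X)}] \) with \( \log E[\frac{1}{P(X)}] \) for a non-uniform distribution; the inequality is strict because \( P \) is not uniform.

```python
import math

probs = [0.5, 0.25, 0.15, 0.1]                          # non-uniform P
lhs = sum(p * math.log(1.0 / p, 2) for p in probs)      # H(X) = E[f(1/P(X))]
rhs = math.log(sum(p * (1.0 / p) for p in probs), 2)    # f(E[1/P(X)]) = log2 |A_x|
print(lhs, rhs)  # lhs < rhs
assert lhs <= rhs
```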
Perspectives on entropy:
- If \( X \) is a random variable, we can create another random variable, say \( Y \), by applying a function to the outcomes: \( Y = f(X) \). What if this function depended not on the value of an outcome but on its probability? Since there is a function \( P \) that maps outcomes to probabilities, we could create the random variable \( Y = P(X) \). For Shannon information content and entropy, we create the variable \( Y = \log \frac{1}{P(X)} \); the entropy is the expected value of this random variable (see the sketch below).
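To make this perspective concrete, here is a minimal sketch (my own construction, with an assumed three-outcome ensemble) that builds \( Y = \log \frac{1}{P(X)} \) as the per-outcome information contents and then takes its expectation under \( P \), recovering the same entropy as the direct formula.

```python
import math

# The map P from outcomes of X to their probabilities
P = {"a": 0.5, "b": 0.25, "c": 0.25}

# Y = log2(1/P(X)): the Shannon information content of each outcome
Y = {x: math.log(1.0 / p, 2) for x, p in P.items()}

# Entropy is the expected value of Y under P
H = sum(P[x] * Y[x] for x in P)
print(Y)  # {'a': 1.0, 'b': 2.0, 'c': 2.0}
print(H)  # 1.5 bits
```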