If \( f \) is a convex ("smile"-shaped) function and \( X \) is a random variable with finite expectation, then:
\[ \mathbb{E}[f(X)] \ge f(\mathbb{E}[X]) \]
If \( f \) is strictly convex and \( \mathbb{E}[f(X)] = f(\mathbb{E}[X]) \), then the random variable \( X \) is constant (with probability 1).
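A quick numerical check of the inequality, offered as a sketch: draw many random discrete distributions and confirm that \( \mathbb{E}[f(X)] - f(\mathbb{E}[X]) \) is never negative for a convex \( f \). The choice \( f(x) = e^x \), the sample sizes, and the helper name `jensen_gap` are illustrative assumptions, not part of the original notes.

```python
import math
import random


def jensen_gap(values, probs, f):
    """Return E[f(X)] - f(E[X]) for a discrete X (hypothetical helper)."""
    mean_x = sum(p * x for x, p in zip(values, probs))
    mean_fx = sum(p * f(x) for x, p in zip(values, probs))
    return mean_fx - f(mean_x)


random.seed(0)
for _ in range(1000):
    # Random support points and random probabilities normalized to sum to 1.
    values = [random.uniform(-3, 3) for _ in range(5)]
    weights = [random.random() for _ in range(5)]
    total = sum(weights)
    probs = [w / total for w in weights]
    # For a convex f the gap is never negative (allowing tiny float error).
    assert jensen_gap(values, probs, math.exp) >= -1e-12

# A constant X gives equality (up to rounding) even for a strictly convex f.
const_gap = jensen_gap([2.0, 2.0, 2.0], [0.2, 0.3, 0.5], math.exp)
print(const_gap)
```

Only a non-constant \( X \) can make the gap strictly positive when \( f \) is strictly convex, which is the equality condition stated above.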
Intuition
A few ways of visualizing Jensen's inequality.
Sense of stretching
Interpolation across \( \mathbb{E}[X] \)
Regardless of the weights (probabilities) of \( x_1 \) and \( x_2 \), their expectation \( \mathbb{E}[X] \) lies somewhere between \( x_1 \) and \( x_2 \). Mapping \( x_1 \) and \( x_2 \) through \( f \) and then taking the expectation gives \( \mathbb{E}[f(X)] \), which, plotted above \( \mathbb{E}[X] \), lies on the chord joining \( (x_1, f(x_1)) \) and \( (x_2, f(x_2)) \). If we instead first compute \( \mathbb{E}[X] \) and then pass it through \( f \), we get \( f(\mathbb{E}[X]) \), a point on the curve itself. Because a convex function lies on or below every chord of its graph, \( f(\mathbb{E}[X]) \le \mathbb{E}[f(X)] \).
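The two-point picture can be checked directly. With weights \( p \) and \( 1-p \), \( \mathbb{E}[f(X)] = p f(x_1) + (1-p) f(x_2) \) is the height of the chord above \( \mathbb{E}[X] = p x_1 + (1-p) x_2 \), while \( f(\mathbb{E}[X]) \) is the height of the curve at that point. The sketch below sweeps \( p \) for \( f(x) = x^2 \); the endpoints \( x_1 = 1 \), \( x_2 = 4 \) and the function name `chord_and_curve` are assumptions made for illustration.

```python
def chord_and_curve(x1, x2, p, f):
    """Heights of the chord and of f at E[X], for X = x1 w.p. p and x2 w.p. 1 - p."""
    mean_x = p * x1 + (1 - p) * x2       # E[X], always between x1 and x2
    chord = p * f(x1) + (1 - p) * f(x2)  # E[f(X)], a point on the chord
    curve = f(mean_x)                    # f(E[X]), a point on the graph of f
    return mean_x, chord, curve


def square(x):
    return x * x


x1, x2 = 1.0, 4.0
for i in range(11):
    p = i / 10
    mean_x, chord, curve = chord_and_curve(x1, x2, p, square)
    assert min(x1, x2) <= mean_x <= max(x1, x2)
    # The chord never dips below the convex curve.
    assert chord >= curve - 1e-12
    print(f"p={p:.1f}  E[X]={mean_x:.2f}  E[f(X)]={chord:.2f}  f(E[X])={curve:.2f}")
```

For \( f(x) = x^2 \) the vertical gap works out to \( p(1-p)(x_2 - x_1)^2 \), which is zero only at \( p = 0 \) or \( p = 1 \), that is, only when \( X \) is constant.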
Mark Reid's blog post gives a similar visualization with more points.
Q1: Three squares have average area \( \bar{A} = 100\,\text{m}^2 \). The average of the lengths of their sides is \( \bar{l} = 10\,\text{m} \). What can be said about the size of the largest of the three squares?
A1:
Let \( X \) be the side length of a square chosen uniformly at random, so that \( X \) takes each of the three side lengths \( l_1, l_2, l_3 \) with probability \( \frac{1}{3} \). Then the information we have is:
* \( \mathbb{E}[X] = 10 \)
* \( \mathbb{E}[f(X)] = 100 \), where \( f(x) = x^2 \)
\( f(x) = x^2 \) is strictly convex, and \( f(\mathbb{E}[X]) = 10^2 = 100 = \mathbb{E}[f(X)] \), so equality holds in Jensen's inequality. By the equality condition, \( X \) must be constant, so all three side lengths equal \( 10\,\text{m} \). The largest square (like every square) therefore has area \( 100\,\text{m}^2 \).
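The same conclusion follows from the identity \( \mathbb{E}[X^2] - (\mathbb{E}[X])^2 = \operatorname{Var}(X) \): here the gap is \( 100 - 10^2 = 0 \), so the side lengths have zero variance and each equals the mean. The sketch below checks this numerically; the helper name `side_and_area_means` and the unequal side lengths \( 8, 10, 12 \) are hypothetical, chosen only for contrast.

```python
def side_and_area_means(sides):
    """Mean side length and mean area of squares with the given sides (equal weights)."""
    n = len(sides)
    mean_side = sum(sides) / n
    mean_area = sum(s * s for s in sides) / n
    return mean_side, mean_area


# The configuration forced by the question: all sides 10 m.
print(side_and_area_means([10.0, 10.0, 10.0]))  # (10.0, 100.0)

# Hypothetical unequal sides with the same mean side length (10 m):
# the mean area exceeds 100 by exactly the (population) variance of the sides.
mean_side, mean_area = side_and_area_means([8.0, 10.0, 12.0])
variance = mean_area - mean_side ** 2
print(mean_side, mean_area, variance)
```

Any spread in the side lengths pushes the average area above \( 100\,\text{m}^2 \) (for \( 8, 10, 12 \) it is \( 308/3\,\text{m}^2 \)), so an average area of exactly \( 100\,\text{m}^2 \) rules out unequal squares.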