
Jensen's inequality

If f is a convex (smile) function and X is a random variable then:

E[f(X)] ≥ f(E[X])

If f is strictly convex and E[f(X)]=f(E[X]), then the random variable X is a constant.
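As a quick numerical check of the statement above, here is a minimal sketch for a discrete random variable. The values, the probabilities, and the helper name `expectation` are illustrative assumptions, and f(x) = x² is chosen only because it is strictly convex.

```python
from fractions import Fraction

def expectation(values, probs, g=lambda x: x):
    """E[g(X)] for a discrete X taking `values` with probabilities `probs`."""
    return sum(p * g(x) for x, p in zip(values, probs))

def f(x):
    # f(x) = x^2 is strictly convex (assumed example function).
    return x * x

# Hypothetical distribution: X is 1, 2 or 6 with probability 1/2, 1/4, 1/4.
values = [Fraction(1), Fraction(2), Fraction(6)]
probs = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]

lhs = expectation(values, probs, f)   # E[f(X)]
rhs = f(expectation(values, probs))   # f(E[X])
print(lhs, rhs, lhs >= rhs)

# A constant X gives equality, matching the strict-convexity statement.
const_values = [Fraction(3)] * 3
const_lhs = expectation(const_values, probs, f)
const_rhs = f(expectation(const_values, probs))
print(const_lhs == const_rhs)
```

Exact fractions are used so that the equality case is not blurred by floating-point rounding.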


Intuition

A few ways of visualizing Jensen's inequality.

Sense of stretching

Interpolation across E[X]

Regardless of the weightings (probabilities) of x1 and x2, their expectation E[X] will lie somewhere between x1 and x2. Mapping x1 and x2 through f and calculating the expectation gives us E[f(X)], which will lie somewhere on the line (chord) between f(x1) and f(x2); more precisely, it is the height of that chord above E[X]. But if we were first to calculate E[X] and then pass this through f to get f(E[X]), then this value would be less than or equal to E[f(X)], because the graph of a convex function lies on or below its chords.
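The two-point picture can be sketched numerically. In the snippet below, the function name `chord_and_curve`, the choice f = exp, and the points x1 = 0, x2 = 2 are assumptions made purely for illustration; any convex f should behave the same way.

```python
import math

def chord_and_curve(f, x1, x2, p):
    """For X = x1 with probability p and x2 with probability 1 - p,
    return (E[f(X)], f(E[X])): the chord height and the curve height at E[X]."""
    mean = p * x1 + (1 - p) * x2          # E[X], lies between x1 and x2
    chord = p * f(x1) + (1 - p) * f(x2)   # E[f(X)], a point on the chord
    curve = f(mean)                       # f(E[X]), a point on the graph of f
    return chord, curve

# Sweep the weighting p over 0, 0.25, 0.5, 0.75, 1.
rows = [(i / 4, *chord_and_curve(math.exp, 0.0, 2.0, i / 4)) for i in range(5)]
for p, chord, curve in rows:
    print(f"p={p:.2f}  E[f(X)]={chord:.4f}  f(E[X])={curve:.4f}")
```

At p = 0 and p = 1 the random variable is constant, so the two heights coincide; for every weighting in between, the chord sits above the curve.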

A similar visualization with more points (from Mark Reid's blog post):

A similar take (from Andrew Ng's notes for CS229):

Example

Q1: Three squares

Three squares have average area Ā = 100 m². The average of the lengths of their sides is l̄ = 10 m. What can be said about the size of the largest of the three squares?

A1:
Let X be the side length of a square picked uniformly at random from the three, so that each of the lengths l1, l2, l3 has probability 1/3. Then the information that we have is:

* E[X]=10
* E[f(X)]=100, where f(x)=x²

f is a strictly convex function and the equality E[f(X)]=f(E[X]) holds, so by the equality case of Jensen's inequality, X must be a constant and all three lengths must be equal to 10 m. So the area of the largest square (and of all the squares) is 100 m².
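The same conclusion can be cross-checked through the variance, since Var(X) = E[X²] − (E[X])² for f(x) = x². The sketch below is only an illustration: the helper names are hypothetical, and the unequal triple (8, 10, 12) is an assumed counterexample showing that sides with mean 10 that are not all equal push the mean area above 100.

```python
from fractions import Fraction

def mean(values):
    """Arithmetic mean, i.e. the expectation under probabilities 1/3 each."""
    return sum(values, Fraction(0)) / len(values)

def mean_side_and_area(sides):
    """Return (mean side length, mean area) for squares with the given sides."""
    sides = [Fraction(s) for s in sides]
    return mean(sides), mean([s * s for s in sides])

# Given in the question: E[X] = 10 and E[X^2] = 100.
mean_side = Fraction(10)
mean_area = Fraction(100)
variance = mean_area - mean_side ** 2   # Var(X) = E[X^2] - (E[X])^2
print(variance)                         # zero variance means X is constant

# Equal sides reproduce the given numbers; unequal sides do not.
equal = mean_side_and_area([10, 10, 10])
unequal = mean_side_and_area([8, 10, 12])
print(equal, unequal)
```

A variance of zero leaves no room for the sides to differ, which is the equality case of Jensen's inequality for this particular f.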

More visualization: