\( \newcommand{\matr}[1] {\mathbf{#1}} \newcommand{\vertbar} {\rule[-1ex]{0.5pt}{2.5ex}} \newcommand{\horzbar} {\rule[.5ex]{2.5ex}{0.5pt}} \newcommand{\E} {\mathrm{E}} \)
\( \newcommand{\cat}[1] {\mathrm{#1}} \newcommand{\catobj}[1] {\operatorname{Obj}(\mathrm{#1})} \newcommand{\cathom}[1] {\operatorname{Hom}_{\cat{#1}}} \newcommand{\multiBetaReduction}[0] {\twoheadrightarrow_{\beta}} \newcommand{\betaReduction}[0] {\rightarrow_{\beta}} \newcommand{\betaEq}[0] {=_{\beta}} \newcommand{\string}[1] {\texttt{"}\mathtt{#1}\texttt{"}} \newcommand{\symbolq}[1] {\texttt{`}\mathtt{#1}\texttt{'}} \newcommand{\groupMul}[1] { \cdot_{\small{#1}}} \newcommand{\groupAdd}[1] { +_{\small{#1}}} \newcommand{\inv}[1] {#1^{-1} } \newcommand{\bm}[1] { \boldsymbol{#1} } \require{physics} \require{ams} \require{mathtools} \)
Math and science::INF ML AI

Jensen's inequality

If \( f \) is a convex (smile) function and \( X \) is a random variable then:

\[ \mathbb{E}[f(X)] \ge f(\mathbb{E}[X]) \]

If \( f \) is strictly convex and \( \mathbb{E}[f(X)] = f(\mathbb{E}[X]) \), then the random variable \( X \) is a constant.
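As a quick sanity check, the inequality can be verified numerically. This is a minimal sketch; the convex function \( f(x) = x^2 \) and the outcomes and probabilities of \( X \) are illustrative assumptions, not part of the card:

```python
# Numeric check of Jensen's inequality for a discrete random variable.
f = lambda x: x ** 2          # a convex (smile) function
xs = [1.0, 2.0, 6.0]          # outcomes of X (illustrative)
ps = [0.2, 0.5, 0.3]          # their probabilities (sum to 1)

E_X = sum(p * x for p, x in zip(ps, xs))        # E[X] = 3.0
E_fX = sum(p * f(x) for p, x in zip(ps, xs))    # E[f(X)] = 13.0

# Jensen: E[f(X)] >= f(E[X])  (here 13.0 >= 9.0)
assert E_fX >= f(E_X)
```

Because \( f \) here is strictly convex, the gap closes only when all the probability mass sits on a single outcome.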


Intuition

A few ways of visualizing Jensen's inequality.

Sense of stretching

Interpolation across \( \mathbb{E}[X] \)

Regardless of the weightings (probabilities) of \( x_1 \) and \( x_2 \), their expectation \( \mathbb{E}[X] \) lies somewhere between \( x_1 \) and \( x_2 \). Mapping \( x_1 \) and \( x_2 \) through \( f \) and taking the expectation gives \( \mathbb{E}[f(X)] \), which lies somewhere on the chord between \( f(x_1) \) and \( f(x_2) \). If instead we first calculate \( \mathbb{E}[X] \) and then pass it through \( f \) to get \( f(\mathbb{E}[X]) \), this value is less than or equal to \( \mathbb{E}[f(X)] \), because the graph of a convex \( f \) lies on or below every one of its chords.
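For two points, this picture is nothing more than the definition of convexity. Writing \( P(X = x_1) = p \) and \( P(X = x_2) = 1 - p \):

\[ \mathbb{E}[f(X)] = p f(x_1) + (1-p) f(x_2) \ge f\big(p x_1 + (1-p) x_2\big) = f(\mathbb{E}[X]) \]

The general inequality extends this from two-point mixtures to arbitrary distributions.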

A similar visualization with more points (from Mark Reid's blog post):

A similar take (from Andrew Ng's notes for CS229):

Example

Q1: Three squares

Three squares have average area \( \bar{A} = 100\ \mathrm{m}^2 \). The average of the lengths of their sides is \( \bar{l} = 10\ \mathrm{m} \). What can be said about the size of the largest of the three squares?

A1:
Let \( X \) be the side length of a square chosen uniformly at random, so each of the three lengths \( l_1, l_2, l_3 \) has probability \( \frac{1}{3} \). Then the information that we have is:

* \( \mathbb{E}[X] = 10 \)
* \( \mathbb{E}[f(X)] = 100 \), where \( f(x) = x^2 \)

\( f \) is a strictly convex function and the equality \( \mathbb{E}[f(X)] = f(\mathbb{E}[X]) \) holds, so by Jensen's inequality, \( X \) must be a constant and all three side lengths must be equal. So the area of the largest square (and of every square) is \( 100\ \mathrm{m}^2 \).
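The same conclusion can be checked through the variance, since \( \operatorname{Var}(X) = \mathbb{E}[X^2] - \mathbb{E}[X]^2 \) and zero variance forces \( X \) to be constant. A small sketch (the unequal side lengths in the second check are an illustrative assumption):

```python
# Equal side lengths: E[X] = 10 and E[X^2] = 100 exactly, variance 0.
ls = [10.0, 10.0, 10.0]              # the only possibility
E_X = sum(ls) / 3                    # E[X]
E_X2 = sum(l ** 2 for l in ls) / 3   # E[X^2] = average area

assert E_X == 10.0 and E_X2 == 100.0
assert E_X2 - E_X ** 2 == 0.0        # Var(X) = 0  ->  X is constant

# Any unequal lengths with the same mean give average area > 100:
ls2 = [8.0, 10.0, 12.0]              # illustrative unequal lengths
assert sum(ls2) / 3 == 10.0
assert sum(l ** 2 for l in ls2) / 3 > 100.0
```

This mirrors the strict-convexity condition on the card: with \( f(x) = x^2 \), equality in Jensen's inequality holds only for a constant \( X \).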
