\( \newcommand{\matr}[1] {\mathbf{#1}} \newcommand{\vertbar} {\rule[-1ex]{0.5pt}{2.5ex}} \newcommand{\horzbar} {\rule[.5ex]{2.5ex}{0.5pt}} \newcommand{\E} {\mathrm{E}} \)

ELBO via Jensen

Process

We are interested in evaluating \( \operatorname{P}_{X} \) at some value; below we consider \( \operatorname{P}_X(x_4) \). If we know how \( X \) depends on some other random variable \( Z \), then we can calculate \( \operatorname{P}_X \) by working with the random variable pair \( (X,Z) \). If we know the joint distribution \( \operatorname{P}_{(X,Z)} \), we get \( \operatorname{P}_X \) by summing over \( Z \) (marginalizing): \( \operatorname{P}_X(x_4) = \sum_{z} \operatorname{P}_{(X,Z)}(x_4, z) \). If instead of \( \operatorname{P}_{(X,Z)} \) we know \( \operatorname{P}_Z \) and \( \operatorname{P}_{X|Z} \), then rather than simply summing over \( Z \) we average over \( Z \), i.e. we calculate the expectation \( \operatorname{P}_X(x_4) = \operatorname{E}_Z[\operatorname{P}_{X|Z}(x_4, Z)] \). If we are interested in \( \log(\operatorname{P}_X) \), then because log is concave, Jensen's inequality lets us move the log inside the expectation, at the cost of turning the equality into a lower bound: \( \log \operatorname{P}_X(x_4) \ge \operatorname{E}_Z[\log \operatorname{P}_{X|Z}(x_4, Z)] \).
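To make the averaging over \( Z \) and the Jensen step concrete, here is a minimal Python sketch on a small discrete model. Everything in it is hypothetical: the values of \( Z \), the outcomes `x1` to `x4`, all probabilities, and the function names `marginal`, `jensen_bound`, `elbo` and `posterior` are made up for illustration. For `x4` the marginal works out by hand to \( 0.5 \cdot 0.4 + 0.3 \cdot 0.1 + 0.2 \cdot 0.25 = 0.28 \). The `elbo` function takes a general distribution \( q \) over the values of \( Z \); choosing \( q = \operatorname{P}_Z \) gives exactly the Jensen bound, and choosing \( q \) to be the posterior of \( Z \) given \( x \) makes the bound tight.

```python
import math

# Hypothetical toy model; every number here is made up for illustration.
# Z takes values in {0, 1, 2}; X takes values in {"x1", "x2", "x3", "x4"}.
p_z = {0: 0.5, 1: 0.3, 2: 0.2}  # P_Z(z)
p_x_given_z = {                 # P_{X|Z}(x, z), one dict per value of z
    0: {"x1": 0.1, "x2": 0.2, "x3": 0.3, "x4": 0.4},
    1: {"x1": 0.4, "x2": 0.3, "x3": 0.2, "x4": 0.1},
    2: {"x1": 0.25, "x2": 0.25, "x3": 0.25, "x4": 0.25},
}


def marginal(x):
    """P_X(x) = sum_z P_Z(z) P_{X|Z}(x, z): the average of P_{X|Z}(x, Z) over Z."""
    return sum(p_z[z] * p_x_given_z[z][x] for z in p_z)


def jensen_bound(x):
    """E_Z[log P_{X|Z}(x, Z)]: the log moved inside the expectation (a lower bound)."""
    return sum(p_z[z] * math.log(p_x_given_z[z][x]) for z in p_z)


def elbo(x, q):
    """E_q[log(P_Z(z) P_{X|Z}(x, z) / q(z))] for a distribution q over the values of Z."""
    return sum(q[z] * math.log(p_z[z] * p_x_given_z[z][x] / q[z])
               for z in q if q[z] > 0)


def posterior(x):
    """P_{Z|X}(z, x) = P_Z(z) P_{X|Z}(x, z) / P_X(x), via Bayes' rule."""
    px = marginal(x)
    return {z: p_z[z] * p_x_given_z[z][x] / px for z in p_z}


x = "x4"
print(math.log(marginal(x)))   # log P_X(x4)
print(jensen_bound(x))         # Jensen lower bound; equals elbo(x, p_z)
print(elbo(x, posterior(x)))   # ELBO with the exact posterior: log P_X(x4) up to rounding
```

For `x4` the Jensen bound (about -1.43) sits below \( \log \operatorname{P}_X(x_4) = \log 0.28 \) (about -1.27); the gap comes from the log being strictly concave and \( \operatorname{P}_{X|Z}(x_4, z) \) varying with \( z \).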

The objects to be visualized:

Alternative

We will view \( \operatorname{P}_X \) as the expectation of a derived random variable, obtained by applying a function to another random variable; the function used will be a (conditional) probability distribution. Start with a random variable \( Z \) on a probability space \( (\Omega, \mathcal{F}, \operatorname{P}) \), and introduce another random variable \( X \) on the same probability space. From the conditional distribution of \( X \) given \( Z \) we define the bivariate function \( Y(x, z) := \operatorname{P}_{X|Z}(x, z) \). We then create another random variable \( Y_t := \omega \mapsto Y(t, Z(\omega)) \), which is \( Y \) with the first input fixed to \( t \) and the second input supplied by \( Z \). We then have the expression:

\[ \begin{align*} \operatorname{P}_X(v) &= \operatorname{E}[Y_v] \\ &= \sum_{\omega \in \Omega} Y(v, Z(\omega))\operatorname{P}(\omega) \\ &= \sum_{z \in \operatorname{Range}(Z) } \operatorname{P}_Z(z) \operatorname{P}_{X|Z}(v, z) \end{align*} \]

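This relies on \( \Omega \) being a discrete (countable) probability space: otherwise single outcomes \( \{\omega\} \) need not be events in \( \mathcal{F} \), or may all have probability zero, and the sum over \( \Omega \) has to be replaced by the integral \( \int_\Omega Y_v \, d\operatorname{P} \).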
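The lines of the equation can be checked against each other on a finite \( \Omega \), where the sum over outcomes is legitimate. The sketch below is hypothetical throughout: \( \Omega \) consists of four outcomes \( (z, x) \) with made-up probabilities, \( Z \) and \( X \) are the coordinate projections, and \( Y(x, z) \) is read as \( \operatorname{P}_{X|Z}(x, z) \), as the last line of the equation implies. Exact `Fraction` arithmetic is used so the expressions can be compared with `==` rather than a tolerance.

```python
from fractions import Fraction

# Hypothetical finite sample space Omega: outcomes are pairs (z, x),
# with probabilities P(omega) made up for illustration (they sum to 1).
P = {
    (0, "a"): Fraction(1, 8), (0, "b"): Fraction(3, 8),
    (1, "a"): Fraction(1, 3), (1, "b"): Fraction(1, 6),
}
Omega = list(P)


def Z(w):
    """Random variable Z: the first coordinate of the outcome."""
    return w[0]


def X(w):
    """Random variable X: the second coordinate of the outcome."""
    return w[1]


def P_Z(z):
    return sum(P[w] for w in Omega if Z(w) == z)


def P_X(v):
    """Marginal of X computed directly from P, for comparison."""
    return sum(P[w] for w in Omega if X(w) == v)


def Y(x, z):
    """Y(x, z) = P_{X|Z}(x, z) = P(X = x, Z = z) / P_Z(z)."""
    return sum(P[w] for w in Omega if X(w) == x and Z(w) == z) / P_Z(z)


def Y_t(t):
    """Derived random variable Y_t: omega -> Y(t, Z(omega))."""
    return lambda w: Y(t, Z(w))


def expectation(rv):
    """E[rv] = sum over omega of rv(omega) P(omega); valid because Omega is finite."""
    return sum(rv(w) * P[w] for w in Omega)


range_Z = {Z(w) for w in Omega}
for v in ("a", "b"):
    via_omega = expectation(Y_t(v))                     # sum over Omega
    via_range = sum(P_Z(z) * Y(v, z) for z in range_Z)  # sum over Range(Z)
    assert P_X(v) == via_omega == via_range
    print(v, P_X(v))
```

With these numbers \( \operatorname{P}_Z(0) = \operatorname{P}_Z(1) = 1/2 \), so \( \operatorname{P}_X(a) = \tfrac12 \cdot \tfrac14 + \tfrac12 \cdot \tfrac23 = 11/24 \), and the loop prints `a 11/24` followed by `b 13/24`.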


Another source: https://cscherrer.github.io/post/variational-importance-sampling/

TODO turn into a proper card.

Visualization of Jensen's inequality applied to the ELBO: