ELBO via Jensen
Process
We want to evaluate \( \operatorname{P}_{X} \) at some value; below we consider \( \operatorname{P}_X(x_4) \). If we know how \( X \) depends on some other variable, \( Z \), then we can calculate \( \operatorname{P}_X \) from the random variable pair \( (X,Z) \). Given the joint distribution \( \operatorname{P}_{(X,Z)} \), we marginalize: we sum \( \operatorname{P}_{(X,Z)}(x_4, z) \) over all values \( z \) of \( Z \). If instead of \( \operatorname{P}_{(X,Z)} \) we know only the conditional \( \operatorname{P}_{X|Z} \), a plain sum is not enough: each \( \operatorname{P}_{X|Z}(x_4 \mid z) \) must be weighted by the probability of \( z \), that is, we take the expectation over \( Z \). If we are interested in \( \log(\operatorname{P}_X) \), then because log is concave, Jensen's inequality lets us move the log inside the expectation, which gives a lower bound on \( \log(\operatorname{P}_X) \).
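The chain of steps can be written out as a short derivation. This is a sketch assuming \( Z \) is discrete; the symbols \( \operatorname{P}_Z \) (the marginal of \( Z \)) and \( \mathbb{E}_Z \) (expectation under \( \operatorname{P}_Z \)) are notation introduced here for illustration and do not appear elsewhere in the note.

```latex
\[
\begin{aligned}
\log \operatorname{P}_X(x_4)
  &= \log \sum_{z} \operatorname{P}_{(X,Z)}(x_4, z)
     && \text{(marginalize over } Z\text{)} \\
  &= \log \sum_{z} \operatorname{P}_Z(z)\, \operatorname{P}_{X|Z}(x_4 \mid z)
     && \text{(factor the joint)} \\
  &= \log \mathbb{E}_{Z}\!\left[ \operatorname{P}_{X|Z}(x_4 \mid Z) \right]
     && \text{(the sum is an expectation)} \\
  &\ge \mathbb{E}_{Z}\!\left[ \log \operatorname{P}_{X|Z}(x_4 \mid Z) \right]
     && \text{(Jensen, since } \log \text{ is concave)}
\end{aligned}
\]
```

Equality holds when \( \operatorname{P}_{X|Z}(x_4 \mid z) \) takes the same value for every \( z \) with \( \operatorname{P}_Z(z) > 0 \).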
The objects to be visualized:
Alternative
We will view \( \operatorname{P}_X \) as a derived random variable, obtained by applying a function to another random variable; the function applied will be a probability distribution. Start with a random variable \( Z \) on a probability space \( (\Omega, \mathcal{F}, \operatorname{P}) \). Introduce another random variable \( X \) on the same probability space. From these two random variables we will create a new bivariate random variable \( Y := Z \times X \). We will then create another random variable \( Y_t := Y_{X=t} \), which is simply \( Y \) with the \( X \) component fixed to \( t \). We then have the expression:
This relies on \( \Omega \) being a discrete probability space, as otherwise \( \operatorname{P} \) need not be defined on individual elements of \( \Omega \).
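For a finite, discrete setting the quantities discussed in this note can be checked numerically. The following is a minimal sketch, not a definitive implementation: the distributions `p_z` and `p_x_given_z` are made-up toy numbers, and all variable names are hypothetical. It computes \( \operatorname{P}_X(x_4) \) both from the joint and from the conditional, then compares \( \log \operatorname{P}_X(x_4) \) with the Jensen lower bound.

```python
import math

# Hypothetical toy model (made-up numbers): Z has 3 outcomes, X has 5 (x_0..x_4).
p_z = [0.5, 0.3, 0.2]  # assumed marginal of Z
p_x_given_z = [        # row z holds the conditional distribution of X given Z = z
    [0.10, 0.20, 0.30, 0.25, 0.15],
    [0.40, 0.10, 0.10, 0.10, 0.30],
    [0.05, 0.05, 0.10, 0.20, 0.60],
]
x4 = 4  # index of the value x_4

# Knowing the joint: build P_(X,Z) and sum over Z.
joint = [[pz * p for p in row] for pz, row in zip(p_z, p_x_given_z)]
p_x_from_joint = sum(row[x4] for row in joint)

# Knowing only the conditional: average over Z (an expectation under P_Z).
p_x = sum(pz * row[x4] for pz, row in zip(p_z, p_x_given_z))

# Jensen: the log of the expectation is at least the expectation of the log.
log_p_x = math.log(p_x)
jensen_bound = sum(pz * math.log(row[x4]) for pz, row in zip(p_z, p_x_given_z))

print("P_X(x_4) via joint:      ", p_x_from_joint)
print("P_X(x_4) via expectation:", p_x)
print("log P_X(x_4):            ", log_p_x)
print("Jensen lower bound:      ", jensen_bound)
```

The two routes to \( \operatorname{P}_X(x_4) \) agree, and the bound sits below \( \log \operatorname{P}_X(x_4) \); the gap shrinks as the conditional values \( \operatorname{P}_{X|Z}(x_4 \mid z) \) become more alike across \( z \).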
Another source: https://cscherrer.github.io/post/variational-importance-sampling/
TODO turn into a proper card.