Motivating ELBO From Importance Sampling

This is a tl;dr post of a longer (and not yet existing) post on variational auto-encoders.

Derivation idea

The evidence lower bound (ELBO) expression appears naturally when you try to sample from the posterior distribution using an approximate distribution. I think this way of arriving at the evidence lower bound is intuitive and makes it clearer where, and why, concessions are made.

Importance sampling allows us to calculate the expectation:

$$\mathbb{E}_{z \sim P_z}\left[f(z)\right]$$

by instead calculating:

$$\mathbb{E}_{z \sim Q_z}\left[f(z)\,\frac{P_z(z)}{Q_z(z)}\right]$$
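As a quick sanity check of the identity above, here is a minimal NumPy sketch. The distributions are assumptions chosen purely for illustration: P is a standard normal, Q is a wider, shifted normal, and f(z) = z², whose expectation under P is exactly 1. The helper names `normal_pdf` and `importance_estimate` are hypothetical.

```python
import numpy as np

def normal_pdf(z, mean, std):
    """Density of a univariate normal distribution."""
    return np.exp(-0.5 * ((z - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

def importance_estimate(f, n_samples=200_000, seed=0):
    """Estimate E_{z~P}[f(z)] using samples drawn from Q only.

    Assumed for illustration: P = N(0, 1) and Q = N(0.5, 1.5^2).
    """
    rng = np.random.default_rng(seed)
    z = rng.normal(loc=0.5, scale=1.5, size=n_samples)            # z ~ Q
    weights = normal_pdf(z, 0.0, 1.0) / normal_pdf(z, 0.5, 1.5)   # P(z) / Q(z)
    return np.mean(f(z) * weights)

# E_{z~P}[z^2] is the variance of a standard normal, i.e. 1.
estimate = importance_estimate(lambda z: z ** 2)
print(estimate)
```

Q is deliberately wider than P here so that the ratio P(z)/Q(z) stays bounded; a proposal with thinner tails than the target would make the estimate much noisier.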

We use this idea for variational inference. In order to calculate:

$$\mathbb{E}_{z \sim P_{z|x_i}}\left[P_{x|z}(x_i, z)\right]$$

we instead calculate:

$$\mathbb{E}_{z \sim Q_{z|x_i}}\left[P_{x|z}(x_i, z)\,\frac{P_{z|x}(z, x_i)}{Q_{z|x}(z, x_i)}\right]$$

For reasons to do with using maximum-likelihood as our optimization objective, we are actually interested in:

$$\log\left(\mathbb{E}_{z \sim Q_{z|x_i}}\left[P_{x|z}(x_i, z)\,\frac{P_{z|x}(z, x_i)}{Q_{z|x}(z, x_i)}\right]\right) \tag{1}$$

This is a log-likelihood term for just the single data point $x_i$; there is one such term for every data point. We can't wait for sampling to close in on the expectation inside the log. Instead, we want a snappier online calculation, so we take an accuracy hit and calculate:

$$\mathbb{E}_{z \sim Q_{z|x_i}}\left[\log\left(P_{x|z}(x_i, z)\,\frac{P_{z|x}(z, x_i)}{Q_{z|x}(z, x_i)}\right)\right] \tag{2}$$

So we are sampling to approximate the expectation of the log, rather than taking the log of a completed approximation. This frees us to use gradients of a single sample. Because the log is concave, Jensen's inequality assures us that this new term (2) is never greater than (1), so we use it as a proxy when optimizing (1). We can rewrite this expression and arrive at:
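The gap between (1) and (2) can be seen numerically. The sketch below uses made-up positive weights (log-normal draws) as a stand-in for the quantity inside the expectation; they are an assumption for illustration and are not derived from any particular model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical positive weights standing in for the term inside the expectation.
# For w = exp(g) with g ~ N(0, 1): log(E[w]) = 0.5 and E[log(w)] = 0.
w = np.exp(rng.normal(size=100_000))

log_of_mean = np.log(np.mean(w))   # analogue of (1): needs every sample before the log is taken
mean_of_log = np.mean(np.log(w))   # analogue of (2): a plain average of per-sample terms

print(log_of_mean, mean_of_log)
```

The second quantity is an average of independent per-sample values, so a single sample already gives an unbiased estimate of it. That is not true of the first, where the log is applied after averaging.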

$$\mathbb{E}_{z \sim Q_{z|x_i}}\left[\log\left(P_{x|z}(x_i, z)\right) + \log\left(P_{z|x}(z, x_i)\right) - \log\left(Q_{z|x}(z, x_i)\right)\right]$$

And as the expectation of a sum is the sum of expectations, we get:

$$\mathbb{E}_{z \sim Q_{z|x_i}}\left[\log\left(P_{x|z}(x_i, z)\right)\right] - D_{KL}\left(Q_{z|x} \,\|\, P_{z|x}\right)$$

The first term is the expected log-likelihood and the second is the Kullback-Leibler divergence; together they make up the evidence lower bound.
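The step that turns the two log terms into a divergence relies on the identity that the expectation under Q of log P minus log Q equals minus KL(Q || P). Here is a small numerical check of that identity, assuming two univariate Gaussians picked only for illustration (they are not the model's distributions); `normal_logpdf` is a hypothetical helper.

```python
import numpy as np

def normal_logpdf(z, mean, std):
    """Log-density of a univariate normal distribution."""
    return -0.5 * ((z - mean) / std) ** 2 - np.log(std) - 0.5 * np.log(2.0 * np.pi)

# Assumed for illustration: Q = N(0.5, 0.8^2) and P = N(0, 1).
m_q, s_q = 0.5, 0.8
m_p, s_p = 0.0, 1.0

rng = np.random.default_rng(0)
z = rng.normal(loc=m_q, scale=s_q, size=200_000)   # z ~ Q

# Monte Carlo estimate of E_Q[log P(z) - log Q(z)].
mc_value = np.mean(normal_logpdf(z, m_p, s_p) - normal_logpdf(z, m_q, s_q))

# Closed-form KL(Q || P) between two univariate Gaussians.
kl_closed_form = np.log(s_p / s_q) + (s_q**2 + (m_q - m_p) ** 2) / (2.0 * s_p**2) - 0.5

print(mc_value, -kl_closed_form)
```

The two printed numbers should agree up to sampling noise, and both are negative, since a KL divergence is never negative.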

If we happened to choose a distribution $Q_{z|x_i}$ right on the mark, so that it equals $P_{z|x_i}$, then the KL term would vanish and we would be calculating:

$$\mathbb{E}_{z \sim P_{z|x_i}}\left[\log\left(P_{x|z}(x_i, z)\right)\right]$$

This isn't quite what we were after, which was:

$$\log\left(\mathbb{E}_{z \sim P_{z|x_i}}\left[P_{x|z}(x_i, z)\right]\right)$$

But Jensen looks down fondly at us and tells us we did alright.