
Variational autoencoders

See *Auto-Encoding Variational Bayes* (Kingma and Welling, 2013)

Suppose we have a set $X$ with an (a priori unknown) probability density function $p(x)$. We would like to find a lower-dimensional representation of $X$ in some latent space $Z$ by imposing a joint probability density $p(x, z)$.

Assumptions.

  1. The marginal $p(x)$ is unknown but we can sample from $X$.
  2. We are free to choose the marginal $p(z)$.
  3. We are free to choose the decoder $p(x|z)$.

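To make assumptions 2 and 3 concrete, here is a minimal sketch of one common choice: a standard normal prior over a low-dimensional latent space, and a decoder network that maps $z$ to the parameters of a factorized Bernoulli distribution over $x$. The sizes, architecture, and PyTorch framing are illustrative assumptions, not something dictated by the setup above.

```python
import torch
import torch.nn as nn

LATENT_DIM, DATA_DIM, HIDDEN_DIM = 2, 784, 400  # illustrative sizes, not prescribed above

# Assumption 2: take the prior p(z) to be a standard normal on the latent space Z.
prior = torch.distributions.Normal(torch.zeros(LATENT_DIM), torch.ones(LATENT_DIM))

class Decoder(nn.Module):
    """Assumption 3: a decoder p(x|z) whose parameters are produced by a small network.

    Here each coordinate of x is modelled as a Bernoulli with mean given by the output.
    """
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(LATENT_DIM, HIDDEN_DIM),
            nn.Tanh(),
            nn.Linear(HIDDEN_DIM, DATA_DIM),
            nn.Sigmoid(),  # Bernoulli means in (0, 1)
        )

    def forward(self, z):
        return self.net(z)
```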
The relationship between $X$ and $Z$ is encoded in the conditional probability densities $p(x|z)$ (the decoder) and $p(z|x)$ (the encoder). From Bayes' theorem,

$$ p(z|x) = \frac{p(x|z) p(z)}{p(x)} $$
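The denominator here is the marginal

$$ p(x) = \int p(x|z)\, p(z)\, dz $$

which, for any reasonably expressive decoder, has no closed form.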

So once we have specified the prior $p(z)$ and the decoder $p(x|z)$, we have no freedom to choose $p(z|x)$. Furthermore, without access to $p(x)$, we have no way to recover $p(z|x)$ analytically. To get around this problem, we introduce another probability density $q(z|x)$, the approximate posterior, which is intended to be a tractable approximation to the intractable $p(z|x)$. We can quantify the difference between the true posterior and the approximate posterior with the Kullback-Leibler divergence,

$$ D_{KL}(q(z|x)||p(z|x)) = E_{z \sim q(z|x)}(\log q(z|x) - \log p(z|x)) $$

Substituting Bayes' theorem for $p(z|x)$ and noting that $\log p(x)$ does not depend on $z$, this becomes

$$ D_{KL}(q(z|x)||p(z|x)) = \log p(x) + D_{KL}(q(z|x)||p(z)) - E_{z \sim q(z|x)}(\log p(x|z)) $$

Rearranging this expression, we have

$$ \log p(x) = D_{KL}(q(z|x)||p(z|x)) + L $$

where $L$ is the variational lower bound,

$$ L = -D_{KL}(q(z|x)||p(z)) + E_{z \sim q(z|x)}(\log p(x|z)) $$
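Since the Kullback-Leibler divergence is non-negative, $L \le \log p(x)$, so maximizing $L$ simultaneously pushes up a lower bound on the log-likelihood of the data and (because $\log p(x)$ does not depend on $q$) pulls $q(z|x)$ toward the true posterior $p(z|x)$.

Continuing the sketch above: a common choice, used in the original paper, is a diagonal Gaussian approximate posterior whose mean and variance come from an encoder network. With the standard normal prior, the first term of $L$ then has a closed form, and the expectation can be estimated from a single reparameterized sample $z = \mu + \sigma \epsilon$ with $\epsilon \sim N(0, I)$. The code below is a sketch of one way to assemble $-L$ as a training loss, reusing the illustrative sizes from the earlier block; it is not the only possible formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    """Approximate posterior q(z|x): a diagonal Gaussian N(mu(x), diag(sigma(x)^2))."""
    def __init__(self):
        super().__init__()
        self.hidden = nn.Sequential(nn.Linear(DATA_DIM, HIDDEN_DIM), nn.Tanh())
        self.mu = nn.Linear(HIDDEN_DIM, LATENT_DIM)
        self.log_var = nn.Linear(HIDDEN_DIM, LATENT_DIM)  # predicts log sigma^2 for stability

    def forward(self, x):
        h = self.hidden(x)
        return self.mu(h), self.log_var(h)

def negative_elbo(x, encoder, decoder):
    """One-sample Monte Carlo estimate of -L for a batch x with entries in [0, 1]."""
    mu, log_var = encoder(x)

    # Reparameterization: z = mu + sigma * eps keeps the sample differentiable
    # with respect to the encoder parameters.
    eps = torch.randn_like(mu)
    z = mu + torch.exp(0.5 * log_var) * eps

    # E_{z ~ q(z|x)}(log p(x|z)) estimated with a single sample; for a factorized
    # Bernoulli decoder this is the negative binary cross-entropy.
    log_px_given_z = -F.binary_cross_entropy(decoder(z), x, reduction="sum")

    # Closed-form KL(q(z|x) || p(z)) for a diagonal Gaussian against N(0, I):
    # 0.5 * sum(sigma^2 + mu^2 - 1 - log sigma^2).
    kl = 0.5 * torch.sum(torch.exp(log_var) + mu ** 2 - 1.0 - log_var)

    return kl - log_px_given_z  # -L; minimizing this maximizes the lower bound
```

In practice one would average this loss over minibatches drawn from $X$ (assumption 1) and minimize it with a stochastic gradient optimizer, training the encoder and decoder jointly.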