Overdispersed Black-Box Variational Inference

This UAI paper by Ruiz, Titsias and Blei offers important insights into the idea of a black-box procedure for VI (which I discussed here). The setup of BBVI is the following: given a target/posterior \pi and a parametric approximation q_\lambda, we want to find

\mathrm{argmax}_\lambda \int \log \left ( \frac{\pi(x)}{q_\lambda(x)} \right )  q_\lambda(x) \mathrm{d}x

which can be achieved for any q_\lambda by estimating the gradient

\nabla_\lambda \int \log \left ( \frac{\pi(x)}{q_\lambda(x)} \right )  q_\lambda(x) \mathrm{d}x

with Monte Carlo samples and running stochastic gradient optimization. This works if we can easily sample from q_\lambda and can compute its derivative wrt \lambda in closed form. In the original paper, the authors suggested the use of the score function as a control variate and a Rao-Blackwellization. Both were described in a way that utterly confused me, until now: Ruiz, Titsias and Blei manage to describe the concrete application of both control variates and Rao-Blackwellization in a very transparent way.

Their own contribution to variance reduction (minus some tricks they applied) is based on the fact that the optimal sampling distribution for estimating \nabla_\lambda \int \log \left ( \frac{\pi(x)}{q_\lambda(x)} \right )  q_\lambda(x) \mathrm{d}x is, for each component of \lambda, proportional to \left | \nabla_\lambda \log q_\lambda(x) \, \log \left ( \frac{\pi(x)}{q_\lambda(x)} \right ) \right |  q_\lambda(x) rather than exactly q_\lambda(x). They argue that this optimal sampling distribution is considerably heavier tailed than q_\lambda(x). Their reasoning is mainly that the norm of the gradient integrand (which is essentially (\nabla_\lambda q_\lambda(x)) \log \left ( \frac{\pi(x)}{q_\lambda(x)} \right )  = q_\lambda(x)(\nabla_\lambda \log q_\lambda(x)) \log \left ( \frac{\pi(x)}{q_\lambda(x)} \right ) ) vanishes at the modes, making that region irrelevant for gradient estimation. The same should be true for the tails of the distribution, I think.

Overall, very interesting work that I strongly recommend reading, if only to understand the original black-box VI proposal.
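To make the variance argument concrete, here is a minimal sketch of the score-function gradient estimator, once with samples from q_\lambda and once with samples from a wider proposal corrected by importance weights. Everything specific in it is my own assumption for illustration and not taken from the paper: the toy target (an unnormalized Gaussian with mean 2), a Gaussian q_\lambda where only the mean is optimized, a proposal that simply inflates the standard deviation by a factor tau, and the function names. The paper's actual method (overdispersed exponential families plus the extra tricks mentioned above) is more involved.

```python
import math
import random


def log_target(x):
    # Hypothetical unnormalized target pi: Gaussian with mean 2, unit variance.
    return -0.5 * (x - 2.0) ** 2


def log_normal(x, mean, std):
    # Log density of N(mean, std^2).
    return (-0.5 * ((x - mean) / std) ** 2
            - math.log(std) - 0.5 * math.log(2.0 * math.pi))


def gradient_terms(mu, sigma, tau, n, rng):
    """Per-sample terms whose average estimates d/d mu of E_q[log(pi/q)].

    Samples are drawn from N(mu, (tau * sigma)^2) and reweighted by q / r.
    With tau = 1 the weights are all 1 and this is the plain estimator.
    """
    terms = []
    for _ in range(n):
        x = rng.gauss(mu, tau * sigma)
        log_q = log_normal(x, mu, sigma)
        weight = math.exp(log_q - log_normal(x, mu, tau * sigma))
        score = (x - mu) / sigma ** 2  # d/d mu of log q(x)
        terms.append(weight * score * (log_target(x) - log_q))
    return terms


def mean_and_variance(values):
    m = sum(values) / len(values)
    v = sum((t - m) ** 2 for t in values) / len(values)
    return m, v


rng = random.Random(0)
plain_mean, plain_var = mean_and_variance(
    gradient_terms(0.0, 1.0, 1.0, 200_000, rng))
over_mean, over_var = mean_and_variance(
    gradient_terms(0.0, 1.0, 1.5, 200_000, rng))

print("sampling from q:       mean %.3f, variance %.3f" % (plain_mean, plain_var))
print("overdispersed (x1.5):  mean %.3f, variance %.3f" % (over_mean, over_var))
```

For this toy problem the exact gradient at mu = 0 is 2, so both means should land close to that value, and the per-sample variance of the overdispersed version should come out clearly smaller. How much a wider proposal helps (or hurts) depends on the target and on tau, so treat this as an illustration of the mechanism rather than a general result.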
