A Complete Recipe for Stochastic Gradient MCMC

Another NIPS paper, this one by Ma, Chen and Fox. It is interesting in that it seeks to build a framework for MCMC algorithms that use only random subsamples of the data. This is achieved by writing a general continuous Markov process as an SDE of the form

\textrm{d}z = f(z)\,\textrm{d}t + \sqrt{2D(z)}\,\textrm{d}W(t)

where f is the drift, D is the diffusion matrix, and z collects the variables of the actual sampling space you care about together with possible auxiliary variables. The stationary distribution over z is then a joint distribution whose marginal is the target distribution/posterior. Now they suggest parameterizing f by one skew-symmetric matrix and one PSD matrix (the latter being the diffusion matrix D; both can depend on the current state z), and prove in two theorems that this is all you need to describe all continuous Markov processes that admit the target distribution as the unique stationary distribution. The proofs use the Fokker-Planck and Liouville equations, which as a non-SDE person I had never heard of.
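
For reference, this is the parameterization as I read it in the paper. The symbols Q (the skew-symmetric matrix), H (the negative log of the joint stationary density, up to a constant) and \Gamma (a correction term that vanishes when D and Q do not depend on z) follow the paper's notation and do not appear in the text above.

```latex
p^s(z) \propto \exp(-H(z)), \qquad
f(z) = -\bigl[D(z) + Q(z)\bigr]\,\nabla H(z) + \Gamma(z), \qquad
\Gamma_i(z) = \sum_j \frac{\partial}{\partial z_j}\bigl(D_{ij}(z) + Q_{ij}(z)\bigr)
```

Here D(z) is positive semidefinite and Q(z) = -Q(z)^\top.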

They use this to show ways of reconstructing known stochastic sampling algorithms and give a new sampler, which is a stochastic version of Riemannian HMC.
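
As a concrete instance of reconstructing a known sampler: taking the PSD matrix to be the identity and the skew-symmetric matrix to be zero gives stochastic gradient Langevin dynamics (SGLD). Below is a minimal sketch of that special case, not code from the paper. The toy model (Gaussian data with unknown mean and a wide Gaussian prior), the function name `sgld_gaussian_mean` and all parameter values are my own assumptions, and the paper's extra correction for the variance of the gradient noise is simply dropped.

```python
import numpy as np


def sgld_gaussian_mean(x, n_steps=6000, batch_size=100, step=1e-4,
                       prior_std=10.0, seed=0):
    """SGLD samples for the mean theta of N(theta, 1) data (toy example)."""
    rng = np.random.default_rng(seed)
    n_data = x.shape[0]
    theta = 0.0
    samples = np.empty(n_steps)
    for t in range(n_steps):
        # Random subsample of the data, drawn without replacement.
        batch = rng.choice(x, size=batch_size, replace=False)
        # Unbiased minibatch estimate of grad U(theta), with U = -log posterior:
        # prior term plus rescaled likelihood term.
        grad_u = theta / prior_std**2 + (n_data / batch_size) * np.sum(theta - batch)
        # Euler step of the SDE with D = I, Q = 0: drift -grad U, noise N(0, 2 * step).
        theta = theta - step * grad_u + np.sqrt(2.0 * step) * rng.standard_normal()
        samples[t] = theta
    return samples


# Synthetic data; the true mean 2.0 is an arbitrary choice for illustration.
data = np.random.default_rng(1).normal(loc=2.0, scale=1.0, size=1000)
samples = sgld_gaussian_mean(data)

# Exact posterior mean for this conjugate model, for comparison.
posterior_mean = data.sum() / (1.0 / 10.0**2 + data.size)
print(samples[1000:].mean(), posterior_mean)
```

The sample mean after burn-in should land close to the exact posterior mean. The sample variance, on the other hand, will generally not match the exact posterior variance at a fixed step size.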
The paper is very honest in stating the limitations of SG sampling, namely that one has to decrease the step size (learning rate) towards 0 to ensure unbiasedness, which of course would defeat the purpose of sampling.
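
A toy calculation of my own (not from the paper) makes the bias visible. Assume a standard normal target, so H(\theta) = \theta^2/2, a Langevin-type update with fixed step size \epsilon, and a gradient estimate corrupted by independent zero-mean noise \eta_t with variance \sigma^2. All of these are illustrative assumptions.

```latex
\begin{align*}
\theta_{t+1} &= \theta_t - \epsilon\,(\theta_t + \eta_t) + \sqrt{2\epsilon}\,\xi_t,
  \qquad \xi_t \sim \mathcal{N}(0, 1) \\
v &= (1 - \epsilon)^2\, v + \epsilon^2 \sigma^2 + 2\epsilon
  \qquad \text{(stationary variance } v \text{ of } \theta_t\text{)} \\
v &= \frac{2 + \epsilon\,\sigma^2}{2 - \epsilon}
  \qquad (0 < \epsilon < 2)
\end{align*}
```

The target variance is 1, but the chain settles at v > 1 for any fixed \epsilon > 0, and only recovers v = 1 in the limit \epsilon \to 0.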
