On Markov Chain Monte Carlo methods for tall data

The preprint by Rémi Bardenet, Arnaud Doucet and Chris Holmes offers a very well written overview and some original contributions regarding MCMC with large datasets (I like ‘tall’ – maybe high dimensional posteriors could then be ‘fat’ and problems which exhibit both are the worn-out ‘big’ ?-).

Their original contributions revolve around using ideas from Firefly MC (developed by the authors of the Stochastic Optimization is Variational inference paper) on bounds for the likelihood with their Confidence sampler. I must say I wasn’t able to follow in detail and will have to read up on Firefly and the original Confidence sampler paper. One downside I see is that targets need to be differentiable (thrice for their bounds using Taylor approximations) and thus in particular continuous. While in general I try to avoid discrete models simply because you can’t use gradient information, sometimes they are necessary. And it is better to make progress on something slightly limited than to make no progress at all.

In their review section, they clearly advise against divide-and-conquer approaches for tall data, i.e. approaches that sample from posteriors involving only part of the data and then combine the information somehow. The main reason is that the currently existing upper bounds suggest that the MSE of divide-and-conquer approaches could grow exponentially in the number of batches. Thus one would have to use few, large batches, defeating the whole purpose of special methods for tall data.

The section finally exposed me to the delayed acceptance approach of Marco Banterle et al., which I had meant to read for quite some time (and will read in the original). I found this approach very appealing for its elementary formulation. Most other approaches, including that of Bardenet, Doucet and Holmes, are quite involved. I have to look into the objection regarding ergodicity of delayed acceptance raised by Bardenet et al., which could potentially be quite a problem of course.
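
To make the “elementary formulation” concrete, here is a minimal sketch of how I understand delayed acceptance, not code from either paper. It assumes a one-dimensional target that factorises into a product of terms (think of one term per data batch) and a symmetric Gaussian random-walk proposal. Each factor gets its own Metropolis test, and the proposal is thrown away at the first test that fails, so the remaining (expensive) factors are never evaluated. The names delayed_acceptance_mh and log_half_gaussian are mine and purely illustrative.

```python
import math
import random


def delayed_acceptance_mh(log_factors, theta0, n_steps, step=1.0, seed=0):
    """Random-walk Metropolis with a factorised, early-rejecting accept step.

    log_factors: list of functions; the log target is assumed to be their sum.
    """
    rng = random.Random(seed)
    theta = theta0
    chain = []
    for _ in range(n_steps):
        # Symmetric proposal, so the proposal densities cancel in every ratio.
        proposal = theta + rng.gauss(0.0, step)
        accepted = True
        for log_f in log_factors:
            log_ratio = log_f(proposal) - log_f(theta)
            u = 1.0 - rng.random()  # uniform on (0, 1], so log(u) is defined
            # Stage passes with probability min(1, exp(log_ratio)).
            if not (log_ratio >= 0.0 or math.log(u) < log_ratio):
                accepted = False
                break  # early rejection: later factors are never evaluated
        if accepted:
            theta = proposal
        chain.append(theta)
    return chain


def log_half_gaussian(x):
    """Half of the standard normal log density, up to an additive constant."""
    return -0.25 * x * x


# Toy usage: two identical factors whose product is a standard normal.
chain = delayed_acceptance_mh([log_half_gaussian, log_half_gaussian], 0.0, 20000)
```

Since every stage accepts with probability min(1, ratio of that factor), the product of the stage probabilities still satisfies detailed balance with respect to the full target. The price is a lower overall acceptance rate than plain Metropolis-Hastings, and the ordering of the factors matters for how much computation is saved.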

A word about the evaluation section: I think it’s unfortunate that two of four experiments use logistic regression. The logistic regression target is very close to Gaussian, where MCMC is usually slammed by a simple EP approach (Leave Pima Indians alone!). And EP is easy to apply in the big data setting, or that’s what I’ve heard. About the gamma regression target I don’t know; it might be that EP does not do so well in that case.
