The preprint by Rémi Bardenet, Arnaud Doucet and Chris Holmes offers a very well-written overview and some original contributions regarding MCMC with large datasets (I like ‘tall’ – maybe high-dimensional posteriors could then be ‘fat’, and problems which exhibit both are the worn-out ‘big’?-).
Their original contributions revolve around combining ideas from Firefly MC (developed by Maclaurin and Adams) on bounds for the likelihood with their Confidence sampler. I must say I wasn’t able to follow in detail and will have to read up on Firefly and the original Confidence sampler paper. One downside I see is that targets need to be differentiable (thrice, for their bounds based on Taylor approximations) and thus in particular continuous. While in general I try to avoid discrete models simply because you can’t use gradient information, sometimes they are necessary. And better to make progress on something slightly limited than to make no progress at all.
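To make the differentiability requirement concrete, here is a minimal sketch (my own Python, not the authors’ code) of the Taylor-proxy idea: each log-likelihood term is replaced by a second-order expansion around a reference point, and the proxy error is controlled by an assumed bound R on the third derivative. The names l_i, grad_i, hess_i and R are all hypothetical.

```python
import numpy as np

def taylor_proxy(theta, theta_star, l_i, grad_i, hess_i):
    """Second-order Taylor expansion of one log-likelihood term around
    theta_star; l_i, grad_i, hess_i are its value, gradient and Hessian
    there (hypothetical inputs)."""
    d = theta - theta_star
    return l_i + grad_i @ d + 0.5 * d @ hess_i @ d

def proxy_error_bound(theta, theta_star, R):
    """|l_i(theta) - proxy| <= R * ||theta - theta_star||**3 / 6, valid
    when all third derivatives of l_i are bounded by R (assumption)."""
    return R * np.linalg.norm(theta - theta_star) ** 3 / 6
```

This is only meant to show why three derivatives enter the picture: a cheap proxy plus a deterministic error bound is what makes the subsampling controllable.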
In their review section, they clearly advise against divide-and-conquer approaches for tall data, i.e. approaches that sample from posteriors involving only part of the data and then combine the information somehow. The main reason is that the currently existing upper bounds suggest the MSE of divide-and-conquer approaches could grow exponentially in the number of batches. Thus one would have to use few, large batches, defeating the whole purpose of special methods for tall data.
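For concreteness, here is a minimal sketch of one such combination step, in the spirit of consensus Monte Carlo (Scott et al.): draws from each batch’s subposterior are merged by inverse-covariance weighted averaging. The input subposterior_draws (a hypothetical list of (num_draws, dim) arrays, one per batch, equal draw counts assumed) is my own naming.

```python
import numpy as np

def consensus_combine(subposterior_draws):
    """Weighted-average combination of subposterior draws (Scott et al.);
    exact only when every subposterior is Gaussian."""
    weights = [np.linalg.inv(np.atleast_2d(np.cov(d, rowvar=False)))
               for d in subposterior_draws]
    total = np.linalg.inv(sum(weights))
    # Merge the s-th draw of every batch into one consensus draw.
    return np.array([
        total @ sum(w @ d[s] for w, d in zip(weights, subposterior_draws))
        for s in range(len(subposterior_draws[0]))
    ])
```

The Gaussian caveat in the docstring hints at where things go wrong: away from Gaussianity the combination error can accumulate across batches.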
The section finally exposed me to the delayed acceptance approach of Marco Banterle et al., which I have meant to read for quite some time (and will now read in the original). I found this approach very appealing for its elementary formulation; most other approaches, including that of Bardenet, Doucet and Holmes, are quite involved. I still have to look into the objection regarding ergodicity of delayed acceptance raised by Bardenet, which could of course be quite a problem.
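Since I praised the elementary formulation, here is a hedged sketch of a single delayed-acceptance step in the Christen & Fox style that Banterle et al. build on: a cheap surrogate screens proposals before the full posterior is ever evaluated. log_pi_cheap, log_pi_full and the symmetric proposal propose are placeholders of my own.

```python
import numpy as np

def da_mh_step(x, log_pi_cheap, log_pi_full, propose, rng):
    """One delayed-acceptance Metropolis-Hastings step (symmetric proposal)."""
    y = propose(x, rng)
    # Stage 1: accept/reject using only the cheap surrogate.
    if np.log(rng.uniform()) >= log_pi_cheap(y) - log_pi_cheap(x):
        return x  # early rejection: the full posterior is never touched
    # Stage 2: correct with the full posterior, keeping it invariant.
    log_rho2 = (log_pi_full(y) - log_pi_full(x)) \
             - (log_pi_cheap(y) - log_pi_cheap(x))
    return y if np.log(rng.uniform()) < log_rho2 else x
```

The product of the two stage-wise acceptance probabilities still satisfies detailed balance with respect to the full posterior, which is the whole trick.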
A word about the evaluation section: I think it’s unfortunate that two of four experiments use logistic regression. The logistic regression target is very close to Gaussian, where MCMC is usually slammed by a simple EP approach (Leave Pima Indians alone!). And EP is easy to apply in the big data setting, or that’s what I’ve heard. About the gamma regression target I don’t know; it might be that EP does not do so well in that case.
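If one wanted to probe the near-Gaussian claim on a given dataset, a quick check (my own sketch, not from the paper, and using a Laplace approximation rather than EP) is to fit a Gaussian at the posterior mode and compare it against MCMC draws; a flat prior is assumed throughout.

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_post(theta, X, y):
    """Negative log-posterior of logistic regression under a flat prior;
    labels y are in {0, 1}."""
    z = X @ theta
    return np.sum(np.logaddexp(0, z) - y * z)

def laplace_fit(X, y):
    """Gaussian (Laplace) approximation at the posterior mode."""
    res = minimize(neg_log_post, np.zeros(X.shape[1]), args=(X, y),
                   method="BFGS")
    p = 1.0 / (1.0 + np.exp(-(X @ res.x)))
    hess = X.T @ (X * (p * (1 - p))[:, None])  # observed information at the mode
    return res.x, np.linalg.inv(hess)          # mean and covariance
```

If the MCMC draws look indistinguishable from this Gaussian, the benchmark arguably tells us little about the samplers themselves.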