Christian put my nose on this preprint by Tak, Meng and van Dyk a few days ago, so during the flight from Berlin to Paris today I sniffed on it. It was a very interesting read and good to get my thoughts away from the fact that I was an Opfer of German friendliness at airport security.
The basic idea is to design a random walk MH algorithm for targets with multiple and distant modes. The distant is important here, because for very close modes simple Adaptive Metropolis (Haario et al. 2001), which is simply using a scaled version of the targets covariance in a Gaussian random walk, should work pretty well.
Which it probably does not for the targets above, as such a proposal would often end up in areas of low posterior density.
Now the idea is the following: given current state of the Markov Chain, the final MH proposal
is constructed using an intermediate step
, where the intermediate step is encouraged to have lower density than
under the target, while
is encouraged to have higher density than the intermediate step. Hoping that the intermediate step gets you out of a mode and the consecutive step into a new mode. This is achieved by using the inverse MH-ratio for encouraging moves to a valley in the target, while the standard MH-ratio is used for moving uphill again. Of course computing the final acceptance probability
involves computing an intractable integral, as one has to integrate away the intermediate downhill move. They get around this by introducing an auxiliary variable, a technique inspired by Møller et al. (2006), where it was used to get around the computation of normalizing constants. I discussed a similar idea shortly with Jeff Miller (currently at Duke) one and a half years ago when we worked on the estimation of normalizing constants though using gradient information.
Am main argument for their algorithm is that it only needs tuning of a single scale parameter, whereas other MCMC techniques for multimodal targets are notoriously hard to tune.
In evaluation they compare to tempered transitions and equi-energy samplers and perform better both with respect to computing time and MSE. And of course human time invested in tuning.
However, I’m slightly tempted to repeat Nicolas mantra here “just use SMC”. Or slightly change the mantra to “… PMC” – because this is easier to stomach for MCMC-biased researchers, or so I believe. Of course this adds the number of particles to tuning, but that might be tuned automatically using the recent paper of the Spanish IS-connection.