Category Archives: Importance Sampling

Talks in England

Next week I’ll be touring England, more specifically the Universities of Reading, Warwick and Oxford, as well as UCL. Apart from visiting dear friends, I’ll be presenting a poster at Warwick (still not ready!) and giving talks at all the other places.

19.4.2016, 2pm: Gradient IS and Unadjusted Langevin for IS, Univ. of Reading (Afternoon on Bayesian Computation, Nike Lecture Theatre, Agriculture Building)
22.4.2016, 1pm: Kernel Sequential Monte Carlo, UCL (Roberts G08 Sir David Davies LT, Announcement)
25.4.2016, 3:30pm: Kernel Sequential Monte Carlo, Univ. of Oxford (Department of Statistics, St Giles building, Seminar Room 1, LG.03)

The poster, on Flyweight evidence estimates, will be presented at the CRiSM workshop on Estimating Constants on the 20th. Slides and the poster will go up on the Talks page.

Importance Sampling Session at MCQMC 2016

On behalf of the mcqmc 2016 organizing committee I am pleased to accept your proposal.
– Art Owen
I got this nice message from Art last night: my proposal for a session on Advances in Importance Sampling at MCQMC 2016 was accepted. This is great, as I think the session is made up of strong papers (obviously). The session will almost surely be moderated by Nicolas Chopin.

MCQMC session on Advances in Importance Sampling

The sample size required in Importance Sampling

S. Chatterjee, P. Diaconis

The goal of importance sampling is to estimate the expected value of a given function with respect to a probability measure ν using a random sample of size n drawn from a different probability measure μ. If the two measures μ and ν are nearly singular with respect to each other, which is often the case in practice, the sample size required for accurate estimation is large. In this article it is shown that in a fairly general setting, a sample of size approximately exp(D(ν||μ)) is necessary and sufficient for accurate estimation by importance sampling, where D(ν||μ) is the Kullback–Leibler divergence of μ from ν. In particular, the required sample size exhibits a kind of cut-off in the logarithmic scale. The theory is applied to obtain a fairly general formula for the sample size required in importance sampling for exponential families (Gibbs measures). We also show that the standard variance-based diagnostic for convergence of importance sampling is fundamentally problematic. An alternative diagnostic that provably works in certain situations is suggested.
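Just to get a feel for the exp(D(ν||μ)) rule, here is a quick back-of-the-envelope sketch of mine (not from the paper): for a Gaussian target and proposal the KL divergence is available in closed form, and the implied sample size blows up quickly with the mean shift.

```python
import numpy as np

def kl_gauss(m1, s1, m2, s2):
    """D(nu||mu) for target nu = N(m1, s1^2) and proposal mu = N(m2, s2^2)."""
    return np.log(s2 / s1) + (s1**2 + (m1 - m2)**2) / (2 * s2**2) - 0.5

# proposal mu = N(0,1); target nu shifted by increasing amounts
for shift in [1.0, 2.0, 4.0]:
    D = kl_gauss(shift, 1.0, 0.0, 1.0)
    print(f"mean shift {shift}: required sample size ~ exp(D) = {np.exp(D):.0f}")
```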

Generalized Multiple Importance Sampling

V. Elvira, L. Martino, D. Luengo, M. Bugallo

Importance Sampling methods are broadly used to approximate posterior distributions or some of their moments. In its standard approach, samples are drawn from a single proposal distribution and weighted properly. However, since the performance depends on the mismatch between the targeted and the proposal distributions, several proposal densities are often employed for the generation of samples. Under this Multiple Importance Sampling (MIS) scenario, many works have addressed the selection or adaptation of the proposal distributions, interpreting the sampling and the weighting steps in different ways. In this paper, we establish a general framework for sampling and weighting procedures when more than one proposal is available. The most relevant MIS schemes in the literature are encompassed within the new framework, and, moreover, novel valid schemes appear naturally. All the MIS schemes are compared and ranked in terms of the variance of the associated estimators. Finally, we provide illustrative examples which reveal that, even with a good choice of the proposal densities, a careful interpretation of the sampling and weighting procedures can make a significant difference in the performance of the method.
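To see why the interpretation of the weighting step matters, here is a toy comparison of mine (a sketch, not the paper's actual framework): the standard scheme weights each sample by the density of its own proposal, while the deterministic-mixture scheme uses the mixture of all proposals in the denominator, which typically tames the weight variance.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
target = stats.norm(0, 1)                      # toy target (my assumption)
props = [stats.norm(-2, 1), stats.norm(2, 1)]  # two proposals (my assumption)
N = 500                                        # draws per proposal

xs = np.concatenate([p.rvs(N, random_state=rng) for p in props])
own = np.concatenate([props[0].pdf(xs[:N]), props[1].pdf(xs[N:])])
mix = 0.5 * (props[0].pdf(xs) + props[1].pdf(xs))

for name, w in [("standard", target.pdf(xs) / own),
                ("deterministic mixture", target.pdf(xs) / mix)]:
    est = np.sum(w * xs**2) / np.sum(w)  # self-normalized estimate of E[X^2]
    print(f"{name}: estimate {est:.3f}, weight variance {np.var(w):.3f}")
```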

Continuous-Time Importance Sampling

K. Łatuszyński, G. Roberts, G. Sermaidis, P. Fearnhead

We will introduce a new framework for sequential Monte Carlo, based on evolving a set of weighted particles in continuous time. This framework can lead to novel versions of existing algorithms, such as Annealed Importance Sampling and the Exact Algorithm for diffusions, and can be used as an alternative to MALA for sampling from a target distribution of interest. These methods are amenable to the use of sub-sampling, which can greatly increase their computational efficiency for big-data applications, and can enable unbiased sampling from a much wider range of target distributions than existing approaches.

Kernel Sequential Monte Carlo

Heiko Strathmann, Brooks Paige, Dino Sejdinovic and I updated our draft on Kernel Sequential Monte Carlo (on the arXiv). Apart from locally adaptive covariance matrices for Gaussian proposals in various SMC algorithms, we also look at gradient emulators. Together, these cover targets that do not admit a first derivative (gradient emulators) or even a second derivative (locally adaptive covariance).
The emulators can be used in different ways, either as proposals for an MCMC rejuvenation step in SMC or directly as importance densities – for example in Population Monte Carlo.
We found the gradient emulator in particular to be rather sensitive to the variance of the fit. Not Rao-Blackwellizing across the importance densities used in a PMC iteration leads to gigantic estimated gradients and an exploding algorithm, while a weighted streaming fit of the emulator with Rao-Blackwellization works just fine (see the sketch below).
Plus, we now evaluate on the stochastic volatility target from Nicolas’ SMC^2 paper, which is a much nicer benchmark than what we had in the last draft (the plot shows the target’s marginals). Please send any feedback my way.
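For concreteness, here is a bare-bones PMC iteration with Rao-Blackwellized weights on a toy Gaussian target (my sketch, without the kernel emulators): the point is only that the weight denominator is the mixture over all proposal densities used in the iteration, not just the density of the proposal that generated the sample.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
log_target = stats.norm(3, 0.5).logpdf   # toy target (my assumption)
N, T, sigma = 100, 5, 1.0                # particles, iterations, proposal scale

mu = rng.normal(0.0, 3.0, N)             # initial proposal centers
for t in range(T):
    x = mu + sigma * rng.normal(size=N)  # one draw per proposal
    # Rao-Blackwellized denominator: mixture over ALL N proposal densities
    mix = stats.norm(mu[None, :], sigma).pdf(x[:, None]).mean(axis=1)
    logw = log_target(x) - np.log(mix)
    w = np.exp(logw - logw.max())
    w /= w.sum()
    mu = rng.choice(x, size=N, p=w)      # resample to adapt proposal centers

print("estimated target mean:", np.sum(w * x))
```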

A comparison of generalized hybrid Monte Carlo methods with and without momentum flip

This experimental paper from 2009 by Akhmatskaya, Bou-Rabee and Reich in the Journal of Computational Physics takes a look at what people in statistics would call HMC with partial momentum refreshment. The paper got my attention mainly because I met Elena Akhmatskaya at MCMSki.

The ‘generalized’ in the paper refers to the fact that the momenta are not resampled at each new HMC step, but rather updated according to p_i' = \cos(\phi) p_i + \sin(\phi) u_i, where p_i \sim \mathcal{N}(0,\Sigma) is the old momentum and u_i \sim \mathcal{N}(0,\Sigma) is an independent Gaussian RV of the same distribution that we mix in, for some parameter \phi. We recover HMC for \phi=\pi/2, as in that case the momentum is always refreshed.
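In code, a single refresh step would look something like this (my reading of the update; \Sigma handled via its Cholesky factor):

```python
import numpy as np

def partial_momentum_refresh(p, phi, chol_Sigma, rng):
    """Mix the old momentum p with fresh noise u ~ N(0, Sigma).

    Since cos(phi)^2 + sin(phi)^2 = 1, the result is again N(0, Sigma);
    phi = pi/2 discards p entirely and recovers standard HMC.
    """
    u = chol_Sigma @ rng.normal(size=p.shape)
    return np.cos(phi) * p + np.sin(phi) * u
```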

Also, they are looking at a so-called shadow HMC variant, which really samples from a Hamiltonian system that is not the original one, but close to it. Correction for sampling from the wrong distribution is done by importance weighting (when she told me that, I thought “I knew IS could work in high dimensions!”, as they sample in up to 10,000 dimensions and do better than HMC!).

Now the main gist of the paper is that they were trying to get rid of momentum flips in the case of a rejection of the proposal, as it was previously thought that momentum flips lead to a worse sampler. The intuition would be that a momentum flip leads you back to where you started. However, the finding is that, to the contrary, their version without flips (which still satisfies a detailed balance criterion) does worse in actual estimation tasks than the conservative Generalized Shadow HMC (GSHMC) sampler. I like it when things stay easy. But as this paper does not describe the actual GSHMC method in detail, I will have to go back to the original paper, as my interest is piqued.

Streaming weighted updates of Cholesky factors

… are hell. Heiko and I have been spending the day trying to get code to work for a numerically stable streaming update of a Cholesky factor in an adaptive Importance Sampling setting. This is less than trivial, also because it has to be done in a weighted fashion (because of Importance Sampling) and the weights should be kept in logspace for stability. The thought that I will have to go through this again with an eigendecomposition instead of a Cholesky factorization, for a collaboration planned with Mathieu Gerber, does not make things easier to stomach.
This day again strengthens my wish to move more in the direction of Judith-Rousseau-statistics, i.e. away from computers, towards pen and paper.

PS: Does anybody have code solving exactly this problem (we have tons for almost this problem)?
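For reference, the textbook rank-1 update we start from looks like this (a plain, in-memory sketch of mine; the weighted streaming version with logspace weights, and the downdates needed when the running mean shifts, are exactly where it gets hairy):

```python
import numpy as np

def chol_rank1_update(L, x, w):
    """Return the Cholesky factor of L @ L.T + w * outer(x, x), for w > 0.

    Downdates (w < 0) need a different, less stable recursion, one of the
    pain points in the streaming weighted setting.
    """
    L, x = L.copy(), np.sqrt(w) * x.copy()
    n = len(x)
    for k in range(n):
        r = np.hypot(L[k, k], x[k])  # sqrt(L_kk^2 + x_k^2), computed stably
        c, s = r / L[k, k], x[k] / L[k, k]
        L[k, k] = r
        if k + 1 < n:
            L[k + 1:, k] = (L[k + 1:, k] + s * x[k + 1:]) / c
            x[k + 1:] = c * x[k + 1:] - s * L[k + 1:, k]
    return L

# sanity check against a fresh factorization
A = np.array([[4.0, 1.0], [1.0, 3.0]])
v, w = np.array([1.0, 2.0]), 0.5
assert np.allclose(chol_rank1_update(np.linalg.cholesky(A), v, w),
                   np.linalg.cholesky(A + w * np.outer(v, v)))
```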