Ergodicity of Combocontinuous Adaptive MCMC algorithms

easy_adaptive_mcmc

This preprint by Jeff Rosenthal and Jinyoung Yang (currently available from Jeffs webpage) might also be called “Easily verifiable adaptive MCMC”. Jeff Rosenthal gave a tutorial on adaptive MCMC during MCMSki 2016 mentioning this work.  Adaptive MCMC is based on the idea that one can use the information gathered from sampling a distribution using MCMC to improve the efficiency of the sampling process.

If two conditions, diminishing adaptation and containment are satisfied, an adaptive MCMC algorithm is valid in the sense of asymptotically consistent. Diminishing adaptation means that two consecutive Markov Kernels in the algorithm will be asymptotically equal. In other words, we either stop adaptation at some point or we know that the adaptation algorithm converges.
Containment means the number of repeated applications of all used Markov Kernels to get close to the target measure is bounded. Concretely, let \gamma be a Markov kernel index, P_\gamma^m(x,\cdot) be the distribution resulting from m-fold application of kernel P_\gamma starting from x . In other words start MCMC at point x with kernel P_\gamma, let it run for m iterations and consider the induced distribution for the last point. Let \pi be the target distribution. Then containment requires that
 \{M_\epsilon(X_n, \Gamma_n)\}_{n=1}^\infty  is bounded in probability for all \epsilon > 0. Here M_\epsilon(x, \gamma) = \inf \{ m \geq 1 : \| P_\gamma^m(x,\cdot) - \pi(\cdot) \|_\textrm{TV}\} and \|\cdot\|_\textrm{TV}\} is a worst case distance between distributions (total variation distance).

The paper is concerned with trying to find conditions for containment in adaptive MCMC that are more easily verified than those from earlier papers. First however it gives a kind of blueprint for adaptive algorithms that satisfy containment.

A blueprint for consistent adaptive MCMC

Nameley, let \mathbb{R}^d be the support of the target distribution and K \subseteq \mathbb{R}^d some large bounded region, D > 0 some large constant. The blueprint, Bounded Adaptive Metropolis, is the following:

Start the algorithm at some X_0 \in K and fix a d \times d covariance matrix \Sigma_*. At iteration n generate a proposal Y_{n+1} by

(1)Y_{n+1} \sim \mathcal{N}(X_n, \Sigma_*)~\textrm{if}~X_n \notin K
(2)Y_{n+1} \sim \mathcal{N}(X_n, \Sigma_{n+1})~\textrm{if}~X_n \in K

Reject if |Y_{n+1} - X_{n}| > D, else accept with the usual Metropolis-Hastings acceptance probability. The \Sigma_{n+1} can be chosen almost arbitrarily if the diminishing adaptation condition is met, so either the mechanism of choosing is fixed asymptotically or converges.

It would seem to me that we can actually change the distribution in (2) arbitrarily if we continue to meet diminishing adaptation. So for example we could use an independent metropolis, adaptive Langevin or other sophisticated proposal inside K, so long as condition (e) in the paper is satisfied, i.e. the adaptive proposal distribution used in (2) is continuous in X_n. Which leads us to the actual conditions for containment.

General conditions for containment in adaptive MCMC

Let \mathcal{X} be a general state space. For example in the Bounded Metropolis we had \mathcal{X}=\mathbb{R}^d. The conditions the authors give are (even more simplified by me):

(a)  The probability to move more than some finite distance D > 0 is zero: Pr(|X_{n+1} - X_n| > D) = 0
(b) Outside of K, the algorithm uses a fixed transition kernel P that never changes (and still respects that we can at most move D far away)
(c) The fixed kernel P is bounded above by P(x, dy) \leq M \mu_*(dy) for finite constant M > 0 and all x that are outside K but no farther from it than D (call that set K_D) and all y that are between D and 2D distance from K (call that set K_{2D} \backslash K_D). Here \mu_* is any distribution concentrated on K_{2D} \backslash K_D.
(d) The fixed kernel P is bounded below by P^{n_0}(x, A) \geq \epsilon \nu_*(A) for some measure \nu_* on \mathcal{X}, some n_0 \in \mathbb{N} and some event A.
(e) Let \gamma be the parameter adapted by the algorithm. The overall proposal densities q_\gamma(x,y) (combining the proposal in and outside of K) are continuous in \gamma for fixed (x,y) and combocontinuous in x. Practically, this would be that the fixed proposal when outside  K and the adaptive proposal when inside K are both continuous.

Here, conditions (a) and (b) are very easy to ensure even when not an expert on MCMC. Conditions (c) and (d) sound harder, but as mentioned above it seems to me that they are easy to ensure by just using a (truncated, i.e. respecting (a)) gaussian random walk proposal outside of K. Finally, (e) seems to boil down to making the adaptive proposal continuous in both \gamma and x.

The proofs use a generalization of piecewise continuous functions and a generalized version of Dinis theorem to prove convergence in total variation distance.

This paper seems to me to be a long way from Roberts & Rosenthal (2007, Journal of Applied Probability) which was the first paper I read on ergodicity conditions for adaptive MCMC. It truly makes checking containment much easier. My one concern is that the exposition could be clearer for people that are not MCMC researchers. Then again, this is a contribution paper rather than a tutorial.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s