Optimization Monte Carlo

A very short note on Optimization Monte Carlo by Meeds and Welling. Max Welling gave a talk on this at a NIPS workshop, during which I unfortunately wasn’t very attentive. Unfortunately, because they propose essentially the same thing as David Duvenaud and coauthors did in Early stopping is nonparametric variational inference, which I had been super-vocal in criticizing just 2h earlier at David’s poster. Because I didn’t listen closely to Max’s talk, and because it was Max Welling rather than David, I wasn’t quite as vocal, but took the discussion with Max offline.

(The following critique is not actually valid for this paper; see EDIT 1 below.)

The critique is essentially the same: if you draw an RV and then optimize it, you can’t compute the Jacobian to correct for the transformation, because the transformation is typically not bijective. Think of a gradient ascent step: let $y = x + \lambda \nabla \log f(x)$. Now what is the inverse step when you only know $y$ but not $x$, and thus can’t know $\nabla \log f(x)$? It doesn’t exist in general. This makes the very basis of the method unsound. Max answered that one can still compute the Jacobian in certain cases, even if the inverse function doesn’t exist. For some reason I wasn’t able to counter that on the spot. Of course you can always compute the Jacobian of the forward transformation, but that is not the Jacobian you need. The one you need is the Jacobian of the inverse function. See Theorem 12.7 in Jacod & Protter, for example.
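
To make the non-bijectivity point concrete, here is a minimal sketch on a toy example of my own choosing (not taken from either paper). It assumes an unnormalized target $f(x) \propto \exp(-x^4/4)$, so that $\nabla \log f(x) = -x^3$, and a single gradient ascent step $y = x + \lambda \nabla \log f(x)$ with $\lambda = 1$. The function names `grad_log_f`, `ascent_step` and `forward_jacobian` are made up for illustration.

```python
def grad_log_f(x):
    # Assumed toy target: f(x) proportional to exp(-x**4 / 4),
    # hence d/dx log f(x) = -x**3.
    return -(x ** 3)


def ascent_step(x, lam=1.0):
    # One gradient ascent step: y = x + lam * grad log f(x).
    return x + lam * grad_log_f(x)


def forward_jacobian(x, lam=1.0):
    # Derivative of the forward map: dy/dx = 1 - 3 * lam * x**2.
    return 1.0 - 3.0 * lam * x ** 2


# Three different starting points ...
starts = [-1.0, 0.0, 1.0]
# ... all land on the same value y = 0, so y alone cannot tell you x.
images = [ascent_step(x) for x in starts]
# The forward Jacobian exists at each of them, but differs between preimages.
jacobians = [forward_jacobian(x) for x in starts]

print(images)
print(jacobians)
```

All three starting points are mapped to the same $y$, so there is no inverse step, even though the forward derivative is perfectly well defined at each of them. In this one-dimensional toy case you could still write the density of $y$ as a sum over all preimages, but that requires finding every preimage first, which is exactly what you can’t do in general.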

I’ll stop here, as I asked Xi’an for his opinion and he will post about this as well, presumably with a more detailed discussion. If Ted Meeds and Max Welling reply to my email on this and prove me wrong, I’ll of course update this post.

EDIT 1: Lesson learned. I shouldn’t post when I have only half understood the paper (if “half” isn’t already saying too much). They don’t actually suggest optimizing an RV and computing the Jacobian of the transformation. Rather, they propose optimizing another variable, which is not taken to be random (the parameters of a simulator, which is specific to the ABC setting).

EDIT 2: Xi’an’s post

Ingmar's research blog
