Linear predictors for nonlinear dynamical systems: Koopman operator meets model predictive control

This arXival by the dynamical duo Korda & Mesiç deals with optimal control for a dynamical system when no forward differential equations are known. Instead, we are given observation/sensor data x_t capturing the systems state (i.e. the observed state in HMM lingo or observables in dynamical systems lingo),  the control input u_t and the observed state at the following time index x_{t+1}.

The basic idea is to build a Koopman operator that captures the dynamics of the observed data. In the simplest case, this operator is just a linear map, i.e. matrices A, B satisfying [x_{t+1}, u_{t+1}] = [A,B] [x_{t}^\top, u_t^\top]^\top, where A encodes the dependence of the next state upon the current state, B upon the control. Already, this formula reveals a slight inelegance – to stay in Koopman operator world, the control input at t+1 is predicted. This does not make much sense, but so be it.

Now the trick to approximate nonlinear dynamics is well known to statisticians and ML people: rather finding a mapping between x_t and x_{t+1}, we find a mapping between \phi(x_{t}) and \phi(x_{t+1}), where \phi is some nonlinear map to \mathbb{R}^{d_\mathrm{lift}} and called a lifting function in the paper. Now to fit a model to the nonlinear system dynamics, we only need to fit the linear maps A, B to the training data. The loss in the paper is based on a Frobenius or squared euclidean norm and of the form

\min_{A,B} | [\phi(x_{2}), \dots, \phi(x_{T})] - A [\phi(x_{1}), \dots, \phi(x_{T-1})] - B[u_{1}, \dots, u_{T-1}] \|

For the Frobenius norm, the analytical solution is

[A, B] = [\phi(x_{2}), \dots, \phi(x_{T})] [\phi(x_{1}), \dots, \phi(x_{T-1}),u_{1}, \dots, u_{T-1}]^{-1}

where the inverse is the pseudoinverse of course. Up to here, the paper was all about approximating a forward model from data in the case where no a priori closed form DE model is given.
Now the idea for using this for control is to define a convex cost function J that includes both the cost of the system deviating from the target state as well as the cost of control inputs. As the cost is convex, a global optimum can be attained.

The theoretical analysis mainly shows that the approximated Koopman operator converges to the actual infinite dimensional Koopman operator.

While the extension to stochastic systems is only mentioned, I think this is an interesting paper following an approach that seems particularly promising for whats called reinforcement learning in the deep/machine learning community. It seems that many reinforcement learning papers do not use a proper optimisation procedure for control input (it’s just random search). Fitting a model of the systems dynamics and its reaction to control input instead, enabling the usage of well-known gradient based optimisation methods, as the current paper, seems like a good idea to me.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s