This arXival by the dynamical duo Korda & Mezić deals with optimal control for a dynamical system when no forward differential equations are known. Instead, we are given observation/sensor data capturing the system's state $x_t$ (i.e. the observed state in HMM lingo, or observables in dynamical systems lingo), the control input $u_t$, and the observed state $x_{t+1}$ at the following time index.
The basic idea is to build a Koopman operator that captures the dynamics of the observed data. In the simplest case, this operator is just a linear map, i.e. matrices $A, B$ satisfying $x_{t+1} = A x_t + B u_t$, where $A$ encodes the dependence of the next state upon the current state and $B$ its dependence upon the control. Already, this formula reveals a slight inelegance: to stay in Koopman operator world, the control input at $t+1$ is predicted as well. This does not make much sense, but so be it.
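Now the trick to approximate nonlinear dynamics is well known to statisticians and ML people: rather than finding a mapping between $x_t$ and $x_{t+1}$, we find a mapping between $\psi(x_t)$ and $\psi(x_{t+1})$, where $\psi$ is some nonlinear map to $\mathbb{R}^N$, called a lifting function in the paper. Now to fit a model to the nonlinear system dynamics, we only need to fit the linear maps $A, B$ to the training data. The loss in the paper is based on a Frobenius or squared Euclidean norm and of the form
$$\min_{A,B} \sum_t \| \psi(x_{t+1}) - A \psi(x_t) - B u_t \|_2^2 = \min_{A,B} \| \Psi_+ - A \Psi - B U \|_F^2,$$
where the columns of $\Psi$, $U$ and $\Psi_+$ are the samples $\psi(x_t)$, $u_t$ and $\psi(x_{t+1})$.
For the Frobenius norm, the analytical solution is
$$[A, B] = \Psi_+ \begin{bmatrix} \Psi \\ U \end{bmatrix}^{\dagger},$$
where $\dagger$ denotes the pseudoinverse of course. Up to here, the paper was all about approximating a forward model from data in the case where no a priori closed-form DE model is given.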
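The least-squares fit described above can be sketched in NumPy as follows. This is a minimal illustration, not the paper's code: the toy system `step`, the lifting function `lift` (the state plus one monomial) and all parameter values are assumptions chosen so that the lifted dynamics happen to be exactly linear.

```python
import numpy as np

def step(x, u, lam=0.9, mu=0.5, c=0.3):
    """Toy nonlinear system (an assumption for illustration, not from the paper)."""
    x1, x2 = x
    return np.array([lam * x1, mu * x2 + c * x1**2 + u])

def lift(x):
    """Hypothetical lifting function psi: R^2 -> R^3."""
    x1, x2 = x
    return np.array([x1, x2, x1**2])

def fit_lifted_linear_model(X, U, Y):
    """Least-squares fit of psi(y) = A psi(x) + B u (approximately).

    X, Y: (n_state, K) arrays of states and their successors.
    U:    (n_input, K) array of control inputs.
    """
    Z = np.column_stack([lift(x) for x in X.T])       # lifted states, (N, K)
    Z_next = np.column_stack([lift(y) for y in Y.T])  # lifted successors, (N, K)
    # [A, B] = Z_next @ pinv([Z; U]) minimises the Frobenius-norm residual.
    AB = Z_next @ np.linalg.pinv(np.vstack([Z, U]))
    N = Z.shape[0]
    return AB[:, :N], AB[:, N:]

# Random training triples (x, u, successor of x).
rng = np.random.default_rng(0)
K = 200
X = rng.uniform(-1, 1, size=(2, K))
U = rng.uniform(-1, 1, size=(1, K))
Y = np.column_stack([step(X[:, k], U[0, k]) for k in range(K)])

A, B = fit_lifted_linear_model(X, U, Y)
```

For this particular toy system the span of the lifting functions is invariant under the dynamics, so the fit recovers the lifted model up to numerical precision. For a general system and lifting function, the fitted `A` and `B` only give an approximation.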
Now the idea for using this for control is to define a convex cost function that includes both the cost of the system deviating from the target state and the cost of control inputs. As the lifted dynamics are linear and the cost is convex, the resulting optimisation problem is convex, so a global optimum can be attained.
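To make the convexity argument concrete, here is a small sketch of a finite-horizon planning problem on a linear lifted model. It assumes an unconstrained quadratic cost, in which case the global optimum is the solution of a linear least-squares problem; the paper's setting with constraints would need a QP solver instead. The matrices `A`, `B`, `C`, the weight `r` and the function names are hypothetical.

```python
import numpy as np

def plan_controls(A, B, C, z0, y_ref, horizon, r=0.1):
    """Minimise sum_k ||C z_k - y_ref||^2 + r ||u_{k-1}||^2 for k = 1..horizon,
    subject to z_{k+1} = A z_k + B u_k. Returns an array of shape (horizon, m)."""
    m = B.shape[1]
    p = C.shape[0]
    powers = [np.linalg.matrix_power(A, k) for k in range(horizon)]
    # Stacked outputs y_1..y_H depend linearly on z0 and the inputs: y = F z0 + G u.
    F = np.vstack([C @ powers[k] @ A for k in range(horizon)])
    G = np.zeros((horizon * p, horizon * m))
    for k in range(horizon):        # row block for output y_{k+1}
        for j in range(k + 1):      # column block for input u_j
            G[k * p:(k + 1) * p, j * m:(j + 1) * m] = C @ powers[k - j] @ B
    target = np.tile(y_ref, horizon) - F @ z0
    # The quadratic cost equals the squared residual of a stacked least-squares system.
    lhs = np.vstack([G, np.sqrt(r) * np.eye(horizon * m)])
    rhs = np.concatenate([target, np.zeros(horizon * m)])
    u, *_ = np.linalg.lstsq(lhs, rhs, rcond=None)
    return u.reshape(horizon, m)

def rollout_cost(A, B, C, z0, y_ref, u_seq, r=0.1):
    """Simulate the linear model and accumulate the same quadratic cost."""
    z, total = z0, 0.0
    for u in u_seq:
        z = A @ z + B @ u
        total += np.sum((C @ z - y_ref) ** 2) + r * np.sum(u ** 2)
    return total

# Hypothetical lifted model and tracking target (illustrative values only).
A = np.array([[0.9, 0.0, 0.0], [0.0, 0.5, 0.3], [0.0, 0.0, 0.81]])
B = np.array([[0.0], [1.0], [0.0]])
C = np.array([[0.0, 1.0, 0.0]])   # the cost only looks at the second coordinate
z0 = np.array([1.0, 0.0, 1.0])    # lifted initial state
y_ref = np.array([0.5])

u_opt = plan_controls(A, B, C, z0, y_ref, horizon=10)
```

In a receding-horizon loop one would typically apply only the first input `u_opt[0]`, observe the new state, lift it, and plan again.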
The theoretical analysis mainly shows that the approximated Koopman operator converges to the actual infinite-dimensional Koopman operator.
While the extension to stochastic systems is only mentioned, I think this is an interesting paper following an approach that seems particularly promising for what's called reinforcement learning in the deep/machine learning community. It seems that many reinforcement learning papers do not use a proper optimisation procedure for the control input (it's just random search). Fitting a model of the system's dynamics and its reaction to control input instead, which enables the use of well-known gradient-based optimisation methods, as the current paper does, seems like a good idea to me.