Model selection for dynamical systems via sparse regression and information criteria

January 6, 2017

Abstract: 

We develop an algorithm for model selection which allows for the consideration of a combinatorially large number of candidate models governing a dynamical system. The innovation circumvents a disadvantage of standard model selection, which typically limits the number of candidate models considered due to the intractability of computing information criteria. Using a recently developed sparse identification of nonlinear dynamics algorithm, the sub-selection of candidate models near the Pareto frontier allows for a tractable computation of AIC (Akaike information criterion) or BIC (Bayes information criterion) scores for the remaining candidate models. The information criteria hierarchically rank the most informative models, enabling the automatic and principled selection of the model with the strongest support in relation to the time series data. Specifically, we show that AIC scores place each candidate model in the strong-support, weak-support, or no-support category. The method correctly identifies several canonical dynamical systems, including an SEIR (susceptible-exposed-infectious-recovered) disease model and the Lorenz equations, giving the correct dynamical system as the only candidate model with strong support.

Introduction

Model selection is a well-established statistical framework for selecting a model from a set of candidate models given time series data, with information theory providing a rigorous criterion for such a selection process. As early as the 1950s, it was proposed that a measure of information loss between empirically collected data and model-generated data be computed using the Kullback-Leibler (KL) divergence [5, 6]. Akaike built upon this notion to establish a relative estimate of information loss across models that balances model complexity and goodness-of-fit, allowing for a principled model selection criterion: the Akaike information criterion (AIC). The AIC was later modified by G. Schwarz to define the more commonly used Bayes information criterion (BIC) [9]. Both AIC and BIC compute the maximum log likelihood of the model and impose a complexity penalty: AIC adds the number of free parameters k of the posited model to the negative log likelihood, while BIC adds half of k multiplied by the log of the number of data points m. Much of the popularity of BIC stems from the fact that it can be rigorously proved to be a consistent metric. Thus, if q models are proposed, with one of them being the true model, then as m → ∞ the true model is selected as the correct model with probability approaching unity. Regardless of the selection criterion, AIC or BIC, both provide a relative estimate of information loss across the q candidate models, balancing model complexity and goodness-of-fit.
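
For reference, the standard forms of these criteria (with $\hat{L}$ the maximized likelihood, $k$ the number of free parameters, and $m$ the number of data points) are stated below; the prose penalties above correspond to these up to an overall factor of two. The finite-sample correction AICc, which appears in Fig. 1.1, is also included:

$$\mathrm{AIC} = 2k - 2\ln\hat{L}, \qquad \mathrm{BIC} = k\ln(m) - 2\ln\hat{L}, \qquad \mathrm{AIC}_c = \mathrm{AIC} + \frac{2k(k+1)}{m-k-1}.$$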

Fig. 1.1: Schematic of the model selection process, with a) data generation, b) generation of a set of potential models, and c) comparison of the models as a function of the number of terms in the model and relative Akaike information criterion (AICc). Panel c) shows how models are down-selected from a combinatorially large model space using sparse identification of nonlinear dynamics (SINDy) and then further sub-selected and ranked using information criteria.
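
To make the down-selection concrete, the following is a minimal Python sketch (not the authors' implementation) of the two-stage process the figure describes: sequentially thresholded least squares, the sparse regression at the core of SINDy, generates candidate models of varying complexity as the threshold is swept, and each candidate is then scored and ranked with AICc. The library Theta (candidate functions evaluated on the data), the threshold grid, and the Gaussian-error form of the likelihood are illustrative assumptions.

import numpy as np

def stls(Theta, dXdt, lam, n_iter=10):
    """Sequentially thresholded least squares (the sparse regression in SINDy).

    Theta : (m, p) library of candidate functions evaluated at m time points.
    dXdt  : (m, n) measured or estimated derivatives of the n state variables.
    lam   : threshold below which coefficients are zeroed out.
    """
    Xi = np.linalg.lstsq(Theta, dXdt, rcond=None)[0]  # initial dense fit
    for _ in range(n_iter):
        small = np.abs(Xi) < lam
        Xi[small] = 0.0
        # refit each state equation using only its active library terms
        for j in range(dXdt.shape[1]):
            active = ~small[:, j]
            if active.any():
                Xi[active, j] = np.linalg.lstsq(Theta[:, active], dXdt[:, j],
                                                rcond=None)[0]
    return Xi

def aicc(rss, k, m):
    """AICc under a Gaussian error model: m*ln(RSS/m) + 2k plus the
    small-sample correction 2k(k+1)/(m-k-1). Assumes rss > 0 and m > k + 1."""
    return m * np.log(rss / m) + 2 * k + 2.0 * k * (k + 1) / (m - k - 1)

def rank_candidates(Theta, dXdt, thresholds):
    """Sweep the sparsity threshold to generate a small set of candidate
    models near the Pareto frontier, then rank them by AICc (lowest is best)."""
    m = Theta.shape[0]
    ranked = []
    for lam in thresholds:
        Xi = stls(Theta, dXdt, lam)
        k = int(np.count_nonzero(Xi))  # number of free parameters
        rss = float(np.sum((Theta @ Xi - dXdt) ** 2))
        ranked.append((aicc(rss, k, m), k, lam, Xi))
    return sorted(ranked, key=lambda r: r[0])

Relative scores, Δ = AICc − min(AICc), then sort the candidates into the support categories mentioned in the abstract; a conventional reading (following Burnham and Anderson) is that Δ ≲ 2 indicates strong support and Δ > 10 essentially none.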