Inferring Biological Networks by Sparse Identification of Nonlinear Dynamics

January 19, 2017


Inferring the structure and dynamics of network models is critical to understanding the functionality and control of complex systems, such as metabolic and regulatory biological networks. The increasing quality and quantity of experimental data enable statistical approaches based on information theory for model selection and goodness-of-fit metrics. We propose an alternative data-driven method to infer networked nonlinear dynamical systems by using sparsity-promoting optimization to select a subset of nonlinear interactions representing dynamics on a network. In contrast to standard model selection methods, which rank a finite number of heuristic models (on the order of 10 or fewer) by information content, our model selection procedure discovers a parsimonious model from a combinatorially large set of models, without an exhaustive search. Our particular innovation is appropriate for many biological networks, where the governing dynamical systems have rational function nonlinearities with cross terms. Such systems require an implicit formulation, in which the equations are identified in the null space of a library of mixed nonlinearities that includes both state and derivative terms. This method, implicit-SINDy, succeeds in inferring three canonical biological models: 1) Michaelis-Menten enzyme kinetics; 2) the regulatory network for competence in bacteria; and 3) the metabolic network for yeast glycolysis.
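
To make the null-space idea concrete, the following is a minimal sketch for a single Michaelis-Menten-type equation, dx/dt = j - v_max x / (k_m + x). Multiplying through by the denominator gives a relation that is linear in a library mixing state terms with derivative terms, so the coefficient vector lies in the null space of that library. The parameter values, the noise-free derivatives, the state grid, and the deliberately small library are all assumptions made for illustration. With this library the null space is one-dimensional and can be read off from a singular value decomposition; that shortcut stands in for, and is not, the sparsity-promoting search over a larger library described above.

```python
import numpy as np

# Illustrative parameters (assumed for this sketch, not taken from the text).
j, v_max, k_m = 0.6, 1.5, 0.3

# Sample states on a grid and evaluate exact, noise-free derivatives.
# Real data would be a measured time series with estimated derivatives.
x = np.linspace(0.05, 2.0, 200)
dx = j - v_max * x / (k_m + x)

# Mixed library of state and derivative terms: [1, x, x^2, dx, x*dx].
names = ["1", "x", "x^2", "dx", "x*dx"]
theta = np.column_stack([np.ones_like(x), x, x**2, dx, x * dx])

# The right singular vector with the smallest singular value spans the
# (numerical) null space of theta, i.e. theta @ xi is approximately zero.
_, singular_values, vt = np.linalg.svd(theta, full_matrices=False)
xi = vt[-1]

# A null vector is only defined up to scale; normalize the x*dx coefficient to 1.
xi = xi / xi[4]

for name, coeff in zip(names, xi):
    print(f"{name:>5s}: {coeff: .4f}")
```

Reading off the normalized vector gives (k_m + x) dx/dt = j k_m + (j - v_max) x, which rearranges to the rational form dx/dt = j - v_max x / (k_m + x). The x^2 coefficient comes out as zero because that term plays no role in the dynamics.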

Figure 1

Methodology for sparse identification of nonlinear dynamics (SINDy) from data. First, data is generated from a dynamical system, in this case a biological network. The time series of data is synthesized into a nonlinear function library, and the terms in this library are related to the time derivative by an overdetermined linear regression problem. Enforcing sparsity ensures that only a small number of coefficients are nonzero, identifying the few active terms in the dynamics that are needed to model the system.
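
The steps in this caption can be sketched on a toy problem. The example below is an illustration under stated assumptions, not the authors' implementation: it uses logistic growth, dx/dt = r x (1 - x/K), as a stand-in system, a polynomial library up to cubic order, finite-difference derivatives, and sequentially thresholded least squares as one common way to enforce sparsity. The caption does not say which sparse regression is used, and the function name `stlsq`, the threshold, and all parameter values are hypothetical choices.

```python
import numpy as np

# Stand-in system (assumed): logistic growth with a known closed-form solution.
r, K, x0 = 2.0, 4.0, 0.1
t = np.arange(0.0, 5.0, 0.001)
x = K / (1.0 + ((K - x0) / x0) * np.exp(-r * t))

# Estimate the time derivative from the time series by finite differences.
dxdt = np.gradient(x, t, edge_order=2)

# Nonlinear function library evaluated on the data: [1, x, x^2, x^3].
names = ["1", "x", "x^2", "x^3"]
theta = np.column_stack([np.ones_like(x), x, x**2, x**3])


def stlsq(theta, dxdt, threshold=0.1, n_iter=10):
    """Sequentially thresholded least squares (hypothetical helper).

    Solves the overdetermined problem theta @ xi = dxdt, zeroes out
    coefficients smaller than `threshold`, and refits the remaining terms.
    """
    xi = np.linalg.lstsq(theta, dxdt, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(xi) < threshold
        xi[small] = 0.0
        active = ~small
        if not active.any():
            break
        xi[active] = np.linalg.lstsq(theta[:, active], dxdt, rcond=None)[0]
    return xi


xi = stlsq(theta, dxdt)
for name, coeff in zip(names, xi):
    print(f"{name:>4s}: {coeff: .4f}")
```

Only the x and x^2 terms should survive thresholding, with coefficients close to r and -r/K, so the recovered model matches dx/dt = r x - (r/K) x^2. The threshold acts as the sparsity knob: set it too high and true terms are pruned, too low and spurious terms remain.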