The Data, Dynamics, and Analytics (DDA) team applies modern data science, machine-learning, and statistical techniques to a wide range of data, including infectious disease data, household survey data, and genomic surveillance data, to help inform programmatic decision-making. We aim to build on the growing success of modern analytic and numerical methods and to develop new mathematical methodologies tailored to the types of data collected in resource-constrained settings. More specifically, the team focuses on projects at the intersection of machine learning, nonlinear dynamical systems, and data science. DDA is also a cross-cutting team: its projects have multiple touchpoints across disease-specific teams, both within IDM and with external collaborators.

Featured Research

Genomic and molecular surveillance data for programmatic action

Genomic and molecular surveillance data has the potential to add information beyond what routine data collection systems provide, enabling better programmatic decision-making. Substantial progress has been made in incorporating genomics into drug-resistance surveillance, allowing for better case management protocols. DDA is focused on using models to better understand a number of other important use cases for genomic data, namely identifying levels of transmission across different subregions of interest, distinguishing imported from local chains of transmission, and mechanistically linking the signals from different data collection sources such as routine surveillance and genomic data.

Analysis and characterization of nonlinear dynamical systems from time-series data

The development of new mathematical techniques to analyze time-series data collected from noisy, nonlinear systems, such as infectious disease surveillance systems, may enable better model formulation and near-term forecasts. These models are fundamentally data-driven: the algorithm learns the underlying nonlinear system from the data rather than relying on a model defined heuristically up front. DDA is also interested in the development of modal decomposition methods for high-dimensional time-series data. These theoretical developments may help drive future data analysis efforts on a wide variety of infectious disease datasets.
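
As a rough illustration of what a modal decomposition of time-series data can look like, the sketch below implements dynamic mode decomposition (DMD), one widely used method of this kind. It is not necessarily the method DDA uses; the function name `dmd`, the `rank` parameter, and the synthetic two-state system `A_true` are assumptions made for this example only.

```python
import numpy as np


def dmd(X, rank):
    """Exact DMD of a snapshot matrix X (rows: state variables, columns: time steps).

    Hypothetical helper for illustration. Returns the DMD eigenvalues and modes
    of the best-fit linear operator that advances one snapshot to the next.
    """
    X1, X2 = X[:, :-1], X[:, 1:]  # snapshots and their one-step-ahead successors
    U, s, Vh = np.linalg.svd(X1, full_matrices=False)
    U, s, Vh = U[:, :rank], s[:rank], Vh[:rank]  # truncate to the chosen rank
    # Project the one-step operator onto the leading singular vectors.
    A_tilde = U.conj().T @ X2 @ Vh.conj().T / s
    eigvals, W = np.linalg.eig(A_tilde)
    # Lift the reduced eigenvectors back to the full state space.
    modes = (X2 @ Vh.conj().T / s) @ W
    return eigvals, modes


# Synthetic example (assumed data): a damped two-state oscillation.
A_true = np.array([[0.9, -0.2],
                   [0.2, 0.9]])
x = np.array([1.0, 0.0])
snapshots = [x]
for _ in range(49):
    x = A_true @ x
    snapshots.append(x)
X = np.column_stack(snapshots)  # shape (2, 50)

eigvals, modes = dmd(X, rank=2)
print(eigvals)  # should approximate the eigenvalues of A_true
```

On noise-free data from a linear system, the recovered eigenvalues match those of the generating matrix; the eigenvalue magnitudes describe growth or decay and their angles describe oscillation frequency. Real surveillance data is noisy and nonlinear, so this is only a starting point for the kinds of methods described above.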

Machine learning and geospatial data

Modern machine-learning methods such as convolutional neural networks (CNNs) have been highly successful in a wide variety of academic and industrial applications. In global health, these methods have the potential to identify important features from remote sensing data (such as satellite imagery) that could help predict social and health indicators in areas without direct indicator measurements. DDA is building on the existing scientific literature and exploring the utility of CNNs and daytime imagery to help fill in the gaps in maps that are used at the programmatic level across a variety of disease- and health-related applications.

Small area estimation and family planning

Measurement data in low- and middle-income countries often comes in the form of complex household surveys. These surveys are typically designed at the spatial scale of a state or a region within a country, whereas programmatic interventions and decisions are often much more local than the available data. DDA is focused on bridging these two spatial scales by developing small area estimation (SAE) models that integrate complex survey data, leverage nearby survey data, and aggregate multiple survey designs. These SAE models can produce fine-spatial-scale estimates of indicators, such as modern contraceptive prevalence rates or unmet need for family planning, while also providing uncertainty intervals for those estimates.
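
To give a sense of the basic idea behind small area estimation, the sketch below shrinks noisy area-level survey estimates toward a pooled mean, in the spirit of an area-level (Fay-Herriot-style) model with no covariates. It is a simplified illustration, not DDA's model: it ignores spatial structure and multiple survey designs, treats the between-area variance as known, and the function name `shrinkage_estimates` and all input values are assumptions made for this example.

```python
import numpy as np


def shrinkage_estimates(direct, sampling_var, between_var):
    """Shrink direct survey estimates toward a precision-weighted pooled mean.

    Hypothetical helper for illustration.
    direct       : direct (design-based) estimate for each area
    sampling_var : sampling variance of each direct estimate
    between_var  : assumed variance of true values across areas (taken as known)
    Returns the shrunken estimates and approximate 95% interval bounds.
    """
    direct = np.asarray(direct, dtype=float)
    sampling_var = np.asarray(sampling_var, dtype=float)
    total_var = between_var + sampling_var
    # Pooled mean, weighting each area by its total precision.
    pooled = np.sum(direct / total_var) / np.sum(1.0 / total_var)
    # Areas with noisier direct estimates get less weight on their own data.
    weight = between_var / total_var
    estimate = weight * direct + (1.0 - weight) * pooled
    # Approximate standard deviation, ignoring uncertainty in the pooled mean.
    sd = np.sqrt(weight * sampling_var)
    return estimate, estimate - 1.96 * sd, estimate + 1.96 * sd


# Assumed example values: two areas with equally precise direct estimates.
estimate, lower, upper = shrinkage_estimates(
    direct=[0.2, 0.4], sampling_var=[0.01, 0.01], between_var=0.01
)
print(estimate)
```

Areas with small samples (large sampling variance) are pulled more strongly toward the pooled mean, which is how information is borrowed across areas. The full models described above extend this idea by accounting for survey design and drawing on nearby areas and multiple surveys.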