Core Concepts
Statistical features distinguish outbreak from non-outbreak sequences long before an outbreak occurs, and they can be used to accurately predict both impending disease outbreaks and non-outbreaks.
Abstract
The authors developed a general model, built on feature-based time series classification, that accurately forecasts both disease outbreaks and non-outbreaks without using real-world training data.
The authors simulated disease transmission dynamics using a Susceptible-Infected-Recovered (SIR) model with three types of noise (white noise, multiplicative environmental noise, and demographic noise). They generated 14,400 replicates of time series data, with half exhibiting a transcritical bifurcation (outbreak) and the other half a null bifurcation (non-outbreak).
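The simulation step can be illustrated with a minimal sketch. It is not the authors' code: it assumes an SIR model with births and deaths (so that a transcritical bifurcation exists at R0 = 1), covers only additive white noise out of the three noise types, and uses arbitrary illustrative parameter values. The function name simulate_sir and all of its parameters are hypothetical.

```python
import random

def simulate_sir(r0_start, r0_end, steps=2000, dt=0.05,
                 gamma=1.0, mu=0.05, sigma=0.001, seed=0):
    """Euler-Maruyama sketch of an SIR model with additive white noise.

    R0 is ramped linearly from r0_start to r0_end over the run.
    Returns the infected fraction at every step.
    All parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    s, i = 0.999, 0.001  # initial susceptible and infected fractions
    infected = []
    for k in range(steps):
        r0 = r0_start + (r0_end - r0_start) * k / (steps - 1)
        beta = r0 * (gamma + mu)          # transmission rate implied by R0
        new_inf = beta * s * i
        ds = (mu - new_inf - mu * s) * dt  # births, infection, deaths
        di = (new_inf - (gamma + mu) * i) * dt
        noise = sigma * rng.gauss(0.0, 1.0) * dt ** 0.5
        # Clamp to [0, 1] so the fractions stay physically meaningful.
        s = min(max(s + ds, 0.0), 1.0)
        i = min(max(i + di + noise, 0.0), 1.0)
        infected.append(i)
    return infected

# R0 crossing 1 gives an outbreak; a constant R0 below 1 gives the null case.
outbreak = simulate_sir(0.2, 2.0)
null = simulate_sir(0.2, 0.2)
```

In this sketch, a labelled data set would be built by repeating both calls with different seeds and parameter draws, then truncating each series before the transition.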
The authors extracted 22 statistical features and 5 early warning signal indicators from the simulated time series data and used them to train four machine learning models (gradient boosting machine, logistic regression, k-nearest neighbor, and support vector machine) to classify outbreak and non-outbreak sequences.
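As a rough illustration of feature extraction, the sketch below computes two commonly used early warning signal indicators, variance and lag-1 autocorrelation, over a rolling window. The summary does not name the five indicators the authors used, so this choice is an assumption, and the function names are hypothetical.

```python
from statistics import fmean, pvariance

def lag1_autocorrelation(xs):
    """Lag-1 autocorrelation of a sequence; 0.0 if the sequence is constant."""
    m = fmean(xs)
    den = sum((x - m) ** 2 for x in xs)
    if den == 0:
        return 0.0
    num = sum((a - m) * (b - m) for a, b in zip(xs, xs[1:]))
    return num / den

def rolling_indicators(series, window):
    """Return (variance, lag-1 autocorrelation) for each full rolling window."""
    out = []
    for end in range(window, len(series) + 1):
        chunk = series[end - window:end]
        out.append((pvariance(chunk), lag1_autocorrelation(chunk)))
    return out
```

Each time series would then be reduced to a fixed-length feature vector (for example, the trend of each indicator over time) before being passed to a classifier.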
The classifiers achieved near-perfect performance on withheld synthetic testing sets, with area under the receiver operating characteristic curve (AUC) scores ranging from 0.99 to 1. Further experiments showed that the classifiers could handle time series of varying lengths, as well as time series that end far from the transition point (the time at which the outbreak begins).
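For readers unfamiliar with the metric, AUC equals the probability that a randomly chosen outbreak sequence receives a higher score than a randomly chosen non-outbreak sequence, with ties counted as one half. A minimal sketch of that definition follows; the function name roc_auc is hypothetical, and the label convention (1 for outbreak, 0 for non-outbreak) is an assumption.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison of positive and negative scores.

    labels: iterable of 0/1 (assumed: 1 = outbreak, 0 = non-outbreak).
    scores: classifier scores, higher meaning more likely an outbreak.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))
```

An AUC of 1 means every outbreak sequence is scored above every non-outbreak sequence; 0.5 corresponds to random guessing.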
The authors also tested the classifiers on real-world COVID-19 data from Singapore and SARS data from Hong Kong. Two classifiers, trained using 5 early warning signal indicators and the logistic regression model, achieved an accuracy of 1 on the out-of-sample empirical data.
The results suggest that statistical features distinguishing outbreak from non-outbreak sequences emerge long before outbreaks occur, and that these features can be used to accurately predict impending disease outbreaks and non-outbreaks.
Stats
The authors used the following key metrics and figures to support their analysis:
The basic reproduction number R0, a dimensionless value representing the expected number of secondary infections caused by a single infectious individual in a completely susceptible population.
The final size, representing the proportion of individuals who ultimately become infected over the disease transmission process.
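The two quantities above are linked in the classical SIR model without births and deaths: for an initially fully susceptible population, the final size z satisfies z = 1 - exp(-R0 * z). The sketch below solves this relation by fixed-point iteration. It assumes this textbook relation applies, which the summary does not confirm for the authors' model, and the function name final_size is hypothetical.

```python
import math

def final_size(r0, tol=1e-12, max_iter=10_000):
    """Solve z = 1 - exp(-r0 * z) for the positive root by fixed-point iteration.

    Returns 0.0 when r0 <= 1, where no positive root exists.
    """
    if r0 <= 1.0:
        return 0.0
    z = 1.0  # starting above the root, the iteration decreases towards it
    for _ in range(max_iter):
        z_next = 1.0 - math.exp(-r0 * z)
        if abs(z_next - z) < tol:
            return z_next
        z = z_next
    return z
```

Under this relation, R0 = 2 gives a final size of roughly 0.80, meaning about four in five individuals are eventually infected.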
Quotes
"Forecasting the occurrence and absence of novel disease outbreaks is essential for disease management."
"Early preventative intervention is associated with lower incidence."
"Anticipating infectious disease outbreaks and non-outbreaks, ideally in an early and accurate manner, therefore becomes imperative to prevent misjudgments of health risk perceptions faced by societies and their citizens, guiding the implementation of mitigation measures."