Generative Bayesian modeling to nowcast the effective reproduction number from line list data with missing symptom onset dates

PLoS Comput Biol. 2024 Apr 16;20(4):e1012021. doi: 10.1371/journal.pcbi.1012021. eCollection 2024 Apr.

Abstract

The time-varying effective reproduction number Rt is a widely used indicator of transmission dynamics during infectious disease outbreaks. Timely estimates of Rt can be obtained from reported cases counted by their date of symptom onset, which is generally closer to the time of infection than the date of report. Case counts by date of symptom onset are typically obtained from line list data; however, these data can have missing information and are subject to right truncation. Previous methods have addressed these problems independently by first imputing missing onset dates, then adjusting truncated case counts, and finally estimating the effective reproduction number. This stepwise approach makes it difficult to propagate uncertainty and can introduce subtle biases during real-time estimation due to the continued impact of assumptions made in previous steps. In this work, we integrate imputation, truncation adjustment, and Rt estimation into a single generative Bayesian model, allowing direct joint inference of case counts and Rt from line list data with missing symptom onset dates. We then use this framework to compare the performance of nowcasting approaches with different stepwise and generative components on synthetic line list data for multiple outbreak scenarios and across different epidemic phases. We find that under reporting delays realistic for hospitalization data (50% of reports delayed by more than a week), intermediate smoothing, as is common practice in stepwise approaches, can bias nowcasts of case counts and Rt; a joint generative approach avoids this bias through shared regularization of all model components. On incomplete line list data, a fully generative approach enables the quantification of uncertainty due to missing onset dates without the need for an initial multiple imputation step.
In a real-world comparison using hospitalization line list data from the COVID-19 pandemic in Switzerland, we observe the same qualitative differences between approaches. The generative modeling components developed in this work have been integrated and further extended in the R package epinowcast, providing a flexible and interpretable tool for real-time surveillance.
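
To make the stepwise baseline described in the abstract concrete, the following is a minimal sketch of two of its steps: scaling up right-truncated case counts by the probability that a case has been reported so far, and then computing a crude renewal-equation point estimate of Rt. It is not the generative model from the paper and does not use the epinowcast API. All function names, the example counts, the reporting delay distribution, and the generation interval are hypothetical values chosen for illustration. The sketch also assumes that the delay distribution is known and that case counts by onset date can stand in for infections.

```python
import math


def delay_cdf(delay_pmf):
    """Cumulative probability that a case is reported within d days of onset."""
    cdf, total = [], 0.0
    for p in delay_pmf:
        total += p
        cdf.append(total)
    return cdf


def adjust_right_truncation(observed, delay_pmf):
    """Divide each count by the probability of having been reported by now.

    observed[t] is the number of cases with onset on day t that are known
    on the final day of the series (the nowcast date).
    """
    cdf = delay_cdf(delay_pmf)
    last = len(observed) - 1
    adjusted = []
    for t, count in enumerate(observed):
        elapsed = last - t  # days between onset day t and the nowcast date
        p_reported = cdf[elapsed] if elapsed < len(cdf) else 1.0
        adjusted.append(count / p_reported)
    return adjusted


def renewal_rt(cases, gen_pmf):
    """Point estimate of Rt: cases on day t over weighted past cases.

    gen_pmf[s - 1] is the assumed probability of a generation interval of
    s days. Days without any past cases yield NaN.
    """
    rt = []
    for t in range(len(cases)):
        max_lag = min(t, len(gen_pmf))
        pressure = sum(cases[t - s] * gen_pmf[s - 1] for s in range(1, max_lag + 1))
        rt.append(cases[t] / pressure if pressure > 0 else math.nan)
    return rt


# Hypothetical inputs: counts by onset date as known on the last day,
# a reporting delay distribution over 0..4 days, and a 3-day generation interval.
observed = [10, 12, 15, 18, 14, 9, 4]
delay_pmf = [0.1, 0.2, 0.3, 0.2, 0.2]
gen_pmf = [0.2, 0.5, 0.3]

nowcast = adjust_right_truncation(observed, delay_pmf)
rt_estimates = renewal_rt(nowcast, gen_pmf)
```

Each step here passes only a point estimate to the next, so uncertainty in the truncation adjustment never reaches the Rt estimate. This is the limitation of stepwise pipelines that the abstract highlights and that a joint generative model is meant to address.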

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Basic Reproduction Number* / statistics & numerical data
  • Bayes Theorem*
  • COVID-19* / epidemiology
  • COVID-19* / transmission
  • Computational Biology / methods
  • Computer Simulation
  • Disease Outbreaks / statistics & numerical data
  • Humans
  • SARS-CoV-2