Human judgement forecasting of COVID-19 in the UK

Nikos I Bosse; Sam Abbott; Johannes Bracher; Edwin van Leeuwen; Anne Cori; Sebastian Funk

doi:10.12688/wellcomeopenres.19380.2

Wellcome Open Res. 2024 Mar 21:8:416. doi: 10.12688/wellcomeopenres.19380.2. eCollection 2023.

Authors

Nikos I Bosse¹ ², Sam Abbott¹, Johannes Bracher³ ⁴, Edwin van Leeuwen² ⁵, Anne Cori⁶, Sebastian Funk¹ ²

Affiliations

¹ Department of Infectious Disease Epidemiology, London School of Hygiene & Tropical Medicine, London, WC1E 7HT, UK.
² NIHR Health Protection Research Unit in Modelling & Health Economics, London, UK.
³ Computational Statistics Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany.
⁴ Chair of Statistical Methods and Econometrics, Karlsruhe Institute of Technology, Karlsruhe, Germany.
⁵ Modelling Economics Unit, UK Health Security Agency, London, UK.
⁶ MRC Centre for Outbreak Analysis and Modelling, Department of Infectious Disease Epidemiology, School of Public Health, Imperial College London, London, England, UK.

Abstract

Background: In the past, two studies found ensembles of human judgement forecasts of COVID-19 to show predictive performance comparable to ensembles of computational models, at least when predicting case incidences. We present a follow-up to a study conducted in Germany and Poland and investigate a novel joint approach to combine human judgement and epidemiological modelling.

Methods: From May 24th to August 16th 2021, we elicited weekly one- to four-week-ahead forecasts of cases and deaths from COVID-19 in the UK from a crowd of human forecasters. A median ensemble of all forecasts was submitted to the European Forecast Hub. Participants could use two distinct interfaces: in one, forecasters submitted a predictive distribution directly; in the other, forecasters instead submitted a forecast of the effective reproduction number R_t, which was then used to forecast cases and deaths using simulation methods from the EpiNow2 R package. Forecasts were scored using the weighted interval score on the original forecasts, as well as after applying the natural logarithm to both forecasts and observations.
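
As an illustration of the scoring and ensembling steps named above, the following is a minimal sketch rather than the authors' code. It assumes forecasts are given as a median plus central prediction intervals, uses the standard definition of the weighted interval score, and takes the ensemble as the per-quantile median across forecasters. All function names are hypothetical, and the offset of 1 in the log transform is an assumption made here to avoid taking the logarithm of zero.

```python
import math
from statistics import median


def interval_score(lower, upper, y, alpha):
    """Interval score of a central (1 - alpha) prediction interval."""
    width = upper - lower
    under = (2 / alpha) * max(lower - y, 0)  # penalty if y falls below the interval
    over = (2 / alpha) * max(y - upper, 0)   # penalty if y falls above the interval
    return width + under + over


def weighted_interval_score(med, intervals, y):
    """WIS for a predictive median and a dict {alpha: (lower, upper)}."""
    total = 0.5 * abs(y - med)
    for alpha, (lower, upper) in intervals.items():
        total += (alpha / 2) * interval_score(lower, upper, y, alpha)
    return total / (len(intervals) + 0.5)


def wis_log_scale(med, intervals, y):
    """WIS after log-transforming forecast and observation.

    Assumption: log(x + 1) is used so that zero counts are handled.
    """
    log_intervals = {
        a: (math.log1p(lo), math.log1p(hi)) for a, (lo, hi) in intervals.items()
    }
    return weighted_interval_score(math.log1p(med), log_intervals, math.log1p(y))


def median_ensemble(forecasts):
    """Per-quantile median over forecasters; each forecast is {level: value}."""
    levels = forecasts[0].keys()
    return {q: median(f[q] for f in forecasts) for q in levels}


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    intervals = {0.5: (90, 110), 0.1: (70, 130)}
    print(weighted_interval_score(100, intervals, 120))
    print(wis_log_scale(100, intervals, 120))
    crowd = [
        {0.25: 90, 0.5: 100, 0.75: 110},
        {0.25: 80, 0.5: 120, 0.75: 150},
        {0.25: 95, 0.5: 105, 0.75: 130},
    ]
    print(median_ensemble(crowd))
```

With the hypothetical inputs shown, the score on the original scale is approximately 11.2. Scoring on the log scale measures relative rather than absolute error, which is one reason the two evaluations can rank forecasters differently.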

Results: The ensemble of human forecasters overall performed comparably to the official European Forecast Hub ensemble on both cases and deaths, although results were sensitive to changes in details of the evaluation. R_t forecasts performed comparably to direct forecasts on cases, but worse on deaths. Self-identified "experts" tended to be better calibrated than "non-experts" for cases, but not for deaths.

Conclusions: Human judgement forecasts and computational models can produce forecasts of similar quality for infectious diseases such as COVID-19. The results of forecast evaluations can change depending on which metrics are chosen, and the judgement of what does or doesn't constitute a "good" forecast depends on the forecast consumer. Combinations of human and computational forecasts hold potential but present real-world challenges that need to be solved.

Keywords: COVID-19; UK; United Kingdom; Weighted Interval Score; forecasting; human judgement forecasting.

Grants and funding

WT_/Wellcome Trust/United Kingdom