On the surprising effectiveness of a simple matrix exponential derivative approximation, with application to global SARS-CoV-2

Proc Natl Acad Sci U S A. 2024 Jan 16;121(3):e2318989121. doi: 10.1073/pnas.2318989121. Epub 2024 Jan 12.

Abstract

The continuous-time Markov chain (CTMC) is the mathematical workhorse of evolutionary biology. Learning CTMC model parameters using modern, gradient-based methods requires the derivative of the matrix exponential evaluated at the CTMC's infinitesimal generator (rate) matrix. Motivated by the derivative's extreme computational complexity as a function of state space cardinality, recent work demonstrates the surprising effectiveness of a naive, first-order approximation for a host of problems in computational biology. In response to this empirical success, we obtain rigorous deterministic and probabilistic bounds for the error accrued by the naive approximation and establish a "blessing of dimensionality" result that is universal for a large class of rate matrices with random entries. Finally, we apply the first-order approximation within surrogate-trajectory Hamiltonian Monte Carlo for the analysis of the early spread of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) across 44 geographic regions that comprise a state space of unprecedented dimensionality for unstructured (flexible) CTMC models within evolutionary biology.
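The abstract does not write out the approximation, so the following is only a minimal sketch under stated assumptions. It assumes the naive first-order approximation to the derivative of exp(tQ) with respect to a single entry q_ij has the form t exp(tQ) E_ij, where E_ij is the matrix with a one in position (i, j) and zeros elsewhere. For comparison, the exact directional derivative is read off the upper-right block of the exponential of the block matrix [[tQ, tE_ij], [0, tQ]], a standard identity for the Fréchet derivative of the matrix exponential. All function names are hypothetical, the perturbation ignores the row-sum constraint of a rate matrix, and the small Taylor-based `expm` stands in for a library routine.

```python
import numpy as np


def expm(a, terms=20):
    """Matrix exponential via scaling-and-squaring with a truncated Taylor series."""
    a = np.asarray(a, dtype=float)
    norm = np.linalg.norm(a, 1)
    # Choose the number of squarings so the scaled matrix has 1-norm at most 1/2.
    squarings = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    scaled = a / (2.0 ** squarings)
    result = np.eye(a.shape[0])
    term = np.eye(a.shape[0])
    for k in range(1, terms + 1):
        term = term @ scaled / k
        result = result + term
    for _ in range(squarings):
        result = result @ result
    return result


def unit_matrix(n, i, j):
    """E_ij: one in position (i, j), zeros elsewhere."""
    e = np.zeros((n, n))
    e[i, j] = 1.0
    return e


def exact_derivative(q, i, j, t=1.0):
    """Exact derivative of exp(tQ) w.r.t. q_ij via the block-matrix identity."""
    n = q.shape[0]
    e = unit_matrix(n, i, j)
    block = np.block([[q, e], [np.zeros((n, n)), q]])
    return expm(t * block)[:n, n:]


def first_order_derivative(q, i, j, t=1.0):
    """Assumed naive approximation: t * exp(tQ) @ E_ij."""
    n = q.shape[0]
    return t * expm(t * q) @ unit_matrix(n, i, j)


def relative_error(q, i, j, t=1.0):
    """Frobenius-norm relative error of the assumed approximation."""
    exact = exact_derivative(q, i, j, t)
    approx = first_order_derivative(q, i, j, t)
    return np.linalg.norm(exact - approx) / np.linalg.norm(exact)


def random_rate_matrix(n, rng):
    """Illustrative generator: uniform(0, 1) off-diagonal rates, rows summing to zero."""
    off = rng.uniform(0.0, 1.0, size=(n, n))
    np.fill_diagonal(off, 0.0)
    return off - np.diag(off.sum(axis=1))


rng = np.random.default_rng(0)
q = random_rate_matrix(5, rng)
print(relative_error(q, 0, 1, t=0.1))
```

The exact route exponentiates a 2n x 2n matrix for every entry of Q, whereas the assumed approximation reuses a single n x n exponential for all entries, which is the cost gap that motivates the approximation. The two agree exactly whenever Q and E_ij commute.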

Keywords: Hamiltonian Monte Carlo; continuous-time Markov chains; matrix exponential; molecular epidemiology; random matrix theory.

MeSH terms

  • Algorithms
  • COVID-19* / epidemiology
  • Humans
  • Markov Chains
  • SARS-CoV-2*