An efficient algorithm for generating the internal branches of a Kingman coalescent

Theor Popul Biol. 2018 Jul:122:57-66. doi: 10.1016/j.tpb.2017.05.002. Epub 2017 Jul 11.

Abstract

Coalescent simulations are a widely used approach for simulating sample genealogies, but can become computationally burdensome in large samples. Methods exist to analytically calculate a sample's expected frequency spectrum without simulating full genealogies. However, statistics that rely on the distribution of the length of internal coalescent branches, such as the probability that two mutations of equal size arose on the same genealogical branch, have previously required full coalescent simulations to estimate. Here, we present a sampling method capable of efficiently generating limited portions of sample genealogies using a series of analytic equations that give probabilities for the number, start, and end of internal branches conditional on the number of final samples they subtend. These equations are independent of the coalescent waiting times and need only be calculated a single time, lending themselves to efficient computation. We compare our method with full coalescent simulations to show the resulting distribution of branch lengths and summary statistics are equivalent, but that for many conditions our method is at least 10 times faster.

Keywords: Coalescent; Coalescent simulations; Genealogical topology.

Publication types

  • Comparative Study
  • Research Support, N.I.H., Extramural

MeSH terms

  • Algorithms*
  • Computer Simulation
  • Genealogy and Heraldry
  • Genetics, Population*
  • Humans
  • Models, Genetic*
  • Mutation
  • Pedigree
  • Probability