The Impact of Purifying and Background Selection on the Inference of Population History: Problems and Prospects

Mol Biol Evol. 2021 Jun 25;38(7):2986-3003. doi: 10.1093/molbev/msab050.

Abstract

Current procedures for inferring population history generally assume complete neutrality; that is, they neglect both direct selection and the effects of selection on linked sites. Here we examine how direct purifying selection and background selection may bias demographic inference by evaluating two commonly used methods (MSMC and fastsimcoal2), specifically studying how the underlying shape of the distribution of fitness effects and the fraction of directly selected sites interact with demographic parameter estimation. The results show that, even after masking functional genomic regions, background selection may cause the mis-inference of population growth under models of both constant population size and decline. This effect is amplified as the strength of purifying selection and the density of directly selected sites increase, as indicated by the distortion of the site frequency spectrum and levels of nucleotide diversity at linked neutral sites. We also show how simulated changes in background selection effects caused by population size changes can be predicted analytically. Finally, we propose a potential method for correcting for the mis-inference of population growth caused by selection. By treating the distribution of fitness effects as a nuisance parameter and averaging across all potential realizations, we demonstrate that even directly selected sites can be used to infer demographic histories with reasonable accuracy.
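
The distortion of the site frequency spectrum (SFS) mentioned above can be illustrated with two standard summaries: nucleotide diversity (pi) and Watterson's estimator of theta. The following is a minimal sketch, not the authors' pipeline. The function name `sfs_summaries` and both example spectra are assumptions made up for illustration and are not data from the study. Under the standard neutral model at constant population size the two estimators are expected to agree; an excess of rare variants, which both background selection and population growth can produce, pulls pi below Watterson's theta.

```python
def sfs_summaries(sfs):
    """Return (pi, theta_w) from an unfolded site frequency spectrum.

    sfs[i - 1] is the number of segregating sites at which the derived
    allele is carried by i of the n sampled chromosomes, for i = 1..n-1.
    Both values are totals over the region, not per-site rates.
    """
    n = len(sfs) + 1                      # number of sampled chromosomes
    segregating = sum(sfs)                # total segregating sites, S

    # Watterson's estimator: S divided by the harmonic number a1.
    a1 = sum(1.0 / i for i in range(1, n))
    theta_w = segregating / a1

    # Nucleotide diversity: average number of pairwise differences.
    # A site with derived count i differs in i * (n - i) of the pairs.
    n_pairs = n * (n - 1) / 2.0
    pi = sum(i * (n - i) * count
             for i, count in enumerate(sfs, start=1)) / n_pairs
    return pi, theta_w


# Hypothetical spectra for n = 5 chromosomes, each with 25 segregating sites.
neutral_like = [12, 6, 4, 3]   # counts proportional to 1/i (neutral expectation)
rare_excess = [20, 3, 1, 1]    # skewed toward singletons

for label, sfs in (("neutral-like", neutral_like), ("rare excess", rare_excess)):
    pi, theta_w = sfs_summaries(sfs)
    print(f"{label}: pi = {pi:.2f}, theta_w = {theta_w:.2f}, "
          f"pi - theta_w = {pi - theta_w:.2f}")
```

The difference pi - theta_w is the numerator of Tajima's D, so a negative value in the second spectrum is the kind of signal that a neutral demographic method could read as population growth.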

Keywords: fastsimcoal2; MSMC; approximate Bayesian computation (ABC); background selection; demographic inference; distribution of fitness effects.

Publication types

  • Evaluation Study
  • Research Support, N.I.H., Extramural
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Bayes Theorem
  • Demography / methods*
  • Genetic Fitness*
  • Genetic Techniques*
  • Genome Size
  • Markov Chains
  • Models, Genetic*
  • Polymorphism, Single Nucleotide
  • Selection, Genetic*