A practical algorithm for estimation of the maximum likelihood ancestral reconstruction error

Pac Symp Biocomput. 2010:31-42. doi: 10.1142/9789814295291_0005.

Abstract

The ancestral sequence reconstruction problem is to predict the DNA or protein sequence of an ancestral species, given the sequences of extant species. Such reconstructions are fundamental to comparative genomics, as they provide information about extant genomes and the process of evolution that gave rise to them. Arguably the best method for ancestral reconstruction is maximum likelihood estimation. Many effective algorithms for accurately computing the most likely ancestral sequence have been proposed. We consider the less-studied problem of computing the expected reconstruction error of a maximum likelihood reconstruction, given the phylogenetic tree and model of evolution, but not the extant sequences. This situation can arise, for example, when deciding which genomes to sequence for a reconstruction project given a gene-tree phylogeny (the taxon selection problem). In most applications, the reconstruction error is necessarily very small, making Monte Carlo simulations very inefficient for accurate estimation. We present the first practical algorithm for this problem and demonstrate how it can be used to quickly and accurately estimate the reconstruction accuracy. We then use our method as a kernel in a heuristic algorithm for the taxon selection problem. The implementation is available at http://www.mcb.mcgill.ca/~blanchem/mlerror.
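
To make the quantity in the abstract concrete, the following is a minimal sketch, not the algorithm presented in the paper. It assumes a toy setting that the abstract does not specify: a star tree whose leaves attach directly to the root, a two-state symmetric substitution model, and a uniform prior on the root state. All function names and the parameter `flip_probs` (the per-branch probability that a leaf differs from the root) are hypothetical. The exact expected error is obtained by brute-force enumeration of every leaf pattern, which is feasible only for very small trees, and is compared with a Monte Carlo estimate.

```python
import itertools
import random


def leaf_likelihood(pattern, root_state, flip_probs):
    """P(leaf pattern | root state) on a star tree with independent branches."""
    like = 1.0
    for leaf_state, p in zip(pattern, flip_probs):
        like *= p if leaf_state != root_state else 1.0 - p
    return like


def ml_root(pattern, flip_probs):
    """Maximum likelihood root state; ties are broken toward state 0."""
    return max((0, 1), key=lambda a: leaf_likelihood(pattern, a, flip_probs))


def exact_error(flip_probs):
    """Expected ML reconstruction error, summed over all 2^n leaf patterns.

    Assumes a uniform prior (0.5) on the two root states.
    """
    correct = 0.0
    for pattern in itertools.product((0, 1), repeat=len(flip_probs)):
        a_hat = ml_root(pattern, flip_probs)
        # Joint probability that the root is a_hat and the leaves show this pattern.
        correct += 0.5 * leaf_likelihood(pattern, a_hat, flip_probs)
    return 1.0 - correct


def monte_carlo_error(flip_probs, n_samples, seed=0):
    """Estimate the same error by simulating root and leaf states."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(n_samples):
        root = rng.randrange(2)
        # Each leaf flips away from the root state with its branch probability.
        pattern = tuple(root ^ (rng.random() < p) for p in flip_probs)
        errors += ml_root(pattern, flip_probs) != root
    return errors / n_samples
```

For example, `exact_error([0.1, 0.1, 0.1])` evaluates to 0.028 up to floating-point rounding, which is the probability that at least two of the three branches change state. The standard error of a Monte Carlo estimate of an error rate e from N samples is sqrt(e(1 - e)/N), so the number of samples needed for a fixed relative precision grows roughly as 1/e. This illustrates why simulation becomes inefficient when the reconstruction error is very small.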

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Animals
  • Computational Biology
  • Computer Simulation
  • Evolution, Molecular*
  • Humans
  • Likelihood Functions*
  • Mammals / classification
  • Mammals / genetics
  • Models, Genetic
  • Monte Carlo Method
  • Mutation
  • Phylogeny