Investigating protein-coding sequence evolution with probabilistic codon substitution models

Maria Anisimova; Carolin Kosiol

doi:10.1093/molbev/msn232

Mol Biol Evol. 2009 Feb;26(2):255-71. doi: 10.1093/molbev/msn232. Epub 2008 Oct 14.

Authors

Maria Anisimova¹, Carolin Kosiol

Affiliation

¹ Institute of Computational Science, Swiss Federal Institute of Technology, Zurich, Switzerland. maria.anisimova@inf.ethz.ch

PMID: 18922761

Abstract

This review is motivated by the recent explosion in the number of studies developing and improving probabilistic models of codon evolution. Traditionally parametric, the first codon models focused on estimating the effects of selective pressure on the protein via an explicit parameter in the maximum likelihood framework. Likelihood ratio tests of nested codon models armed biologists with powerful tools, which provided unambiguous evidence for positive selection in real data. This, in turn, triggered a new wave of methodological developments. The new generation of models views the codon evolution process in a more sophisticated way, relaxing several mathematical assumptions. These models make greater use of physicochemical amino acid properties, genetic code machinery, and the large amounts of data in the public domain. An overview of the most recent advances in modeling codon evolution is presented here, and a wide range of their applications to real data is discussed. On the downside, the availability of a large variety of models, each accounting for various biological factors, increases the margin for misinterpretation; the biological meaning of certain parameters may vary among models, and model selection procedures also deserve greater attention. A solid understanding of the modeling assumptions and their applicability is essential for successful statistical data analysis.

Publication types

Research Support, Non-U.S. Gov't
Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

Bayes Theorem
Codon*
Genetic Code*
Markov Chains
Models, Genetic*
Selection, Genetic

Substances

Codon