A Poisson model of sequence comparison and its application to coronavirus phylogeny

Math Biosci. 2009 Feb;217(2):159-66. doi: 10.1016/j.mbs.2008.11.006. Epub 2008 Dec 6.

Abstract

In this paper, we propose two metrics to compare DNA and protein sequences based on a Poisson model of word occurrences. Instead of comparing the frequencies of all fixed-length words in two sequences, we consider (1) the probability of 'generating' one sequence under the Poisson model estimated from the other and (2) the difference in the expression levels of words between the two sequences. Phylogenetic trees of 25 viruses including SARS-CoVs are constructed to illustrate our approach.
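The abstract does not give the paper's exact definitions, so the following is only a minimal sketch of the first idea: scoring how likely the word counts of one sequence are under independent Poisson rates estimated from another. The function names, the pseudocount, the length rescaling, and the symmetrization are all assumptions made for illustration, not the authors' method.

```python
from collections import Counter
from itertools import product
from math import lgamma, log


def word_counts(seq, k):
    """Count overlapping words of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def poisson_log_likelihood(source, target, k=2, alphabet="ACGT", pseudo=0.5):
    """Log-probability of target's word counts under Poisson rates from source.

    Assumptions (not from the paper): each word count is treated as an
    independent Poisson variable, rates are smoothed with a pseudocount,
    and rates are rescaled to the number of word positions in target.
    Words containing characters outside `alphabet` are ignored.
    """
    n_src = len(source) - k + 1   # word positions in source
    n_tgt = len(target) - k + 1   # word positions in target
    src = word_counts(source, k)
    tgt = word_counts(target, k)
    n_words = len(alphabet) ** k
    total = 0.0
    for letters in product(alphabet, repeat=k):
        w = "".join(letters)
        # Smoothed relative frequency of w in source.
        freq = (src[w] + pseudo) / (n_src + pseudo * n_words)
        lam = freq * n_tgt        # expected count of w in target
        x = tgt[w]                # observed count of w in target
        # Poisson log pmf: x*log(lam) - lam - log(x!)
        total += x * log(lam) - lam - lgamma(x + 1)
    return total


def poisson_dissimilarity(a, b, k=2):
    """Symmetrized negative log-likelihood (hypothetical choice of metric)."""
    return -0.5 * (poisson_log_likelihood(a, b, k)
                   + poisson_log_likelihood(b, a, k))
```

A matrix of such pairwise values over a set of genomes could then be passed to a standard distance-based tree-building method. Note that this score is not zero for identical sequences, so it is a dissimilarity score rather than a true distance.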

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Base Sequence
  • Coronaviridae / genetics*
  • DNA, Mitochondrial / genetics
  • DNA, Viral / genetics
  • Humans
  • Models, Genetic*
  • Phylogeny
  • Poisson Distribution*

Substances

  • DNA, Mitochondrial
  • DNA, Viral