Composition-based methods to identify horizontal gene transfer

Methods Mol Biol. 2009:532:215-25. doi: 10.1007/978-1-60327-853-9_12.

Abstract

The detection of horizontal gene transfer (HGT) events has become increasingly important in recent years. Here we discuss a simple theoretical analysis based on the in silico addition of known foreign genes from different prokaryotic groups to the genome of Escherichia coli K12 MG1655. Using this dataset as a control, we tested the efficiency of four methodologies commonly employed to detect HGT, which are based on (a) the codon adaptation index, codon usage, and GC percentage (CAI/GC); (b) the distributional profile (DP) approach, with a gene search in phylogenetically closely related genomes; (c) the Bayesian model (BM); and (d) the first-order Markov model (MM). All methods exhibit limitations, as shown here, with BM and MM giving better approximations. The MM has a better detection rate when genes from closely related organisms are evaluated. Applying the MM to detect recently transferred genes in the genome of E. coli K12 MG1655 shows that this organism has undergone a rather significant amount of HGT; several of the transferred genes have well-defined functions that appear to be involved in the direct interaction of the organism with its environment.
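To make the composition-based idea concrete, the following is a minimal Python sketch of two of the signals named above: GC percentage and a first-order Markov model of nucleotide transitions. It is not the authors' implementation. The function names (gc_fraction, train_markov1, mean_log_likelihood), the pseudocount, and the scoring rule (average log-probability of a gene under a model trained on host sequences) are assumptions chosen for illustration; the abstract does not specify how the MM score or its threshold is computed.

```python
import math
from collections import Counter

BASES = "ACGT"

def gc_fraction(seq):
    """Fraction of G and C among the unambiguous bases of a sequence."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in BASES)
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0

def train_markov1(seqs, pseudocount=1.0):
    """Estimate first-order transition probabilities P(next | prev).

    Returns a dict keyed by dinucleotide, e.g. model["AG"] = P(G | A).
    The pseudocount (an assumption) avoids zero probabilities.
    """
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for a, b in zip(s, s[1:]):
            if a in BASES and b in BASES:
                counts[a + b] += 1
    model = {}
    for a in BASES:
        total = sum(counts[a + b] for b in BASES) + 4 * pseudocount
        for b in BASES:
            model[a + b] = (counts[a + b] + pseudocount) / total
    return model

def mean_log_likelihood(seq, model):
    """Average log-probability per transition of seq under the model.

    Lower values mean the sequence looks less like the training
    (host) sequences; any cutoff for calling HGT is left open here.
    """
    seq = seq.upper()
    pairs = [a + b for a, b in zip(seq, seq[1:]) if a in BASES and b in BASES]
    if not pairs:
        return 0.0
    return sum(math.log(model[p]) for p in pairs) / len(pairs)

if __name__ == "__main__":
    # Toy data only: a GC-rich "host" and an AT-rich "foreign" gene.
    host_genes = ["GCGCCGGCGCGCCGCG" * 5, "CCGGCGCGGCGCCGGC" * 5]
    model = train_markov1(host_genes)
    for name, gene in [("native-like", "GCGCGGCCGCGC" * 4),
                       ("foreign-like", "ATATTAAATTTA" * 4)]:
        print(name, round(gc_fraction(gene), 2),
              round(mean_log_likelihood(gene, model), 3))
```

In this toy setting the AT-rich gene scores lower under the host-trained model than the GC-rich one. A real analysis would train on the full set of annotated host genes and would need a justified way to turn scores into HGT calls, which this sketch does not attempt.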

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Base Composition
  • Bayes Theorem
  • Codon
  • Databases, Nucleic Acid
  • Escherichia coli K12 / genetics
  • Gene Transfer, Horizontal*
  • Genetics, Microbial
  • Genomics / methods*
  • Genomics / statistics & numerical data
  • Markov Chains
  • Models, Genetic
  • Pseudogenes

Substances

  • Codon