Psi-Phi: exploring the outer limits of bacterial pseudogenes

Genome Res. 2004 Nov;14(11):2273-8. doi: 10.1101/gr.2925604. Epub 2004 Oct 12.

Abstract

Because bacterial chromosomes are tightly packed with genes and were traditionally viewed as optimized for size and replication speed, it was not surprising that early annotations of sequenced bacterial genomes reported few, if any, pseudogenes. However, pseudogenes are generally recognized by comparison with their functional counterparts, so as more genome sequences accumulated, many bacterial pathogens were found to harbor large numbers of truncated, inactivated, and degraded genes. Since the mutational events that inactivate genes occur continuously in all genomes, we investigated whether the rarity of pseudogenes in some bacteria reflected properties inherent to the organism or a failure to recognize pseudogenes. We developed a program suite, Psi-Phi (Psi-gene Finder), that applies a comparative method to identify pseudogenes missed in earlier annotations, whether through misannotation or through nonrecognition, and used it to analyze the pseudogene inventories of the sequenced members of the Escherichia coli/Shigella clade. This approach recovered hundreds of previously unrecognized pseudogenes and showed that pseudogenes are a regular feature of bacterial genomes, even in those whose original annotations registered no truncated or otherwise inactivated genes. In Shigella flexneri 2a, large proportions of pseudogenes are generated by nonsense mutations and IS element insertions, events that seldom produce the pseudogenes present in the other genomes examined. Almost all (>95%) pseudogenes are restricted to only one of the genomes and are of relatively recent origin, suggesting that these bacteria possess active mechanisms to eliminate nonfunctional genes.
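The comparative principle described above, recognizing a pseudogene by contrasting it with a functional counterpart, can be illustrated with a toy example: a gene is flagged when its intact reading frame is much shorter than that of its counterpart in a related genome, as happens after a nonsense mutation. This is a minimal sketch assuming in-frame sequences that begin at the start codon and the standard stop codons; the function names and the 0.8 length threshold are hypothetical and are not taken from Psi-Phi, whose implementation the abstract does not describe.

```python
from typing import Optional

# Standard stop codons (TAA, TAG, TGA); assumption: bacterial genetic code.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_stop_codon(seq: str) -> Optional[int]:
    """Return the codon index of the first in-frame stop codon, or None."""
    seq = seq.upper()
    for i in range(0, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i // 3
    return None


def orf_codons(seq: str) -> int:
    """Number of sense codons read before the first in-frame stop."""
    stop = first_stop_codon(seq)
    return stop if stop is not None else len(seq) // 3


def is_candidate_pseudogene(query: str, reference: str,
                            min_fraction: float = 0.8) -> bool:
    """Hypothetical rule: flag `query` as truncated if its reading frame
    covers less than `min_fraction` of the functional reference's frame."""
    ref_len = orf_codons(reference)
    if ref_len == 0:
        return False
    return orf_codons(query) / ref_len < min_fraction


# A functional reference gene: start codon, nine alanine codons, stop.
reference = "ATG" + "GCT" * 9 + "TAA"
# The same gene with a nonsense mutation (GCT -> TAG) at codon 4.
mutant = "ATG" + "GCT" * 3 + "TAG" + "GCT" * 5 + "TAA"

print(is_candidate_pseudogene(mutant, reference))     # True
print(is_candidate_pseudogene(reference, reference))  # False
```

A real comparative pipeline would also need to handle frameshifts, insertions such as IS elements, and sequence alignment between genomes; this sketch covers only the premature-stop case.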

Publication types

  • Letter
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Codon, Nonsense / genetics
  • Computational Biology
  • DNA Transposable Elements / genetics
  • Escherichia coli / genetics*
  • Evolution, Molecular
  • Genome, Bacterial*
  • Pseudogenes / genetics*
  • Sequence Analysis, DNA / methods
  • Shigella flexneri / genetics*
  • Software

Substances

  • Codon, Nonsense
  • DNA Transposable Elements