Recognizing the pseudogenes in bacterial genomes

Nucleic Acids Res. 2005 Jun 2;33(10):3125-32. doi: 10.1093/nar/gki631. Print 2005.

Abstract

Pseudogenes are now known to be a regular feature of bacterial genomes and are found in particularly high numbers within the genomes of recently emerged bacterial pathogens. As most pseudogenes are recognized by sequence alignments, we use newly available genomic sequences to identify the pseudogenes in 11 genomes from 4 bacterial genera, each of which contains at least 1 human pathogen. The numbers of pseudogenes range from 27 in Staphylococcus aureus MW2 to 337 in Yersinia pestis CO92 (i.e. 1-8% of the annotated genes in the genome). Most pseudogenes are formed by small frameshifting indels, but because stop codons are A + T-rich, the two low-G + C Gram-positive taxa (Streptococcus and Staphylococcus) have relatively high fractions of pseudogenes generated by nonsense mutations when compared with more G + C-rich genomes. Over half of the pseudogenes are produced from genes whose original functions were annotated as 'hypothetical' or 'unknown'; however, several broadly distributed genes involved in nucleotide processing, repair or replication have become pseudogenes in one of the sequenced Vibrio vulnificus genomes. Although many of our comparisons involved closely related strains with broadly overlapping gene inventories, each genome contains a largely unique set of pseudogenes, suggesting that pseudogenes are formed and eliminated relatively rapidly from most bacterial genomes.
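
The claim that stop codons are A + T-rich can be checked with a short calculation. The sketch below is an illustration only and not the authors' method: it assumes the three stop codons of the standard bacterial genetic code (TAA, TAG, TGA), and the helper names (`at_fraction`, `one_step_from_stop`, `mean_at`) are hypothetical. It computes the A + T content of the stop codons and compares the A + T content of sense codons that lie one substitution away from a stop codon with that of all sense codons.

```python
from itertools import product

BASES = "ACGT"
# Assumption: stop codons of the standard bacterial genetic code.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def at_fraction(seq):
    """Fraction of positions in seq that are A or T."""
    return sum(base in "AT" for base in seq) / len(seq)


def one_step_from_stop(codon):
    """True if a single-base substitution turns this codon into a stop codon."""
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            if mutant in STOP_CODONS:
                return True
    return False


def mean_at(codons):
    """Mean A + T fraction over a collection of codons."""
    codons = list(codons)
    return sum(at_fraction(c) for c in codons) / len(codons)


ALL_CODONS = ["".join(c) for c in product(BASES, repeat=3)]
SENSE_CODONS = [c for c in ALL_CODONS if c not in STOP_CODONS]
# Sense codons that a single nonsense mutation can convert into a stop codon.
NEAR_STOP = [c for c in SENSE_CODONS if one_step_from_stop(c)]

if __name__ == "__main__":
    # 7 of the 9 stop-codon positions are A or T.
    print(f"A+T fraction of stop codons: {mean_at(STOP_CODONS):.3f}")
    print(f"A+T fraction of sense codons one step from a stop: {mean_at(NEAR_STOP):.3f}")
    print(f"A+T fraction of all sense codons: {mean_at(SENSE_CODONS):.3f}")
```

This treats all codons as equally frequent, which real genomes are not. It only shows why an A + T-rich genome would be expected to carry more codons within one mutation of a stop, and it does not reproduce the pseudogene counts reported here.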

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Genome, Bacterial*
  • Pseudogenes*
  • Staphylococcus aureus / genetics
  • Streptococcus pyogenes / genetics
  • Vibrio / genetics
  • Yersinia / genetics