Statistical analyses of counts and distributions of restriction sites in DNA sequences

Nucleic Acids Res. 1992 Mar 25;20(6):1363-70. doi: 10.1093/nar/20.6.1363.

Abstract

Counts and spacings of all 4- and 6-bp palindromes in DNA sequences from a broad range of organisms were investigated. Average counts of both 4- and 6-bp palindromes were significantly low in all bacteriophages except one, probably as a means of avoiding restriction enzyme cleavage. The exception, T4, which has normal 4- and 6-bp palindrome counts, putatively derives protection from modification of cytosine to hydroxymethylcytosine plus glycosylation. The counts and distributions of 4-bp and 6-bp restriction sites in bacterial species are variable. Bacterial cells with multiple restriction systems for 4-bp or 6-bp target specificities are low in aggregate 4- or 6-bp palindrome counts/kb, respectively, but bacterial cells lacking exact 4-cutter enzymes generally show normal or high counts of 4-bp palindromes when compared with random control sequences of comparable nucleotide frequencies. For example, E. coli, apparently without an exact 4-bp target restriction endonuclease (see text), contains normal aggregate 4-bp palindrome counts/kb, while B. subtilis, which abounds with 4-bp restriction systems, shows a significant under-representation of 4-bp palindromes. Both E. coli and B. subtilis have many 6-bp restriction enzymes and concomitantly diminished aggregate 6-bp palindrome counts/kb. Eukaryote, viral, and organelle sequences generally have aggregate 4- and 6-bp palindrome counts/kb in the normal range. These results are interpreted in terms of restriction/methylation regimes, recombination and transcription processes, and possible structural and regulatory roles of 4- and 6-bp palindromes.
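The quantities named in the abstract (aggregate palindrome counts/kb, spacings between sites, and a random control of comparable nucleotide frequencies) can be illustrated with a minimal sketch. This is not the authors' procedure: the function names are hypothetical, a "palindrome" is taken to mean a word equal to its own reverse complement, and the control is assumed to be a simple shuffle that preserves base composition. The significance tests used in the paper are not reproduced.

```python
import random
from itertools import product

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of an uppercase A/C/G/T string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def palindromes(k):
    """All k-bp words equal to their own reverse complement (k even)."""
    words = ("".join(p) for p in product("ACGT", repeat=k))
    return {w for w in words if w == reverse_complement(w)}


def site_positions(seq, k):
    """Start positions (0-based) of every k-bp palindrome in seq."""
    targets = palindromes(k)
    return [i for i in range(len(seq) - k + 1) if seq[i:i + k] in targets]


def count_per_kb(seq, k):
    """Aggregate k-bp palindrome count, normalised per 1000 bases."""
    return 1000.0 * len(site_positions(seq, k)) / len(seq)


def spacings(seq, k):
    """Distances between the starts of consecutive k-bp palindromes."""
    pos = site_positions(seq, k)
    return [b - a for a, b in zip(pos, pos[1:])]


def shuffled_control(seq, rng):
    """Assumed control: a shuffle of seq, so base frequencies are unchanged."""
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed only to make the sketch repeatable
    seq = "GAATTCAAGAATTC" * 50  # toy input, not real genomic data
    control = shuffled_control(seq, rng)
    print("observed 6-bp palindromes/kb:", count_per_kb(seq, 6))
    print("control  6-bp palindromes/kb:", count_per_kb(control, 6))
```

Comparing the observed value with the distribution over many shuffled controls would give a rough sense of under- or over-representation. A shuffle preserves only single-base frequencies, so it is a deliberately crude stand-in for whatever control model the full text specifies.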

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Animals
  • Bacteria
  • Base Sequence
  • DNA Restriction Enzymes / metabolism*
  • DNA, Bacterial / metabolism*
  • Humans
  • Molecular Sequence Data
  • Repetitive Sequences, Nucleic Acid
  • Statistics as Topic
  • Viruses

Substances

  • DNA, Bacterial
  • DNA Restriction Enzymes