Summarizing specific profiles in Illumina sequencing from whole-genome amplified DNA

DNA Res. 2014 Jun;21(3):243-54. doi: 10.1093/dnares/dst054. Epub 2013 Dec 18.

Abstract

Advances in both high-throughput sequencing and whole-genome amplification (WGA) protocols have allowed genomes to be sequenced from femtograms of DNA, for example from individual cells or from precious clinical and archived samples. Using the highly curated Caenorhabditis elegans genome as a reference, we sequenced libraries and identified errors and biases associated with Illumina library construction, library insert size, different WGA methods and genome features such as GC content and simple repeat content. Detailed analysis of the reads from amplified libraries revealed characteristics suggesting that the majority of amplified fragment ends are identical but inverted versions of each other. Read coverage in amplified libraries is correlated with both tandem and inverted repeat content, while GC content influences sequencing only in long-insert libraries. Nevertheless, single nucleotide polymorphism (SNP) calls and assembly metrics from reads in amplified libraries are comparable to those from unamplified libraries. To help realize the full potential of WGA in revealing signals of real biological interest, this article highlights the importance of recognizing additional sources of error in amplified sequence reads and discusses their potential implications for downstream analyses.
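
The relationship between read coverage and a genome feature such as GC content can be illustrated with a small calculation. The following is a minimal sketch only, not the authors' pipeline: the function names (gc_fraction, windowed_gc, windowed_mean_depth, pearson), the window size and the toy reference and depth values are all hypothetical and chosen for illustration. It assumes a per-base depth list has already been obtained from an alignment, then computes GC fraction and mean depth in non-overlapping windows and their Pearson correlation.

```python
from math import sqrt


def gc_fraction(seq):
    """Fraction of G/C among unambiguous (A/C/G/T) bases; 0.0 if there are none."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return gc / acgt if acgt else 0.0


def windowed_gc(reference, window):
    """GC fraction of each full, non-overlapping window of the reference."""
    return [gc_fraction(reference[i:i + window])
            for i in range(0, len(reference) - window + 1, window)]


def windowed_mean_depth(depth, window):
    """Mean per-base depth in each full, non-overlapping window."""
    return [sum(depth[i:i + window]) / window
            for i in range(0, len(depth) - window + 1, window)]


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


# Toy data (hypothetical, not from the study): three 8 bp windows.
reference = "ATATATAT" + "GCGCGCGC" + "ATATGCGC"
depth = [10] * 8 + [2] * 8 + [6] * 8  # invented per-base coverage

gc = windowed_gc(reference, window=8)
cov = windowed_mean_depth(depth, window=8)
r = pearson(gc, cov)
print(gc, cov, r)
```

In practice the window would be far larger than 8 bp, and the same windowed comparison could be repeated with tandem or inverted repeat content in place of GC fraction.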

Keywords: Illumina; SNPs; chimeric DNA; genome assembly; whole-genome amplification.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Caenorhabditis elegans / genetics*
  • DNA, Helminth / genetics*
  • Genome, Helminth*
  • Genomic Library*
  • High-Throughput Nucleotide Sequencing / methods*
  • Nucleic Acid Amplification Techniques / methods*
  • Polymorphism, Single Nucleotide

Substances

  • DNA, Helminth