Detection of repeat expansions in large next generation DNA and RNA sequencing data without alignment

Sci Rep. 2022 Jul 30;12(1):13124. doi: 10.1038/s41598-022-17267-z.

Abstract

Bioinformatic methods for detecting short tandem repeat (STR) expansions in short-read sequencing data have identified new repeat expansions in humans, but they require alignment information to identify repetitive motif enrichment at genomic locations. We present superSTR, an ultrafast method that does not require alignment. We use superSTR to process whole-genome and whole-exome sequencing data and to perform the first STR analysis of the UK Biobank, efficiently screening for and identifying known and potential disease-associated STRs in the exomes of 49,953 biobank participants. We also demonstrate the first bioinformatic screening of RNA sequencing data to detect repeat expansions in humans and in mouse models of ataxia and dystrophy.
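
To illustrate the general idea of alignment-free screening, the following minimal Python sketch flags reads that consist mostly of a single tandem-repeated motif and tallies them per motif. It is not superSTR's published algorithm: it substitutes a simple periodicity scan, and the function names (`canonical_motif`, `longest_repeat`, `screen_reads`), the 75% read-coverage threshold, and the 1-6 bp motif range are all assumptions chosen for illustration.

```python
from collections import Counter

# Translation table for reverse-complementing DNA (other symbols pass through).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_motif(motif):
    """Smallest rotation of the motif or of its reverse complement.

    Collapses equivalent spellings of one repeat (e.g. CAG, AGC, CTG)
    onto a single key, since reads carry no strand or phase information.
    """
    revcomp = motif.translate(_COMPLEMENT)[::-1]
    return min(m[i:] + m[:i] for m in (motif, revcomp) for i in range(len(m)))


def longest_repeat(read, max_period=6):
    """Return (motif, span) for the longest tandem-periodic stretch in a read.

    A stretch has period k when every base equals the base k positions
    earlier. Only stretches holding at least two full motif copies count.
    Returns (None, 0) when no such stretch exists.
    """
    best_motif, best_span = None, 0
    for k in range(1, max_period + 1):
        run = 0  # consecutive positions matching the base k positions back
        for i in range(k, len(read)):
            if read[i] == read[i - k]:
                run += 1
                span = run + k
                # Strict '>' keeps the shortest period when spans tie.
                if run >= k and span > best_span:
                    start = i - span + 1
                    best_motif, best_span = read[start:start + k], span
            else:
                run = 0
    return best_motif, best_span


def screen_reads(reads, min_fraction=0.75, max_period=6):
    """Count reads dominated by one repeat motif, with no alignment step.

    min_fraction is a hypothetical cut-off, not a value from the paper.
    """
    counts = Counter()
    for read in reads:
        motif, span = longest_repeat(read, max_period)
        if motif and span / len(read) >= min_fraction:
            counts[canonical_motif(motif)] += 1
    return counts


if __name__ == "__main__":
    reads = [
        "TT" + "CAG" * 10 + "GA",          # repeat-rich read
        "CTG" * 12,                         # same repeat, opposite strand
        "ACGTTGCAAGGCTTAACCGGTTAGCATCGA",   # non-repetitive read
    ]
    for motif, n in screen_reads(reads).items():
        print(motif, n)
```

In a sketch like this, comparing per-motif counts between a sample and a set of controls would stand in for the motif-enrichment step; how superSTR itself scores and compares samples is described in the full paper, not in this abstract.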

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • DNA
  • Exome Sequencing
  • High-Throughput Nucleotide Sequencing / methods
  • Humans
  • Mice
  • Microsatellite Repeats* / genetics
  • RNA* / genetics
  • Sequence Analysis, RNA

Substances

  • RNA
  • DNA