Detection of common copy number variation with application to population clustering from next generation sequencing data

Annu Int Conf IEEE Eng Med Biol Soc. 2012:2012:1246-9. doi: 10.1109/EMBC.2012.6346163.

Abstract

Copy number variation (CNV) is a structural variation in the human genome that has been associated with many complex diseases. In this paper we present a method to detect common copy number variations from next generation sequencing data. First, CNVs are detected in each individual sample by solving a total variation penalized least squares problem. Second, CNVs common to multiple samples are identified using source separation techniques such as non-negative matrix factorization (NMF). Finally, the method is applied to population clustering. Results on real data show that two family trios with different ancestries can be clustered into two ethnic groups based on their common CNVs, demonstrating the potential of the proposed method for applications in population genetics.
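
The two computational steps named in the abstract can be illustrated with a minimal sketch. The abstract does not state which solver or which NMF variant the authors used, so everything below is an assumption made for illustration: the function names `tv_denoise`, `nmf`, and `demo` are hypothetical, the total variation problem is solved with a simple projected gradient iteration on its dual, the NMF uses Lee-Seung multiplicative updates for the Frobenius norm, and the read-depth data are synthetic rather than taken from the paper.

```python
import numpy as np


def tv_denoise(y, lam, n_iter=500):
    """Approximately solve min_x 0.5*||y - x||^2 + lam * sum_i |x[i+1] - x[i]|.

    Assumed solver: projected gradient on the dual problem (not from the paper).
    """
    y = np.asarray(y, dtype=float)
    z = np.zeros(y.size - 1)  # one dual variable per pair of adjacent positions
    x = y.copy()
    for _ in range(n_iter):
        # Dual gradient step (step 1/4 is safe for the first-difference
        # operator), then projection onto the box [-lam, lam].
        z = np.clip(z + 0.25 * np.diff(x), -lam, lam)
        # Primal estimate x = y - D^T z, with D the first-difference operator.
        x = y + np.diff(np.concatenate(([0.0], z, [0.0])))
    return x


def nmf(V, rank, n_iter=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix V ~ W @ H with multiplicative updates."""
    V = np.asarray(V, dtype=float)
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], rank)) + eps
    H = rng.random((rank, V.shape[1])) + eps
    errs = []
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
        errs.append(np.linalg.norm(V - W @ H))  # Frobenius reconstruction error
    return W, H, errs


def demo():
    """Synthetic example: six samples, two groups with different gain regions."""
    rng = np.random.default_rng(1)
    n_pos = 200
    samples = []
    for group in (0, 0, 0, 1, 1, 1):
        depth = np.full(n_pos, 2.0)         # baseline copy number of 2
        start = 40 if group == 0 else 130   # hypothetical group-specific gain
        depth[start:start + 30] = 3.0
        samples.append(depth + rng.normal(0.0, 0.3, n_pos))
    # Step 1: piecewise-constant fit per sample.
    smooth = np.array([tv_denoise(s, lam=1.0) for s in samples])
    # Step 2: keep gains above baseline (a simplification that ignores losses)
    # so the matrix is non-negative, arranged as positions x samples.
    V = np.clip(smooth - 2.0, 0.0, None).T
    W, H, _ = nmf(V, rank=2)
    # Columns of W act as common CNV profiles; assign each sample to the
    # component with the largest weight in H.
    return H.argmax(axis=0)


if __name__ == "__main__":
    print(demo())
```

In this sketch the penalty weight `lam` controls how aggressively each sample is flattened into segments, and the chosen `rank` plays the role of the number of groups. Both would need tuning on real data, and the abstract does not say how the authors selected them.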

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Cluster Analysis
  • Computational Biology / methods*
  • DNA Copy Number Variations*
  • Databases, Genetic
  • Female
  • Genetics, Population / methods*
  • Genome, Human
  • Humans
  • Male
  • Sequence Alignment
  • Sequence Analysis, DNA / methods*