ParseCNV2: efficient sequencing tool for copy number variation genome-wide association studies

Joseph T Glessner; Jin Li; Yichuan Liu; Munir Khan; Xiao Chang; Patrick M A Sleiman; Hakon Hakonarson

doi:10.1038/s41431-022-01222-7

Eur J Hum Genet. 2023 Mar;31(3):304-312. doi: 10.1038/s41431-022-01222-7. Epub 2022 Nov 1.

Authors

Joseph T Glessner¹ ², Jin Li³, Yichuan Liu⁴ ⁵, Munir Khan⁴ ⁵, Xiao Chang⁴ ⁵, Patrick M A Sleiman⁴ ⁵, Hakon Hakonarson⁴ ⁵

Affiliations

¹ Department of Pediatrics, Children's Hospital of Philadelphia, 3401 Civic Center Blvd, Philadelphia, PA, 19104, USA. glessner@chop.edu.
² Department of Pediatrics, Perelman School of Medicine, 3400 Civic Center Blvd, Philadelphia, PA, 19104, USA. glessner@chop.edu.
³ Department of Cell Biology, the Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, 300070, China.
⁴ Department of Pediatrics, Children's Hospital of Philadelphia, 3401 Civic Center Blvd, Philadelphia, PA, 19104, USA.
⁵ Department of Pediatrics, Perelman School of Medicine, 3400 Civic Center Blvd, Philadelphia, PA, 19104, USA.

Abstract

Improved copy number variation (CNV) detection remains an area of heavy emphasis for algorithm development; however, both CNV curation and disease association approaches remain in their infancy. The current practice of focusing on candidate CNVs, where researchers study specific CNVs they believe to be pathological while discarding others, fails to consider the full spectrum of CNVs in a hypothesis-free GWAS. To address this, we present a next-generation approach to CNV association that natively supports the popular VCF specification for sequencing-derived variants as well as SNP array calls in PennCNV format. The code is fast and efficient, allowing for the analysis of large (>100,000 sample) cohorts without dividing up the data on a compute cluster. The scripts are condensed into a single tool to promote simplicity and best practices. CNV curation pre- and post-association is rigorously supported and emphasized to yield reliable results of the highest quality. We benchmarked the tool using two large datasets as input files, the UK Biobank (n > 450,000) and the CAG Biobank (n > 350,000), both of which are genotyped at >0.5 M probes. ParseCNV has been actively supported and developed since 2008. ParseCNV2 presents a critical addition to formalizing CNV association for inclusion with SNP associations in the GWAS Catalog. Clinical CNV prioritization, interactive quality control (QC), and adjustment for covariates are revolutionary new features of ParseCNV2 vs. ParseCNV. The software is freely available at https://github.com/CAG-CNV/ParseCNV2.

Publication types

Research Support, Non-U.S. Gov't
Research Support, N.I.H., Extramural

MeSH terms

Algorithms
DNA Copy Number Variations*
Genome-Wide Association Study*
Humans
Polymorphism, Single Nucleotide
Software
