Supervariants identification for breast cancer

Genet Epidemiol. 2020 Nov;44(8):934-947. doi: 10.1002/gepi.22350. Epub 2020 Aug 17.

Abstract

In genome-wide association studies, signals associated with rare variants and interactions between genes are hard to detect even when the sample size is in the tens of thousands. To overcome these problems, we examine the concept of a supervariant. Like the classic concept of the gene, a supervariant is a combination of alleles at multiple loci, but the contributing loci can be anywhere in the genome. We hypothesize that supervariants are easy to detect and that their aggregated signals are more stable in their associations with the disease than those from a single-nucleotide polymorphism. Using the UK Biobank databases, we develop a ranking and aggregation method for identifying supervariants. Specifically, we examine 9,377 breast cancer cases and 46,861 controls matched by sex and age. In our simulations, the use of supervariants outperforms the single-nucleotide polymorphism-based association method in detecting rare variants and signals with interactive structure. In real data analysis, we identify supervariants on Chromosomes 1, 2, 3, 5, 6, 7, 8, 9, 10, 11, 16, and 22 which cover previously reported loci that have associations with breast or other cancers, and several novel loci on Chromosomes 2, 5, 9, and 12. These findings demonstrate the validity of supervariants and their potential for discovering replicable and novel results for complex diseases.
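The abstract does not spell out the ranking and aggregation procedure, so the following is only a toy sketch of the general idea: rank loci, combine the top-ranked ones into a single binary variable, and test that variable against disease status. The function names (rank_loci, aggregate, odds_ratio), the carrier-frequency ranking score, the "carries a minor allele at any selected locus" aggregation rule, and the toy genotypes are all assumptions made for illustration; they are not the authors' method, which, judging by the keywords, involves random-forest depth importance.

```python
from typing import List, Sequence


def rank_loci(genotypes: Sequence[Sequence[int]], labels: Sequence[int]) -> List[int]:
    """Rank loci by the absolute case/control difference in carrier frequency.

    Assumption: a simple stand-in score, not the ranking used in the paper.
    genotypes[i][j] is the minor-allele count (0, 1, or 2) of sample i at locus j.
    """
    cases = [g for g, y in zip(genotypes, labels) if y == 1]
    controls = [g for g, y in zip(genotypes, labels) if y == 0]

    def carrier_freq(rows: Sequence[Sequence[int]], j: int) -> float:
        return sum(1 for r in rows if r[j] > 0) / len(rows)

    n_loci = len(genotypes[0])
    scores = [abs(carrier_freq(cases, j) - carrier_freq(controls, j)) for j in range(n_loci)]
    return sorted(range(n_loci), key=lambda j: scores[j], reverse=True)


def aggregate(genotypes: Sequence[Sequence[int]], loci: Sequence[int]) -> List[int]:
    """Hypothetical aggregation rule: 1 if a sample carries a minor allele at any chosen locus."""
    return [1 if any(row[j] > 0 for j in loci) else 0 for row in genotypes]


def odds_ratio(variant: Sequence[int], labels: Sequence[int]) -> float:
    """Odds ratio from the 2x2 table, with a 0.5 continuity correction in every cell."""
    a = sum(1 for v, y in zip(variant, labels) if v == 1 and y == 1) + 0.5  # carrier cases
    b = sum(1 for v, y in zip(variant, labels) if v == 1 and y == 0) + 0.5  # carrier controls
    c = sum(1 for v, y in zip(variant, labels) if v == 0 and y == 1) + 0.5  # non-carrier cases
    d = sum(1 for v, y in zip(variant, labels) if v == 0 and y == 0) + 0.5  # non-carrier controls
    return (a * d) / (b * c)


# Toy data (invented): 4 cases followed by 4 controls, 3 loci.
genotypes = [
    [1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 2, 0],  # cases
    [0, 0, 1], [0, 0, 0], [0, 0, 0], [0, 0, 0],  # controls
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

top_loci = rank_loci(genotypes, labels)[:2]
supervariant = aggregate(genotypes, top_loci)
single_locus = aggregate(genotypes, [top_loci[0]])

print("selected loci:", top_loci)
print("supervariant OR:", odds_ratio(supervariant, labels))
print("single-locus OR:", odds_ratio(single_locus, labels))
```

In this toy setup each case carries a minor allele at only one of two loci, so neither locus alone separates cases from controls as well as the aggregated variable does. That mirrors, in a deliberately simplified way, the abstract's hypothesis that aggregated signals can be easier to detect than single-locus signals.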

Keywords: GWAS; depth importance; gene-gene interaction; random forest.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Alleles
  • Breast Neoplasms / genetics*
  • Computer Simulation
  • Databases, Genetic
  • Female
  • Genetic Predisposition to Disease*
  • Genetic Variation*
  • Genome-Wide Association Study
  • Humans
  • Linkage Disequilibrium / genetics
  • Models, Genetic
  • Polymorphism, Single Nucleotide / genetics