Heritability Estimation using a Regularized Regression Approach (HERRA): Applicable to continuous, dichotomous or age-at-onset outcome

PLoS One. 2017 Aug 16;12(8):e0181269. doi: 10.1371/journal.pone.0181269. eCollection 2017.

Abstract

The popular Genome-wide Complex Trait Analysis (GCTA) software uses random-effects models to estimate narrow-sense heritability from GWAS data on unrelated individuals, without needing to know or identify the causal loci. Many methods have since extended this approach to various situations. However, because the proportion of causal loci among the variants is typically very small and GCTA uses all variants to calculate the similarities among individuals, the heritability estimate may be unstable, with a large variance. Moreover, if the causal SNPs are not genotyped, GCTA sometimes greatly underestimates the true heritability. We present a novel narrow-sense heritability estimator, named HERRA, that uses well-developed ultra-high dimensional machine-learning methods and, like existing methods, is applicable to continuous or dichotomous outcomes. Additionally, HERRA is applicable to time-to-event or age-at-onset outcomes, which, to our knowledge, no existing method can handle. Compared to GCTA and LDAK for continuous and binary outcomes, HERRA often has a smaller variance, and when causal SNPs are not genotyped, HERRA has a much smaller empirical bias. We applied GCTA, LDAK and HERRA to a large colorectal cancer dataset with a dichotomous outcome (4,312 cases and 4,356 controls, genotyped using Illumina 300K); the respective heritability estimates are 0.068 (SE = 0.017), 0.072 (SE = 0.021) and 0.110 (SE = 5.19 × 10⁻³). HERRA thus yields an over 50% increase in the heritability estimate compared to GCTA or LDAK.
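
As a quick arithmetic check of the abstract's closing comparison, the following minimal sketch computes the relative increase of the HERRA estimate over the GCTA and LDAK estimates, using only the three point estimates reported above. The names `estimates`, `relative_increase` and `increase_vs` are illustrative and do not come from the paper or from any of the three software packages.

```python
# Heritability point estimates reported in the abstract
# (colorectal cancer dataset, dichotomous outcome).
estimates = {"GCTA": 0.068, "LDAK": 0.072, "HERRA": 0.110}


def relative_increase(new: float, baseline: float) -> float:
    """Return the relative increase of `new` over `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0


# Relative increase of the HERRA estimate over each of the other methods.
increase_vs = {
    method: relative_increase(estimates["HERRA"], h2)
    for method, h2 in estimates.items()
    if method != "HERRA"
}

for method, pct in increase_vs.items():
    print(f"HERRA vs {method}: {pct:.1f}% higher")
```

With the reported values, the relative increase is roughly 62% over GCTA and 53% over LDAK, which is consistent with the "over 50%" statement. The check uses point estimates only and does not account for the reported standard errors.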

MeSH terms

  • Adult
  • Age of Onset
  • Aged
  • Aged, 80 and over
  • Animals
  • Case-Control Studies
  • Chromosome Mapping
  • Colorectal Neoplasms / epidemiology
  • Colorectal Neoplasms / genetics
  • Computer Simulation
  • Female
  • Genome-Wide Association Study / methods*
  • Humans
  • Inheritance Patterns
  • Likelihood Functions
  • Male
  • Middle Aged
  • Models, Genetic*
  • Polymorphism, Single Nucleotide
  • Quantitative Trait, Heritable*
  • Software*