Efficient semiparametric estimation of haplotype-disease associations in case-cohort and nested case-control studies

Biostatistics. 2006 Jul;7(3):486-502. doi: 10.1093/biostatistics/kxj021. Epub 2006 Feb 24.

Abstract

Estimating the effects of haplotypes on the age of onset of a disease is an important step toward the discovery of genes that influence complex human diseases. A haplotype is a specific sequence of nucleotides on the same chromosome of an individual and can only be measured indirectly through the genotype. We consider cohort studies which collect genotype data on a subset of cohort members through case-cohort or nested case-control sampling. We formulate the effects of haplotypes and possibly time-varying environmental variables on the age of onset through a broad class of semiparametric regression models. We construct appropriate nonparametric likelihoods, which involve both finite- and infinite-dimensional parameters. The corresponding nonparametric maximum likelihood estimators are shown to be consistent, asymptotically normal, and asymptotically efficient. Consistent variance-covariance estimators are provided, and efficient and reliable numerical algorithms are developed. Simulation studies demonstrate that the asymptotic approximations are accurate in practical settings and that case-cohort and nested case-control designs are highly cost-effective. An application to a major cardiovascular study is provided.
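Two ideas in the abstract can be made concrete with a small sketch: why a haplotype is only indirectly observed through the genotype, and how case-cohort and nested case-control sampling choose which cohort members are genotyped. The Python below is illustrative only and is not the authors' estimation procedure. The function names, the coding of genotypes as minor-allele counts (0, 1, 2) at each SNP, and the simple random sampling rules are assumptions made for this example.

```python
import random
from itertools import product


def compatible_haplotype_pairs(genotype):
    """List the unordered haplotype pairs consistent with an unphased genotype.

    genotype: tuple of minor-allele counts (0, 1 or 2), one per SNP (assumed coding).
    """
    per_site = []
    for g in genotype:
        if g == 0:
            per_site.append([(0, 0)])
        elif g == 2:
            per_site.append([(1, 1)])
        elif g == 1:
            # Heterozygous site: the phase is unknown, so both orders are possible.
            per_site.append([(0, 1), (1, 0)])
        else:
            raise ValueError("allele counts must be 0, 1 or 2")
    pairs = set()
    for combo in product(*per_site):
        h1 = tuple(a for a, _ in combo)
        h2 = tuple(b for _, b in combo)
        pairs.add(tuple(sorted((h1, h2))))  # ignore the order within a pair
    return sorted(pairs)


def case_cohort_sample(event, subcohort_size, rng):
    """Genotype a random subcohort plus every case (hypothetical simple design)."""
    n = len(event)
    subcohort = set(rng.sample(range(n), subcohort_size))
    cases = {i for i in range(n) if event[i]}
    return sorted(subcohort | cases)


def nested_case_control_sample(time, event, m, rng):
    """For each case, draw up to m controls from subjects still at risk at its event time."""
    sampled = {}
    for i in range(len(time)):
        if event[i]:
            risk_set = [j for j in range(len(time)) if j != i and time[j] >= time[i]]
            sampled[i] = rng.sample(risk_set, min(m, len(risk_set)))
    return sampled


if __name__ == "__main__":
    # Two heterozygous SNPs leave the phase ambiguous: two candidate pairs.
    print(compatible_haplotype_pairs((1, 1, 0)))

    rng = random.Random(0)
    time = [5, 3, 8, 2, 9, 7]   # toy observed ages (onset or censoring)
    event = [1, 0, 1, 0, 0, 1]  # 1 = disease onset observed
    print(case_cohort_sample(event, 2, rng))
    print(nested_case_control_sample(time, event, 2, rng))
```

Under this coding, a genotype with k heterozygous SNPs (k at least 1) is compatible with 2^(k-1) unordered haplotype pairs, which is why a likelihood for haplotype effects has to account for every compatible pair instead of treating the haplotype as observed.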

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Age of Onset
  • Algorithms
  • Case-Control Studies*
  • Cohort Studies*
  • Computer Simulation
  • Coronary Disease / genetics
  • Genetic Predisposition to Disease / genetics*
  • Haplotypes / genetics*
  • Humans
  • Models, Genetic
  • Polymorphism, Single Nucleotide
  • Proportional Hazards Models
  • Smoking / adverse effects