Missing phenotype data imputation in pedigree data analysis

Genet Epidemiol. 2008 Jan;32(1):52-60. doi: 10.1002/gepi.20261.

Abstract

Mapping complex traits or phenotypes with small genetic effects, whose phenotypes may be modulated by temporal trends in families, is challenging. Detailed and accurate data must be available on families, whether or not the data were collected over time. Missing data complicate pedigree analysis, especially longitudinal pedigree analysis. Because most analytical methods developed for longitudinal pedigree data require complete data, the researcher is left with the option of either dropping the cases (individuals) with missing data from the analysis or imputing values for the missing data. We present the use of data augmentation within Bayesian polygenic and longitudinal polygenic models to produce k complete datasets. The data augmentation, or imputation, step of the Markov chain Monte Carlo takes into account the observed familial information and the observed subject information available at other time points. These k complete datasets can then be used to fit single time point or longitudinal pedigree models. Because the k complete datasets yield k sets of parameter estimates, the total variance associated with an estimate can be partitioned into a within-imputation and a between-imputation component. The method is illustrated using the Genetic Analysis Workshop simulated data.
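
The partition of total variance into within-imputation and between-imputation components can be illustrated with a minimal sketch. The abstract does not state the combining formula, so the code below assumes the standard multiple-imputation combining rules (Rubin's rules): the within component is the mean of the k per-dataset sampling variances, the between component is the sample variance of the k point estimates, and the total is within + (1 + 1/k) × between. The function name `pool_estimates` and the input values are hypothetical and are not taken from the paper.

```python
from statistics import mean, variance


def pool_estimates(estimates, variances):
    """Pool k point estimates and their sampling variances.

    Assumes Rubin's combining rules; this is an illustration,
    not necessarily the exact procedure used in the paper.
    """
    k = len(estimates)
    if k < 2 or k != len(variances):
        raise ValueError("need k >= 2 estimates with matching variances")

    pooled = mean(estimates)          # combined point estimate
    within = mean(variances)          # within-imputation variance
    between = variance(estimates)     # between-imputation variance (k - 1 denominator)
    total = within + (1 + 1 / k) * between
    return pooled, within, between, total


# Hypothetical estimates of one parameter from k = 3 imputed datasets.
pooled, within, between, total = pool_estimates(
    estimates=[1.0, 2.0, 3.0],
    variances=[0.5, 0.5, 0.5],
)
print(pooled, within, between, total)
```

A large between-imputation component relative to the within-imputation component suggests that the missing data contribute substantially to the uncertainty of the estimate.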

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Bayes Theorem
  • Computer Simulation
  • Genetic Linkage*
  • Genotype
  • Humans
  • Longitudinal Studies
  • Models, Genetic*
  • Models, Statistical
  • Monte Carlo Method
  • Multifactorial Inheritance / genetics*
  • Pedigree
  • Phenotype*
  • Statistics as Topic