A method for imputing missing data in longitudinal studies

Ann Epidemiol. 2004 May;14(5):354-61. doi: 10.1016/j.annepidem.2003.09.010.

Abstract

Purpose: In a cohort in which race is unknown for some persons, race-specific counts of persons and person-years are imputed using a model-based iterative allocation algorithm (IAA).

Methods: An EM algorithm-based approach for addressing misclassification in censored-data regression is adapted to estimate the probability that a person of unknown race is white. The corresponding race-specific person-years are obtained as a by-product of the estimation procedure. Variance estimates are computed using the bootstrap. The proposed approach is compared with the proportional allocation method (PAM).
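The idea of iterating between a probability estimate and a fractional allocation can be illustrated with a deliberately simplified sketch. It is not the authors' censored-data regression model. It assumes a single categorical covariate (a hypothetical `group` label) and treats each person of unknown race as fractionally white with the current estimated probability. The function names and record layout are assumptions made for illustration only.

```python
def iterative_allocation(records, n_iter=200):
    """EM-style estimate of P(white | group) for persons of unknown race.

    records: list of (race, group, person_years) tuples, where race is
    'W', 'N', or None (unknown). This layout is hypothetical.
    """
    groups = sorted({g for _, g, _ in records})
    # Start every unknown person at probability 0.5 of being white.
    p_white = {g: 0.5 for g in groups}
    for _ in range(n_iter):
        white = {g: 0.0 for g in groups}
        total = {g: 0.0 for g in groups}
        for race, g, _ in records:
            total[g] += 1.0
            if race == "W":
                white[g] += 1.0
            elif race is None:
                # Expected (fractional) membership under the current estimate.
                white[g] += p_white[g]
        # Update the probability from the expected counts.
        p_white = {g: white[g] / total[g] for g in groups}
    return p_white


def allocate_person_years(records, p_white):
    """Split person-years by race; unknowns are split by their probability."""
    py = {"W": 0.0, "N": 0.0}
    for race, g, years in records:
        if race is None:
            py["W"] += p_white[g] * years
            py["N"] += (1.0 - p_white[g]) * years
        else:
            py[race] += years
    return py
```

In this toy setting the iteration converges to the proportion of whites among persons of known race in each group, so the race-specific person-years fall out of the fitted probabilities. A model that also uses follow-up and outcome information, as the abstract describes, would generally give different allocations.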

Results: In an occupational cohort where racial data were missing for 41% of the workers, the age-time-race-specific person-years were estimated within a relative variation of approximately 20%, using the IAA. The deaths were less reliably estimated. The standardized mortality ratios (SMRs) for all-cause mortality estimated using the IAA and the PAM were more similar for the non-white workers than for a smaller subgroup of white workers.
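For readers unfamiliar with the SMR, the following minimal sketch shows the standard calculation: observed deaths divided by the deaths expected when reference rates are applied to the cohort's stratum-specific person-years. The stratum labels and the function name `smr` are hypothetical, and the numbers in the usage note are invented for illustration, not taken from the study.

```python
def smr(observed_deaths, person_years, reference_rates):
    """Standardized mortality ratio = observed / expected deaths.

    person_years and reference_rates are dicts keyed by the same strata
    (for example age-time-race cells); rates are deaths per person-year.
    """
    expected = sum(person_years[s] * reference_rates[s] for s in person_years)
    return observed_deaths / expected
```

For example, with 1000 person-years at a reference rate of 0.002 and 500 person-years at 0.01, the expected count is 7 deaths, so 14 observed deaths give an SMR of 2. Because the person-years enter the denominator, any uncertainty in imputed race-specific person-years carries through to the SMR.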

Conclusions: The IAA provides a method to reliably estimate race-specific person-year denominators in cohort studies with missing racial data. This method is applicable to other incompletely observed non-time-dependent categorical covariates. Internal cohort rates or SMRs can be computed and modeled, with bootstrap confidence intervals that account for the uncertainty in the determination of race.
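A percentile bootstrap over persons is one common way to obtain such intervals. The sketch below is a generic illustration, not the authors' procedure: it resamples persons with replacement and recomputes a user-supplied statistic on each resample. To reflect uncertainty in race, the statistic would need to repeat the imputation step inside each replicate. The names `bootstrap_ci` and `crude_rate`, and the `(died, person_years)` record layout, are assumptions.

```python
import random


def bootstrap_ci(records, statistic, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for statistic(records).

    Persons (records) are resampled with replacement; the statistic is
    recomputed from scratch on each resample.
    """
    rng = random.Random(seed)
    n = len(records)
    estimates = []
    for _ in range(n_boot):
        sample = [records[rng.randrange(n)] for _ in range(n)]
        estimates.append(statistic(sample))
    estimates.sort()
    lower = estimates[int((alpha / 2) * (n_boot - 1))]
    upper = estimates[int((1 - alpha / 2) * (n_boot - 1))]
    return lower, upper


def crude_rate(records):
    """Example statistic: deaths per person-year from (died, person_years)."""
    deaths = sum(d for d, _ in records)
    years = sum(y for _, y in records)
    return deaths / years
```

Passing a statistic that reruns the allocation and then computes a race-specific rate or SMR would let the interval widen where the race determination is uncertain.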

Publication types

  • Comparative Study

MeSH terms

  • Adolescent
  • Adult
  • Aged
  • Algorithms
  • Analysis of Variance
  • Biometry / methods*
  • Cause of Death
  • Child
  • Data Interpretation, Statistical*
  • Humans
  • Likelihood Functions
  • Longitudinal Studies
  • Male
  • Middle Aged
  • Models, Statistical*
  • Occupational Diseases / ethnology*
  • Occupational Diseases / mortality*
  • Pennsylvania / epidemiology
  • Proportional Hazards Models
  • Regression Analysis
  • Software
  • Textile Industry
  • Time Factors
  • White People / classification
  • White People / statistics & numerical data*
  • Workforce