Purpose: To impute race-specific person counts and person-years in a cohort in which race is unknown for some persons, using a model-based iterative allocation algorithm (IAA).
Methods: An EM-algorithm-based approach for addressing misclassification in a censored-data regression setting is adapted to estimate the probability that a person of unknown race is white. The corresponding race-specific person-years are obtained as a by-product of the estimation procedure. Variance estimates are computed using the bootstrap. The proposed approach is compared with the proportional allocation method (PAM).
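As an illustration only, the general idea of an EM-style allocation can be sketched in Python. The abstract does not specify the regression model, so this sketch assumes a deliberately simple one: a constant (exponential) hazard for each race, no covariates, and race missing at random. The function name `iterative_allocation` and its arguments are hypothetical, and this is not the authors' implementation. Each person of unknown race receives a weight equal to the estimated probability of being white, and the race-specific person-years fall out as weighted sums.

```python
import math

def iterative_allocation(times, deaths, race, n_iter=200):
    """EM-style allocation under an ASSUMED two-group exponential model.

    times[i]  : person-years contributed by person i
    deaths[i] : 1 if person i died, 0 if censored
    race[i]   : 1 (white), 0 (non-white), or None (unknown)
    Requires at least one person of each known race.
    """
    known = [r for r in race if r is not None]
    pi = sum(known) / len(known)  # starting value: observed white fraction
    # Weight = probability of being white; fixed at 0 or 1 when race is known.
    w = [pi if r is None else float(r) for r in race]

    for _ in range(n_iter):
        # M-step: weighted prevalence and race-specific hazards.
        pi = sum(w) / len(w)
        lam1 = (sum(wi * d for wi, d in zip(w, deaths))
                / sum(wi * t for wi, t in zip(w, times)))
        lam0 = (sum((1 - wi) * d for wi, d in zip(w, deaths))
                / sum((1 - wi) * t for wi, t in zip(w, times)))
        # E-step: update weights for persons of unknown race only.
        for i, r in enumerate(race):
            if r is None:
                f1 = pi * lam1 ** deaths[i] * math.exp(-lam1 * times[i])
                f0 = (1 - pi) * lam0 ** deaths[i] * math.exp(-lam0 * times[i])
                w[i] = f1 / (f1 + f0)

    # By-product: race-specific person-years as weighted sums.
    py_white = sum(wi * t for wi, t in zip(w, times))
    py_nonwhite = sum((1 - wi) * t for wi, t in zip(w, times))
    return w, py_white, py_nonwhite
```

Because the weights of each person sum to one across the two groups, the allocated person-years always add up to the cohort total, whatever the fitted hazards are.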
Results: In an occupational cohort where racial data were missing for 41% of the workers, the age-time-race-specific person-years were estimated within a relative variation of approximately 20%, using the IAA. The deaths were less reliably estimated. The standardized mortality ratios (SMRs) for all-cause mortality estimated using the IAA and the PAM were more similar for the non-white workers than for a smaller subgroup of white workers.
Conclusions: The IAA provides a method to reliably estimate race-specific person-year denominators in cohort studies with missing racial data. This method is applicable to other incompletely observed non-time-dependent categorical covariates. Internal cohort rates or SMRs can be computed and modeled, with bootstrap confidence intervals that account for the uncertainty in the determination of race.
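To make the last point concrete, a percentile bootstrap interval for a race-specific SMR can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure used in the study: a single hypothetical reference rate `ref_rate` replaces age- and time-specific reference rates, each person is a tuple `(weight, person_years, death)` where `weight` is an estimated probability of being white, and the function names are invented for this example. To fully propagate the uncertainty in race, the allocation weights would have to be re-estimated within each bootstrap replicate; here they are held fixed for brevity.

```python
import random

def smr(persons, ref_rate):
    """Weighted observed deaths divided by expected deaths."""
    observed = sum(w * d for w, t, d in persons)
    expected = ref_rate * sum(w * t for w, t, d in persons)
    return observed / expected

def bootstrap_smr_ci(persons, ref_rate, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval, resampling persons with replacement."""
    rng = random.Random(seed)
    n = len(persons)
    stats = []
    for _ in range(n_boot):
        sample = [persons[rng.randrange(n)] for _ in range(n)]
        # Skip degenerate replicates with no allocated person-years.
        if sum(w * t for w, t, d in sample) > 0:
            stats.append(smr(sample, ref_rate))
    stats.sort()
    lower = stats[int((alpha / 2) * len(stats))]
    upper = stats[min(len(stats) - 1, int((1 - alpha / 2) * len(stats)))]
    return lower, upper
```

Resampling whole persons, rather than individual person-years, keeps each worker's follow-up time and outcome together in every replicate.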