Estimation in the semiparametric accelerated failure time model with missing covariates: improving efficiency through augmentation

J Am Stat Assoc. 2017;112(519):1221-1235. doi: 10.1080/01621459.2016.1205500. Epub 2017 Apr 25.

Abstract

This paper considers linear regression with missing covariates and a right censored outcome. We first consider a general two-phase outcome sampling design, where full covariate information is only ascertained for subjects in phase two and sampling occurs under an independent Bernoulli sampling scheme with known subject-specific sampling probabilities that depend on phase one information (e.g., survival time, failure status and covariates). The semiparametric information bound is derived for estimating the regression parameter in this setting. We also introduce a more practical class of augmented estimators that is shown to improve asymptotic efficiency over simple but inefficient inverse probability of sampling weighted estimators. Estimation for known sampling weights and extensions to the case of estimated sampling weights are both considered. The allowance for estimated sampling weights permits covariates to be missing at random according to a monotone but unknown mechanism. The asymptotic properties of the augmented estimators are derived and simulation results demonstrate substantial efficiency improvements over simpler inverse probability of sampling weighted estimators in the indicated settings. With suitable modification, the proposed methodology can also be used to improve augmented estimators previously used for missing covariates in a Cox regression model.