Testing Calibration of Cox Survival Models at Extremes of Event Risk

Front Genet. 2018 May 22:9:177. doi: 10.3389/fgene.2018.00177. eCollection 2018.

Abstract

Risk-prediction models can translate genetic association findings into clinical decision-making. Most models are evaluated on their ability to discriminate, and the calibration of risk-prediction models is largely overlooked in applications. Models that demonstrate good discrimination in training datasets, if not properly calibrated to produce unbiased estimates of risk, can perform poorly in new patient populations. Poor calibration can arise from missing covariates, such as genetic interactions that may be unknown or not measured. We demonstrate that models omitting interactions can lead to increased bias in predicted risk for patients at the tails of the risk distribution, i.e., those patients most likely to be affected by clinical decision-making. We propose a new calibration test for Cox risk-prediction models that aggregates martingale residuals for subjects in the extreme high- and low-risk groups; the test statistic is the maximum obtained by varying which risk groups are included in the extremes. To estimate the empirical significance of our test statistic, we simulate from a Gaussian distribution using the covariance matrix for the grouped sums of martingale residuals. Simulation shows the new test maintains control of type 1 error with improved power over a conventional goodness-of-fit test when risk prediction deviates at the tails of the risk distribution. We apply our method in the development of a prediction model for risk of cystic fibrosis-related diabetes. Our study highlights the importance of assessing both calibration and discrimination in predictive modeling, and provides a complementary tool in the assessment of risk model calibration.
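The testing procedure described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact form of the statistic, so everything specific here is an assumption made for illustration: risk groups are taken to be ordered from lowest to highest predicted risk, the "extremes" are the k lowest and k highest groups, the per-k statistic is a quadratic form in the two aggregated residual sums, and the function names `extreme_stat` and `extreme_calibration_test` are hypothetical. The inputs (grouped sums of martingale residuals and their covariance matrix) are assumed to have been computed already from a fitted Cox model.

```python
import numpy as np

def extreme_stat(sums, cov, max_k):
    """Maximum over k of a quadratic form in the (low, high) aggregated sums.

    Assumption: groups are ordered by predicted risk, and for each k the
    extremes are the k lowest-risk and k highest-risk groups.
    """
    g = len(sums)
    best = 0.0
    for k in range(1, max_k + 1):
        a = np.zeros((2, g))
        a[0, :k] = 1.0        # aggregate the k lowest-risk groups
        a[1, g - k:] = 1.0    # aggregate the k highest-risk groups
        v = a @ sums          # two aggregated residual sums
        c = a @ cov @ a.T     # their 2x2 covariance
        best = max(best, float(v @ np.linalg.pinv(c) @ v))
    return best

def extreme_calibration_test(sums, cov, max_k=None, n_sim=2000, seed=0):
    """Return (observed statistic, empirical p-value).

    The null distribution is approximated by drawing grouped sums from a
    mean-zero Gaussian with the supplied covariance matrix.
    """
    sums = np.asarray(sums, dtype=float)
    cov = np.asarray(cov, dtype=float)
    g = len(sums)
    if max_k is None:
        max_k = (g - 1) // 2  # keep low and high extremes from overlapping
    rng = np.random.default_rng(seed)
    observed = extreme_stat(sums, cov, max_k)
    draws = rng.multivariate_normal(np.zeros(g), cov, size=n_sim)
    null = np.array([extreme_stat(d, cov, max_k) for d in draws])
    p_value = (1 + np.sum(null >= observed)) / (n_sim + 1)
    return observed, float(p_value)
```

Taking the maximum over k and then recomputing that same maximum on each simulated draw is what lets the empirical p-value account for having searched over several definitions of "extreme". The quadratic form is only one plausible way to combine the low and high sums; the published statistic may differ.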

Keywords: calibration tests; Cox proportional hazards model; extreme risk; goodness-of-fit; prediction.