Internal validation of risk models in clustered data: a comparison of bootstrap schemes

W Bouwmeester; K G M Moons; T H Kappen; W A van Klei; J W R Twisk; M J C Eijkemans; Y Vergouwe

doi:10.1093/aje/kws396

Am J Epidemiol. 2013 Jun 1;177(11):1209-17. doi: 10.1093/aje/kws396. Epub 2013 May 9.

Authors

W Bouwmeester¹, K G M Moons, T H Kappen, W A van Klei, J W R Twisk, M J C Eijkemans, Y Vergouwe

Affiliation

¹ Julius Center for Health Sciences and Primary Care, UMC Utrecht, P.O. Box 85500, 3508 GA Utrecht, The Netherlands.

PMID: 23660796

Abstract

Internal validity of a risk model can be studied efficiently with bootstrapping to assess possible optimism in model performance. Assumptions of the regular bootstrap are violated when the development data are clustered. We compared alternative resampling schemes in clustered data for the estimation of optimism in model performance. A simulation study was conducted to compare regular resampling on only the patient level with resampling on only the cluster level and with resampling sequentially on both the cluster and patient levels (2-step approach). Optimism for the concordance index and calibration slope was estimated. Resampling of only patients or only clusters showed accurate estimates of optimism in model performance. The 2-step approach overestimated the optimism in model performance. If the number of centers or intraclass correlation coefficient was high, resampling of clusters showed more accurate estimates than resampling of patients. The 3 bootstrap schemes also were applied to empirical data that were clustered. The results presented in this paper support the use of resampling on only the clusters for estimation of optimism in model performance when data are clustered.

Keywords: bootstrapping; clustered data; internal validation; model performance; multilevel analysis; risk models.
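
The three resampling schemes compared in the abstract can be sketched as index generators. This is a minimal illustration, not the authors' implementation: the function names, the `fit` and `performance` callables, and the optimism formula (mean of bootstrap-sample performance minus performance of the same bootstrap model on the original data) are assumptions chosen for the sketch, since the abstract does not spell out the exact procedure.

```python
import numpy as np

def resample_patients(cluster_ids, rng):
    """Regular bootstrap: draw n patients with replacement, ignoring clusters."""
    n = len(cluster_ids)
    return rng.integers(0, n, size=n)

def resample_clusters(cluster_ids, rng):
    """Cluster bootstrap: draw clusters with replacement, keep all their patients."""
    cluster_ids = np.asarray(cluster_ids)
    clusters = np.unique(cluster_ids)
    drawn = rng.choice(clusters, size=len(clusters), replace=True)
    return np.concatenate([np.flatnonzero(cluster_ids == c) for c in drawn])

def resample_two_step(cluster_ids, rng):
    """2-step: draw clusters with replacement, then patients within each drawn cluster."""
    cluster_ids = np.asarray(cluster_ids)
    clusters = np.unique(cluster_ids)
    drawn = rng.choice(clusters, size=len(clusters), replace=True)
    parts = []
    for c in drawn:
        members = np.flatnonzero(cluster_ids == c)
        parts.append(rng.choice(members, size=len(members), replace=True))
    return np.concatenate(parts)

def estimate_optimism(X, y, cluster_ids, fit, performance, resample,
                      n_boot=200, seed=0):
    """Average over bootstrap samples of (apparent - test) performance.

    `fit(X, y)` returns a model and `performance(model, X, y)` returns a
    scalar (for example a concordance index); both are hypothetical
    user-supplied callables, not part of any library.
    """
    rng = np.random.default_rng(seed)
    diffs = []
    for _ in range(n_boot):
        idx = resample(cluster_ids, rng)
        model = fit(X[idx], y[idx])
        apparent = performance(model, X[idx], y[idx])  # on the bootstrap sample
        test = performance(model, X, y)                # on the original data
        diffs.append(apparent - test)
    return float(np.mean(diffs))
```

Passing `resample_clusters` as the `resample` argument corresponds to the scheme the abstract recommends for clustered data; swapping in `resample_patients` or `resample_two_step` gives the other two schemes under the same optimism calculation.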

Publication types

Comparative Study
Research Support, Non-U.S. Gov't

MeSH terms

Computer Simulation
Humans
Models, Statistical*
Postoperative Nausea and Vomiting
Regression Analysis
Risk Assessment*
Validation Studies as Topic