Internal validation of risk models in clustered data: a comparison of bootstrap schemes

Am J Epidemiol. 2013 Jun 1;177(11):1209-17. doi: 10.1093/aje/kws396. Epub 2013 May 9.

Abstract

Internal validity of a risk model can be studied efficiently with bootstrapping to assess possible optimism in model performance. Assumptions of the regular bootstrap are violated when the development data are clustered. We compared alternative resampling schemes in clustered data for the estimation of optimism in model performance. A simulation study was conducted to compare regular resampling on only the patient level with resampling on only the cluster level and with resampling sequentially on both the cluster and patient levels (2-step approach). Optimism was estimated for the concordance index and the calibration slope. Resampling of only patients or only clusters yielded accurate estimates of optimism in model performance, whereas the 2-step approach overestimated it. If the number of centers or the intraclass correlation coefficient was high, resampling of clusters yielded more accurate estimates than resampling of patients. The 3 bootstrap schemes were also applied to clustered empirical data. The results presented in this paper support the use of resampling on only the clusters for estimation of optimism in model performance when data are clustered.
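
The three resampling schemes compared in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code. All function names are hypothetical, and the optimism calculation is assumed to follow the commonly used bootstrap procedure (performance of the bootstrap model on the bootstrap sample minus its performance on the original data, averaged over replicates), which the abstract does not spell out. The `fit` and `score` callables stand in for whatever risk model and performance measure (for example, the concordance index or calibration slope) are being validated.

```python
import numpy as np

def resample_patients(cluster_ids, rng):
    """Regular bootstrap: draw n patients with replacement, ignoring clusters."""
    n = len(cluster_ids)
    return rng.integers(0, n, size=n)

def resample_clusters(cluster_ids, rng):
    """Cluster bootstrap: draw whole clusters with replacement and keep
    every patient of each drawn cluster."""
    clusters = np.unique(cluster_ids)
    drawn = rng.choice(clusters, size=len(clusters), replace=True)
    return np.concatenate([np.flatnonzero(cluster_ids == c) for c in drawn])

def resample_two_step(cluster_ids, rng):
    """2-step bootstrap: draw clusters with replacement, then redraw
    patients with replacement within each drawn cluster."""
    clusters = np.unique(cluster_ids)
    drawn = rng.choice(clusters, size=len(clusters), replace=True)
    parts = []
    for c in drawn:
        members = np.flatnonzero(cluster_ids == c)
        parts.append(rng.choice(members, size=len(members), replace=True))
    return np.concatenate(parts)

def estimate_optimism(X, y, cluster_ids, fit, score, resample, n_boot=200, seed=0):
    """Average over replicates of (apparent performance in the bootstrap
    sample) minus (performance of the same model on the original data).
    This procedure is an assumption, not taken from the abstract."""
    rng = np.random.default_rng(seed)
    diffs = []
    for _ in range(n_boot):
        idx = resample(cluster_ids, rng)
        model = fit(X[idx], y[idx])
        diffs.append(score(model, X[idx], y[idx]) - score(model, X, y))
    return float(np.mean(diffs))
```

Passing `resample_clusters` as the `resample` argument corresponds to the scheme the abstract recommends for clustered data. Note that the cluster-based schemes return samples whose size varies between replicates when clusters differ in size.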

Keywords: bootstrapping; clustered data; internal validation; model performance; multilevel analysis; risk models.

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Computer Simulation
  • Humans
  • Models, Statistical*
  • Postoperative Nausea and Vomiting
  • Regression Analysis
  • Risk Assessment*
  • Validation Studies as Topic