Objective: To examine the sensitivity of split-sample reliability estimates to the random split of the data and propose alternative methods for improving the stability of the split-sample method.
Data sources and study setting: Data were simulated to reflect a variety of real-world quality measure distributions and scenarios. There is no date range to report as the data are simulated.
Study design: Simulation studies of split-sample reliability estimation were conducted under varying practical scenarios.
Data collection/extraction methods: All data were simulated using functions in R.
Principal findings: Single split-sample reliability estimates can depend heavily on the random split of the data, especially in settings with small sample sizes and low variability. Averaging split-sample estimates over many splits of the data can yield a more stable reliability estimate.
Conclusions: Measure developers and evaluators using the split-sample reliability method should average a series of reliability estimates calculated from many resamples of the data without replacement to obtain a more stable reliability estimate.
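The averaging procedure recommended above can be sketched as follows. This is a minimal illustration in Python, not the authors' R code. It assumes that each entity's observations are randomly divided into two halves without replacement, that the half-sample means are compared across entities with a Pearson correlation, and that the Spearman-Brown correction is applied. The abstract does not specify the reliability statistic, so that choice, the function names, and the simulation parameters are all hypothetical.

```python
import random
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def split_sample_reliability(data, rng):
    """One split-sample estimate (assumed form: Pearson + Spearman-Brown).

    data maps each entity to a list of at least two observations.
    """
    half_a, half_b = [], []
    for obs in data.values():
        # Random permutation, i.e. a split without replacement.
        shuffled = rng.sample(obs, len(obs))
        mid = len(shuffled) // 2
        half_a.append(mean(shuffled[:mid]))
        half_b.append(mean(shuffled[mid:]))
    r = pearson(half_a, half_b)
    # Spearman-Brown correction from half-sample to full-sample reliability.
    return 2 * r / (1 + r)


def averaged_split_sample_reliability(data, n_splits=200, seed=0):
    """Average the split-sample estimate over many random splits."""
    rng = random.Random(seed)
    estimates = [split_sample_reliability(data, rng) for _ in range(n_splits)]
    return mean(estimates), estimates


def simulate(n_entities=50, n_per_entity=20, between_sd=0.5,
             within_sd=1.0, seed=1):
    """Hypothetical simulated data: normal entity effects plus normal noise."""
    rng = random.Random(seed)
    data = {}
    for entity in range(n_entities):
        true_score = rng.gauss(0, between_sd)
        data[entity] = [rng.gauss(true_score, within_sd)
                        for _ in range(n_per_entity)]
    return data


if __name__ == "__main__":
    data = simulate()
    avg, estimates = averaged_split_sample_reliability(data, n_splits=200)
    print("single-split range:", min(estimates), max(estimates))
    print("average over splits:", avg)
```

Comparing the range of the single-split estimates with their average gives a rough sense of how much one random split can move the result. Lowering `n_per_entity` or `between_sd` in the hypothetical `simulate` function is one way to explore the small-sample and low-variability settings named in the findings.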
Keywords: health care quality; performance measurement; split‐half reliability; split‐sample reliability.
Published 2024. This article is a U.S. Government work and is in the public domain in the USA.