Adjustment for baseline characteristics in randomized trials using logistic regression: sample-based model versus true model

Trials. 2023 Feb 13;24(1):107. doi: 10.1186/s13063-022-07053-7.

Abstract

Background: Adjustment for baseline prognostic factors in randomized clinical trials is usually performed by means of sample-based regression models. Sample-based models may be incorrect due to overfitting. To assess whether overfitting is a problem in practice, we used simulated data to examine the performance of the sample-based model in comparison to a "true" adjustment model, in terms of estimation of the treatment effect.

Methods: We conducted a simulation study using samples drawn from a "population" in which both the treatment effect and the effect of the potential confounder were specified. The outcome variable was binary. Using logistic regression, we compared three estimates of the treatment effect in each situation: unadjusted, adjusted for the confounder using the sample, and adjusted for the confounder using the true effect. Experimental factors were sample size (from 2 × 50 to 2 × 1000), treatment effect (logit of 0, 0.5, or 1.0), confounder type (continuous or binary), and confounder effect (logit of 0, -0.5, or -1.0). The assessment criteria for the estimated treatment effect were bias, variance, precision (proportion of estimates within 0.1 logit units), type 1 error, and power.
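
The three estimators compared in the Methods can be illustrated with a minimal sketch for one cell of the design (2 × 100 patients, treatment logit 1.0, continuous confounder with logit -1.0). This is not the authors' code. Several details are assumptions that the abstract does not state: the confounder is drawn from a standard normal distribution, the intercept is 0, allocation is fixed at equal arm sizes, and the "true" adjustment is implemented as an offset that fixes the confounder coefficient at its population value. The helper names (`fit_logistic`, `one_trial`, `run_simulation`) are hypothetical.

```python
import math
import random


def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))


def solve(a, b):
    """Solve a small dense linear system a x = b by Gaussian elimination."""
    n = len(b)
    m = [a[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / m[i][i]
    return x


def fit_logistic(rows, y, offset=None, max_iter=25, tol=1e-8):
    """Newton-Raphson fit of logit P(y=1) = offset + rows . beta."""
    p = len(rows[0])
    if offset is None:
        offset = [0.0] * len(rows)
    beta = [0.0] * p
    for _ in range(max_iter):
        grad = [0.0] * p
        hess = [[0.0] * p for _ in range(p)]
        for xi, yi, oi in zip(rows, y, offset):
            mu = expit(oi + sum(b * v for b, v in zip(beta, xi)))
            w = mu * (1.0 - mu)
            for j in range(p):
                grad[j] += (yi - mu) * xi[j]
                for k in range(p):
                    hess[j][k] += w * xi[j] * xi[k]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


def one_trial(n_per_arm, treat_logit, conf_logit, rng):
    """Simulate one two-arm trial and return the three treatment-effect estimates."""
    t = [0.0] * n_per_arm + [1.0] * n_per_arm
    x = [rng.gauss(0.0, 1.0) for _ in t]  # assumption: confounder ~ N(0, 1)
    y = [1.0 if rng.random() < expit(treat_logit * ti + conf_logit * xi) else 0.0
         for ti, xi in zip(t, x)]  # assumption: intercept of 0
    unadjusted = fit_logistic([[1.0, ti] for ti in t], y)[1]
    sample_adj = fit_logistic([[1.0, ti, xi] for ti, xi in zip(t, x)], y)[1]
    # Assumption: "true" adjustment fixes the confounder coefficient via an offset.
    true_adj = fit_logistic([[1.0, ti] for ti in t], y,
                            offset=[conf_logit * xi for xi in x])[1]
    return unadjusted, sample_adj, true_adj


def run_simulation(n_reps, n_per_arm, treat_logit, conf_logit, seed=0):
    """Average each estimator over n_reps simulated trials."""
    rng = random.Random(seed)
    draws = [one_trial(n_per_arm, treat_logit, conf_logit, rng) for _ in range(n_reps)]
    names = ("unadjusted", "sample_adjusted", "true_adjusted")
    return {name: sum(d[i] for d in draws) / n_reps for i, name in enumerate(names)}


summary = run_simulation(n_reps=50, n_per_arm=100, treat_logit=1.0,
                         conf_logit=-1.0, seed=2023)
print(summary)
```

Subtracting the specified treatment logit from each mean gives the bias criterion. The other criteria (variance, precision, type 1 error, power) would need the individual estimates and their standard errors, which this sketch does not retain.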

Results: Sample-based adjustment models yielded more biased estimates of the treatment effect than adjustment models that used the true confounder effect, but had similar variance, accuracy, power, and type 1 error rates. The simulation also confirmed the conservative bias of unadjusted analyses due to the non-collapsibility of the odds ratio, the smaller variance of unadjusted estimates, and the bias of the odds ratio away from the null in small datasets.
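
The conservative bias attributed to non-collapsibility can be checked by direct calculation, without simulation. The sketch below averages the outcome risk over a binary confounder in each arm and computes the resulting marginal (unadjusted) log odds ratio. The confounder prevalence of 0.5 and the intercept of 0 are assumptions for illustration, since the abstract does not report them, and the function name is hypothetical.

```python
import math


def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))


def logit(p):
    """Log odds."""
    return math.log(p / (1.0 - p))


def marginal_log_odds_ratio(treat_logit, conf_logit, prevalence=0.5, intercept=0.0):
    """Marginal treatment log odds ratio when a binary confounder is ignored."""
    def risk(t):
        # Average the outcome probability over confounder-absent and confounder-present patients.
        return ((1.0 - prevalence) * expit(intercept + treat_logit * t)
                + prevalence * expit(intercept + treat_logit * t + conf_logit))
    return logit(risk(1)) - logit(risk(0))


# Conditional treatment effect 1.0, confounder effect -1.0:
print(marginal_log_odds_ratio(1.0, -1.0))  # about 0.94, below the conditional 1.0
```

Under these assumptions the confounder is perfectly balanced between arms, yet the unadjusted log odds ratio is still pulled toward zero. When the confounder effect is 0, the marginal and conditional values coincide.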

Conclusions: Sample-based adjustment yields similar results to exact adjustment in estimating the treatment effect. Sample-based adjustment is preferable to no adjustment.

Keywords: Baseline imbalance; Over-fitting; Randomized clinical trials; Simulation study; Statistical adjustment.

MeSH terms

  • Bias
  • Computer Simulation
  • Humans
  • Logistic Models*
  • Odds Ratio
  • Randomized Controlled Trials as Topic
  • Sample Size