Model Selection and Variable Aggregation of Australian Hospital Data

Stud Health Technol Inform. 2015:214:94-9.

Abstract

Background: Hospital administrative data commonly comprise hundreds of variables, many of which have hundreds, if not thousands, of distinct categories, especially for disease groups. Conventional approaches to developing regression models for prediction either fail completely because of multicollinearity or sparsity, or take too long and consume too many computing resources.

Methods: We demonstrate how regularisation techniques such as Elastic Net, combined with variable aggregation, can overcome some of these problems. Parameter estimates from univariate generalised linear models (GLMs) and Elastic Net models were used to aggregate disease groups into a more manageable number of groups and to predict the probability of mortality for a given patient.
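
The aggregation step described in the Methods can be illustrated with a minimal sketch. The abstract does not state how the parameter estimates were turned into groups, so everything here is an assumption made for illustration: the function names, the toy data, the equal-sized rank bins, and the use of a closed-form log odds ratio (with a 0.5 continuity correction) as a stand-in for the slope of a univariate logistic GLM on a single disease-group indicator.

```python
import math
from collections import Counter


def univariate_log_odds_ratios(codes, died):
    """Approximate the slope of a univariate logistic GLM for each code.

    For a single 0/1 indicator, the maximum-likelihood slope equals the log
    odds ratio of the 2x2 table. Adding 0.5 to each cell (an assumption made
    here, not taken from the paper) keeps sparse codes finite.
    """
    n_total = len(codes)
    deaths_total = sum(died)
    n_by_code = Counter(codes)
    deaths_by_code = Counter(c for c, d in zip(codes, died) if d)

    estimates = {}
    for code, n in n_by_code.items():
        deaths = deaths_by_code[code]
        a = deaths + 0.5                                   # died, has code
        b = (n - deaths) + 0.5                             # survived, has code
        c = (deaths_total - deaths) + 0.5                  # died, other codes
        d = (n_total - n) - (deaths_total - deaths) + 0.5  # survived, other codes
        estimates[code] = math.log((a * d) / (b * c))
    return estimates


def aggregate_codes(estimates, n_groups):
    """Map each code to one of n_groups equal-sized bins by rank of estimate."""
    ranked = sorted(estimates, key=estimates.get)
    return {code: i * n_groups // len(ranked) for i, code in enumerate(ranked)}


# Hypothetical toy data: four disease groups with 10 patients each.
codes = ["A"] * 10 + ["B"] * 10 + ["C"] * 10 + ["D"] * 10
died = (
    [0] * 10               # A: no deaths
    + [1] + [0] * 9        # B: 1 death
    + [1] * 5 + [0] * 5    # C: 5 deaths
    + [1] * 9 + [0]        # D: 9 deaths
)

estimates = univariate_log_odds_ratios(codes, died)
groups = aggregate_codes(estimates, n_groups=2)
```

In this sketch the aggregated group index would replace the original high-cardinality disease variable in a final mortality model. Estimates from an Elastic Net fit could be binned the same way.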

Results: When employed for variable aggregation and variable selection, Elastic Net models ran at least four times faster than GLMs but produced a less discriminative model. When applied to the final models for predicting hospital mortality, however, Elastic Net and GLM models demonstrated similar predictive power, and both efficiently solved an otherwise complex problem.

Conclusion: Elastic Net regularisation and variable aggregation provide an efficient mechanism for solving healthcare modelling problems.

MeSH terms

  • Australia
  • Datasets as Topic*
  • Decision Support Systems, Clinical / organization & administration*
  • Hospital Administration / methods*
  • Hospital Information Systems / organization & administration*
  • Meaningful Use / organization & administration
  • Medical Record Linkage / methods*
  • Models, Organizational*