Using a data mining approach to discover behavior correlates of chronic disease: a case study of depression

Stud Health Technol Inform. 2014:201:71-8.

Abstract

The purposes of this methodological paper are: 1) to describe data mining methods for building a classification model for a chronic disease using a U.S. behavior risk factor dataset, and 2) to illustrate the application of these methods using a case study of depressive disorder. The methods described include: 1) six steps of data mining to build a disease model using classification techniques, 2) an innovative approach to analyzing high-dimensional data, and 3) a visualization strategy for communicating with clinicians who are unfamiliar with advanced statistics. Our application of these data mining strategies identified childhood experiences of living with someone who was mentally ill and of sexual abuse, along with limited usual activity, as the strongest correlates of depression among hundreds of variables. The methods that we applied may be useful to others wishing to build a classification model from complex, large-volume datasets for other health conditions.
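As a rough illustration of what screening many candidate variables against a binary disease label can look like, the sketch below ranks categorical features by information gain. This is not the authors' procedure: the abstract does not name a ranking criterion, so information gain is an assumed stand-in, and the column names (`limited_activity`, `smoker`, `depression`) and the toy records are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Reduction in label entropy after splitting on a categorical feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(rows, target):
    """Return (feature, gain) pairs sorted from highest to lowest gain."""
    labels = [row[target] for row in rows]
    features = [name for name in rows[0] if name != target]
    gains = {
        name: information_gain([row[name] for row in rows], labels)
        for name in features
    }
    return sorted(gains.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy records; names and values are invented for illustration only.
toy_rows = [
    {"limited_activity": 1, "smoker": 0, "depression": 1},
    {"limited_activity": 1, "smoker": 1, "depression": 1},
    {"limited_activity": 0, "smoker": 1, "depression": 0},
    {"limited_activity": 0, "smoker": 0, "depression": 0},
]
ranking = rank_features(toy_rows, "depression")
```

In this toy data the hypothetical `limited_activity` column separates the two classes completely and ranks first, while `smoker` carries no information about the label. With hundreds of real survey variables, a ranking like this would only be a first screening step before fitting and validating a classifier.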

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Chronic Disease / classification*
  • Chronic Disease / epidemiology*
  • Data Mining / methods*
  • Depression / classification
  • Depression / epidemiology*
  • Electronic Health Records / classification*
  • Electronic Health Records / statistics & numerical data*
  • Health Behavior*
  • Humans
  • Information Storage and Retrieval / methods
  • Pattern Recognition, Automated / methods
  • Risk Assessment / methods