Applying decision tree for identification of a low risk population for type 2 diabetes. Tehran Lipid and Glucose Study

Diabetes Res Clin Pract. 2014 Sep;105(3):391-8. doi: 10.1016/j.diabres.2014.07.003. Epub 2014 Jul 18.

Abstract

Aims: The aim of this study was to create a prediction model, using a data mining approach, to identify individuals at low risk of incident type 2 diabetes, using data from the Tehran Lipid and Glucose Study (TLGS).

Methods: In a population of 6647 adults aged ≥20 years without diabetes, followed for 12 years, a prediction model was developed using decision tree classification. During follow-up, 729 (11%) incident cases of diabetes occurred. Predictor variables were selected from demographic characteristics, smoking status, medical and drug history, and laboratory measures.
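The abstract does not say which splitting criterion the decision tree used. Purely as a generic illustration of how a classification tree picks a cut point on a continuous predictor such as fasting plasma glucose, the sketch below uses a CART-style Gini impurity criterion. The Gini criterion, the function names `gini` and `best_threshold`, and the toy values are assumptions for illustration; they are not the TLGS model or TLGS data.

```python
from typing import Sequence, Tuple


def gini(labels: Sequence[int]) -> float:
    """Gini impurity of binary labels (1 = incident diabetes, 0 = no diabetes)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 1.0 - p * p - (1.0 - p) * (1.0 - p)


def best_threshold(values: Sequence[float], labels: Sequence[int]) -> Tuple[float, float]:
    """Return (threshold, weighted Gini) for the split `value <= threshold`
    that minimises the size-weighted Gini impurity of the two child nodes.
    Assumed CART-style criterion, not necessarily the one used in the study."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (float("nan"), float("inf"))
    for i in range(1, n):
        # Only cut between distinct adjacent values of the predictor.
        if pairs[i - 1][0] == pairs[i][0]:
            continue
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical toy values (mg/dL) and outcomes; NOT TLGS data.
fpg = [82, 88, 91, 95, 99, 104, 108, 115]
incident = [0, 0, 0, 0, 0, 1, 0, 1]
threshold, impurity = best_threshold(fpg, incident)
print(threshold, impurity)
```

A full tree repeats this search over every candidate predictor at each node and recurses into the child nodes until a stopping rule is met; leaves dominated by non-cases are the kind of "low risk" groups the study set out to identify.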

Results: The decision tree models were developed using 60 input variables and one output variable. Overall classification accuracy was 90.5%, with 31.1% sensitivity and 97.9% specificity; for subjects without diabetes, precision and F-measure were 92% and 0.95, respectively. The variables identified by the tree included fasting plasma glucose, body mass index, triglycerides, mean arterial blood pressure, family history of diabetes, educational level and job status.
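The reported metrics are linked through the confusion matrix, and the precision and F-measure quoted are for the "no diabetes" class (precision for that class is the negative predictive value). The sketch below computes them from a confusion matrix and checks their mutual consistency. It assumes, which the abstract does not state, that the evaluation set had the same case mix as the whole cohort (729 cases among 6647 participants); the expected counts are derived from the reported rates, not taken from the paper, and the names `Confusion`, `metrics` and `expected_confusion` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: float  # diabetes cases predicted as diabetes
    fn: float  # diabetes cases predicted as no diabetes
    tn: float  # non-cases predicted as no diabetes
    fp: float  # non-cases predicted as diabetes


def metrics(c: Confusion) -> dict:
    total = c.tp + c.fn + c.tn + c.fp
    sensitivity = c.tp / (c.tp + c.fn)
    specificity = c.tn / (c.tn + c.fp)  # recall of the "no diabetes" class
    # Precision and F-measure for the "no diabetes" class, as in the abstract.
    precision_neg = c.tn / (c.tn + c.fn)
    f_neg = 2 * precision_neg * specificity / (precision_neg + specificity)
    return {
        "accuracy": (c.tp + c.tn) / total,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision_no_diabetes": precision_neg,
        "f_measure_no_diabetes": f_neg,
    }


def expected_confusion(n_cases: int, n_noncases: int,
                       sensitivity: float, specificity: float) -> Confusion:
    # Expected (fractional) cell counts implied by the rates; an assumption.
    return Confusion(
        tp=sensitivity * n_cases,
        fn=(1 - sensitivity) * n_cases,
        tn=specificity * n_noncases,
        fp=(1 - specificity) * n_noncases,
    )


# Assumed case mix: 729 incident cases among 6647 participants.
m = metrics(expected_confusion(729, 6647 - 729, 0.311, 0.979))
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

Under that assumption the implied accuracy, precision and F-measure for the non-diabetic class come out at roughly 0.906, 0.920 and 0.949, in line with the reported 90.5%, 92% and 0.95. It also shows why accuracy is high despite 31.1% sensitivity: about 89% of the cohort are non-cases, and specificity dominates the overall figure.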

Conclusions: Decision tree analysis using routine demographic, clinical, anthropometric and laboratory measurements produced a simple tool for identifying individuals at low risk of type 2 diabetes.

Keywords: Decision tree; Prediction model; Type 2 diabetes.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Adult
  • Aged
  • Arterial Pressure
  • Blood Glucose / analysis
  • Body Mass Index
  • Body Weights and Measures
  • Computational Biology
  • Data Mining*
  • Decision Support Techniques
  • Decision Trees*
  • Diabetes Mellitus, Type 2 / diagnosis
  • Diabetes Mellitus, Type 2 / epidemiology*
  • Educational Status
  • Employment
  • Female
  • Humans
  • Incidence
  • Iran / epidemiology
  • Longitudinal Studies
  • Male
  • Marital Status
  • Middle Aged
  • Risk Factors
  • Sensitivity and Specificity
  • Smoking
  • Triglycerides / blood

Substances

  • Blood Glucose
  • Triglycerides