Applying decision tree for identification of a low risk population for type 2 diabetes. Tehran Lipid and Glucose Study

Azra Ramezankhani; Omid Pournik; Jamal Shahrabi; Davood Khalili; Fereidoun Azizi; Farzad Hadaegh

doi:10.1016/j.diabres.2014.07.003

Diabetes Res Clin Pract. 2014 Sep;105(3):391-8. doi: 10.1016/j.diabres.2014.07.003. Epub 2014 Jul 18.

Authors

Azra Ramezankhani¹, Omid Pournik², Jamal Shahrabi³, Davood Khalili⁴, Fereidoun Azizi⁵, Farzad Hadaegh⁶

Affiliations

¹ Prevention of Metabolic Disorders Research Center, Research Institute for Endocrine Science, Shahid Beheshti University of Medical Sciences, Tehran, Iran.
² Department of Medical Informatics, Faculty of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran; Medical Informatics Research Center, Faculty of Medicine, Mashhad, Iran.
³ Industrial Engineering Department, Amirkabir University of Technology, Tehran, Iran.
⁴ Prevention of Metabolic Disorders Research Center, Research Institute for Endocrine Science, Shahid Beheshti University of Medical Sciences, Tehran, Iran; Department of Epidemiology, School of Public Health, Shahid Beheshti University of Medical Sciences, Tehran, Iran.
⁵ Endocrine Research Center, Research Institute for Endocrine Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran.
⁶ Prevention of Metabolic Disorders Research Center, Research Institute for Endocrine Science, Shahid Beheshti University of Medical Sciences, Tehran, Iran. Electronic address: fzhadaegh@endocrine.ac.ir.

PMID: 25085758
DOI: 10.1016/j.diabres.2014.07.003

Abstract

Aims: The aim of this study was to create a prediction model, using a data mining approach, to identify individuals at low risk of incident type 2 diabetes in the Tehran Lipid and Glucose Study (TLGS) database.

Methods: For a population of 6647 individuals without diabetes, aged ≥20 years and followed for 12 years, a prediction model was developed using decision tree classification. Seven hundred and twenty-nine (11%) cases of diabetes occurred during follow-up. Predictor variables were selected from demographic characteristics, smoking status, medical and drug history, and laboratory measures.

Results: We developed the predictive models by decision tree using 60 input variables and one output variable. The overall classification accuracy was 90.5%, with 31.1% sensitivity and 97.9% specificity; for the subjects without diabetes, precision and F-measure were 92% and 0.95, respectively. The variables identified by the model included fasting plasma glucose, body mass index, triglycerides, mean arterial blood pressure, family history of diabetes, educational level and job status.
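
The reported metrics can be cross-checked against one another with the standard definitions. The sketch below is illustrative only and is not part of the published analysis: the function names are hypothetical, and it assumes that the metrics were evaluated on a sample with the same diabetes incidence as the full cohort (729 of 6647), which the abstract does not state. For the class "without diabetes", recall equals specificity, so the F-measure is the harmonic mean of the reported precision (92%) and specificity (97.9%).

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def accuracy_from_rates(sensitivity: float, specificity: float,
                        prevalence: float) -> float:
    """Overall accuracy as the prevalence-weighted mean of the two rates."""
    return sensitivity * prevalence + specificity * (1 - prevalence)


# Values reported in the abstract.
sensitivity = 0.311
specificity = 0.979
precision_neg = 0.92          # precision for subjects without diabetes

# ASSUMPTION: the evaluation sample has the cohort's incidence (729 / 6647).
prevalence = 729 / 6647

# For the "without diabetes" class, recall is the specificity.
f_neg = f_measure(precision_neg, specificity)
acc = accuracy_from_rates(sensitivity, specificity, prevalence)

print(f"F-measure (without diabetes): {f_neg:.3f}")
print(f"Accuracy implied by the rates: {acc:.3f}")
```

Under that assumption, both derived values agree with the reported F-measure of 0.95 and accuracy of 90.5% to within rounding. The high accuracy is driven mainly by the specificity, since about 89% of the cohort remained free of diabetes.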

Conclusions: Decision tree analysis, using routine demographic, clinical, anthropometric and laboratory measurements, created a simple tool to predict individuals at low risk for type 2 diabetes.

Keywords: Decision tree; Prediction model; Type 2 diabetes.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Adult
Aged
Arterial Pressure
Blood Glucose / analysis
Body Mass Index
Body Weights and Measures
Computational Biology
Data Mining*
Decision Support Techniques
Decision Trees*
Diabetes Mellitus, Type 2 / diagnosis
Diabetes Mellitus, Type 2 / epidemiology*
Educational Status
Employment
Female
Humans
Incidence
Iran / epidemiology
Longitudinal Studies
Male
Marital Status
Middle Aged
Risk Factors
Sensitivity and Specificity
Smoking
Triglycerides / blood

Substances

Blood Glucose
Triglycerides