Clinical characterization of data-driven diabetes subgroups in Mexicans using a reproducible machine learning approach

BMJ Open Diabetes Res Care. 2020 Jul;8(1):e001550. doi: 10.1136/bmjdrc-2020-001550.

Abstract

Introduction: Previous reports in European populations demonstrated the existence of five data-driven adult-onset diabetes subgroups. Here, we use self-normalizing neural networks (SNNN) to improve the reproducibility of these data-driven diabetes subgroups in Mexican cohorts and to extend their application to more diverse settings.

Research design and methods: We trained SNNN and compared it with k-means clustering to classify diabetes subgroups in multiethnic, representative, population-based National Health and Nutrition Examination Survey (NHANES) datasets with all available measures (training sample: NHANES-III, n=1132; validation sample: NHANES 1999-2006, n=626). SNNN models were then applied to four Mexican cohorts (SIGMA-UIEM, n=1521; Metabolic Syndrome cohort, n=6144; ENSANUT 2016, n=614 and CAIPaDi, n=1608) to characterize diabetes subgroups in Mexicans according to treatment response, risk for chronic complications and risk factors for the incidence of each subgroup.
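
The abstract does not describe the network architecture. As general background, self-normalizing neural networks are typically built on the SELU activation, which keeps layer activations close to zero mean and unit variance. The sketch below shows that activation and a single dense layer in plain Python. The function names `selu` and `dense_selu` are hypothetical and are not taken from the authors' code; the two constants are the standard SELU values.

```python
import math

# Standard SELU constants (general background, not values reported in this abstract).
SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805


def selu(x: float) -> float:
    """Scaled exponential linear unit for a single pre-activation value."""
    if x > 0:
        return SELU_SCALE * x
    # Negative inputs saturate toward -SELU_SCALE * SELU_ALPHA.
    return SELU_SCALE * SELU_ALPHA * (math.exp(x) - 1.0)


def dense_selu(inputs, weights, biases):
    """One fully connected layer followed by SELU (hypothetical helper).

    `weights` holds one row of input weights per output unit.
    """
    outputs = []
    for row, bias in zip(weights, biases):
        pre_activation = sum(w * v for w, v in zip(row, inputs)) + bias
        outputs.append(selu(pre_activation))
    return outputs
```

In a classifier of this kind, a final layer would map the last hidden activations to one score per subgroup; that step is omitted here because the abstract gives no detail on it.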

Results: SNNN yielded four reproducible clinical profiles (obesity related, insulin deficient, insulin resistant, age related) in NHANES and Mexican cohorts even without C-peptide measurements. We observed in a population-based survey a high prevalence of the insulin-deficient form (41.25%, 95% CI 41.02% to 41.48%), followed by obesity-related (33.60%, 95% CI 33.40% to 33.79%), age-related (14.72%, 95% CI 14.63% to 14.82%) and severe insulin-resistant groups. A significant association was found between the SLC16A11 diabetes risk variant and the obesity-related subgroup (OR 1.42, 95% CI 1.10 to 1.83, p=0.008). Among incident cases, we observed a greater incidence of mild obesity-related diabetes (n=149, 45.0%). In a diabetes outpatient clinic cohort, we observed increased 1-year risk (HR 1.59, 95% CI 1.01 to 2.51) and 2-year risk (HR 1.94, 95% CI 1.13 to 3.31) for incident retinopathy in the insulin-deficient group and decreased 2-year diabetic retinopathy risk for the obesity-related subgroup (HR 0.49, 95% CI 0.27 to 0.89).
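
As a reading aid, the reported odds ratio, confidence interval and p value for the SLC16A11 association can be checked against each other. The sketch below assumes the interval is a Wald interval on the log-odds scale with a 1.96 critical value, which the abstract does not state; the helper name `p_from_or_ci` is hypothetical.

```python
import math


def p_from_or_ci(odds_ratio: float, lower: float, upper: float) -> float:
    """Approximate two-sided p value from an OR and its 95% CI.

    Assumes a Wald interval on the log scale (an assumption, not stated
    in the abstract): log(OR) +/- 1.96 * SE.
    """
    standard_error = (math.log(upper) - math.log(lower)) / (2 * 1.96)
    z = math.log(odds_ratio) / standard_error
    # Two-sided tail probability of the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2))


# Values reported for the obesity-related subgroup: OR 1.42, 95% CI 1.10 to 1.83.
p_value = p_from_or_ci(1.42, 1.10, 1.83)
print(f"approximate p = {p_value:.4f}")
```

Under these assumptions the back-calculated value falls close to the reported p=0.008; small differences are expected because the interval bounds are rounded to two decimals.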

Conclusions: Diabetes subgroup phenotypes are reproducible using SNNN; our algorithm is available as a web-based tool. Applying these models allowed better characterization of diabetes subgroups and their risk factors in Mexicans, which could have clinical applications.

Keywords: ethnic groups; insulin resistance; statistical models; type 2 diabetes mellitus.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Diabetes Mellitus, Type 2* / diagnosis
  • Diabetes Mellitus, Type 2* / epidemiology
  • Humans
  • Machine Learning
  • Metabolic Syndrome* / diagnosis
  • Metabolic Syndrome* / epidemiology
  • Nutrition Surveys
  • Reproducibility of Results