Using Machine Learning to Predict Microvascular Complications in Patients With Type 1 Diabetes






Background: Diabetic microvascular complications can lead to long-term morbidity and mortality, significantly drive healthcare costs, and impair the quality of life of patients with type 1 diabetes (T1D). Early prediction and prevention of microvascular complications, including nephropathy, retinopathy, and neuropathy, in T1D patients can support informed clinical decision making and potentially delay progression to long-term adverse outcomes. Although machine learning (ML) methods have been applied to disease prediction in healthcare, very little research has used advanced ML methods (e.g., neural networks) to predict microvascular complications in T1D patients. Moreover, no study has explicitly compared the performance of different predictive models. In addition, none of the predictive models in previous studies, ML models in particular, incorporated A1C variability as a predictor.
Objectives: The first objective of this study was to develop and compare predictive models, namely ML and conventional statistical models, for 3 microvascular complications (diabetic nephropathy, retinopathy, and neuropathy) in T1D patients. The second objective was to develop and compare the same types of predictive models while evaluating whether A1C variability helps better predict each of the 3 microvascular complications.
Methods: This was a factorial experimental study using retrospective real-world registry data. Adult T1D patients who participated in the T1D Exchange Clinic Registry and met the eligibility criteria were included in the analysis. Baseline characteristics of eligible T1D patients measured between 2010 and 2012 were used to predict three microvascular complications measured through 2017.
Two ML methods, support vector machine (SVM) and neural network (NN), and one conventional statistical method, logistic regression (LR), were used to develop predictive models. The three microvascular complications (diabetic nephropathy, retinopathy, and neuropathy) were operationalized as binary variables (yes/no). Predictors were selected for each microvascular complication. Specifically, A1C variability was manipulated into the following 5 levels: a) single A1C, b) mean A1C, c) combination single, d) combination mean, and e) multiple. Models were first developed through 10-fold cross-validation on the training set. The model was then fit on the entire training set and evaluated on the test set. Hence, for each microvascular complication, 11 (10+1) predictive models were developed using each modeling method with each predictor set. A total of 495 models (11 x 5 predictor sets x 3 modeling methods x 3 microvascular complications) were developed, 165 for each microvascular complication. The performance measure was operationalized as the F1 score. Factorial analysis of variance (ANOVA) was used to test the research hypotheses. A post hoc Tukey-Kramer test was performed to evaluate which levels within a factor differed significantly. An alpha level of 0.05 was used to determine statistical significance. Data preparation, summary statistics, correlation analysis, and LR were performed using SAS 9.4 (SAS Institute, Inc., Cary, NC). Predictive modeling by SVM and NN was performed with Scikit-learn 0.22.1 and the Keras application programming interface (API) of TensorFlow™ online version 1.0.0.
Results: A total of 4476, 3595, and 4072 patients met the eligibility criteria and were included in the nephropathy, retinopathy, and neuropathy cohorts, respectively. Within each cohort, 510 (11%), 659 (18%), and 579 (14%) patients developed nephropathy, retinopathy, and neuropathy, respectively, during the follow-up period.
Patients in the three cohorts had a mean (±SD) age of 38-40 (±14.5-15.4) years and had been diagnosed with T1D for a mean (±SD) of 19-21 (±11.3-12.5) years. Slightly more than half (53-55%) of patients were women. For the first objective, the mean (±SD) F1 score of the 33 LR models was 0.19±0.10, lower than that of the 33 SVM models (0.38±0.03) and the 33 NN models (0.38±0.03). Two-way ANOVA indicated a significant interaction between the effects of modeling method and microvascular complication on the performance measure (F1 score, p<.0001). ML models performed significantly better than LR models within each study cohort. The post hoc Tukey-Kramer test indicated no statistically significant difference between the F1 scores of SVM and NN models. For the second objective, three-way ANOVA indicated significant interactions between modeling method, microvascular complication, and A1C variability. Hence, two-way ANOVA was performed within each cohort. In the nephropathy cohort, the F test indicated that A1C variability had a significant effect on F1 score when the modeling method was NN (F=6.78, p<.0001). The post hoc Tukey-Kramer test indicated that mean F1 scores from NN models using d) combination mean or e) multiple were significantly higher than those using b) mean A1C or c) combination single. In the retinopathy cohort, A1C variability had no significant effect on the performance measure. Lastly, in the neuropathy cohort, the F test indicated that A1C variability had a significant effect on the performance measure when the modeling method was LR (F=8.19, p<.0001). The post hoc Tukey-Kramer test indicated that the mean F1 score from LR models using e) multiple was significantly lower than that from LR models using the other A1C variability measures. Across all three cohorts, ML models performed significantly better than LR models.
Conclusion: The study indicates that ML models produced significantly higher F1 scores than LR models for predicting all three types of microvascular complications, irrespective of which A1C variability measure was used. It also indicates that the combination mean or multiple measures of A1C variability are preferable when predicting diabetic nephropathy in T1D patients using NN models. Future research is needed to develop decision support systems that can advise clinicians based on the results from predictive models.
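The factorial design and performance measure described in the Methods can be illustrated with a short sketch. It enumerates the design cells to check the stated model counts (495 in total, 165 per complication) and defines the F1 score from confusion-matrix counts. This is only an illustration under stated assumptions, not the study's code: the names `METHODS`, `RUNS`, and `f1_score` are hypothetical, and the confusion counts in the example are invented to show how F1 behaves when the positive class is rare, as it is in these cohorts (11-18% of patients developed a complication).

```python
from itertools import product

# Factor levels as named in the abstract.
METHODS = ["LR", "SVM", "NN"]
COMPLICATIONS = ["nephropathy", "retinopathy", "neuropathy"]
PREDICTOR_SETS = [
    "single A1C",
    "mean A1C",
    "combination single",
    "combination mean",
    "multiple",
]
# 10 cross-validation folds plus 1 fit on the entire training set
# (labels are hypothetical; the abstract only gives the 10+1 count).
RUNS = [f"cv_fold_{i}" for i in range(1, 11)] + ["full_train_fit"]


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall; 0.0 when there are no true positives."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Every cell of the design: complication x method x predictor set x run.
design = list(product(COMPLICATIONS, METHODS, PREDICTOR_SETS, RUNS))
total_models = len(design)
per_complication = sum(1 for comp, *_ in design if comp == "nephropathy")

print(total_models)      # 495
print(per_complication)  # 165

# Hypothetical test set of 100 patients, 11 of whom develop the complication.
# A model that predicts "no" for everyone is 89% accurate but has F1 = 0.
print(f1_score(tp=0, fp=0, fn=11))   # 0.0
# A model that finds 6 of the 11 cases at the cost of 15 false alarms.
print(f1_score(tp=6, fp=15, fn=5))   # 0.375
```

Because F1 ignores true negatives, it rewards a model only for identifying patients who actually develop a complication, which is one plausible reason to prefer it over accuracy for outcomes this imbalanced.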



Machine learning, Neural networks, Support vector machine, Predictive modeling, Type 1 diabetes, Diabetic nephropathy, Diabetic retinopathy, Diabetic neuropathy


Portions of this document appear in: Xu, Q., Wang, L., & Sansgiry, S. 2019 Nov 13. A systematic literature review of predicting diabetic retinopathy, nephropathy and neuropathy in patients with type 1 diabetes using machine learning. Journal of Medical Artificial Intelligence. [Online] 3:0