Predicting long-term type 2 diabetes with support vector machine using oral glucose tolerance test

Authors: Hasan T. Abbas aff001; Lejla Alic aff002; Madhav Erraguntla aff003; Jim X. Ji aff001; Muhammad Abdul-Ghani aff004; Qammer H. Abbasi aff005; Marwa K. Qaraqe aff006
Author affiliations: Department of Electrical & Computer Engineering, Texas A&M University at Qatar, Doha, Qatar aff001; Magnetic Detection & Imaging Group, Faculty of Science & Technology, University of Twente, Enschede, The Netherlands aff002; Department of Industrial & Systems Engineering, Texas A&M University, College Station, Texas, United States of America aff003; UT Health, San Antonio, Texas, United States of America aff004; James Watt School of Engineering, University of Glasgow, Glasgow, United Kingdom aff005; College of Science and Engineering, Hamad Bin Khalifa University, Doha, Qatar aff006
Published in: PLoS ONE 14(12)
Category: Research Article


Diabetes is a major healthcare burden worldwide. There is substantial evidence that lifestyle modifications and drug intervention can prevent diabetes; early identification of high-risk individuals is therefore important for designing targeted prevention strategies. In this paper, we present an automatic tool that uses machine learning techniques to predict the development of type 2 diabetes mellitus (T2DM). Data generated from an oral glucose tolerance test (OGTT) were used to develop a predictive model based on the support vector machine (SVM). We trained and validated the models using the OGTT and demographic data of 1,492 healthy individuals collected during the San Antonio Heart Study. This study collected plasma glucose and insulin concentrations before glucose intake and at three time points thereafter (30, 60 and 120 min). Personal information such as age, ethnicity and body-mass index was also part of the data set. From 11 OGTT measurements, we derived 61 features, ranked them, and shortlisted the top ten using the minimum redundancy maximum relevance feature selection algorithm. All possible combinations of the 10 best-ranked features were used to generate SVM-based prediction models. This research shows that an individual’s plasma glucose levels, and the information derived from them, have the strongest predictive performance for the future development of T2DM. Notably, insulin and demographic features do not provide additional performance improvement for diabetes prediction. The results of this work identify the parsimonious set of clinical data that needs to be collected for efficient prediction of T2DM. Our approach achieves an average accuracy of 96.80% and a sensitivity of 80.09% on a holdout set.
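The exhaustive search over the ten shortlisted features, and the two reported metrics, can be illustrated with a minimal Python sketch. It assumes that "all possible combinations" means every non-empty subset of the ten features. The feature names are hypothetical placeholders (the abstract does not list the selected features), and the confusion-matrix counts are invented purely for illustration; they are not the study's results. Model training itself is omitted.

```python
from itertools import combinations

# Hypothetical placeholder names: the abstract does not name the ten features.
ranked_features = [f"feature_{i}" for i in range(1, 11)]


def all_feature_subsets(features):
    """Yield every non-empty subset of the features, smallest subsets first."""
    for size in range(1, len(features) + 1):
        yield from combinations(features, size)


def accuracy(tp, tn, fp, fn):
    """Fraction of all individuals classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def sensitivity(tp, fn):
    """Fraction of individuals who develop T2DM that the model flags."""
    return tp / (tp + fn)


# Each subset would define one candidate SVM model to train and validate.
subsets = list(all_feature_subsets(ranked_features))
print(len(subsets))  # 2**10 - 1 = 1023 candidate feature sets

# Invented counts (NOT from the paper): 10 future T2DM cases, 90 non-cases.
tp, fn, tn, fp = 8, 2, 88, 2
print(accuracy(tp, tn, fp, fn))  # 0.96
print(sensitivity(tp, fn))       # 0.8
```

With an imbalanced cohort such as the invented one above, accuracy is dominated by the majority (non-diabetic) class, which is one reason sensitivity is worth reporting alongside it.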

Keywords:

Blood plasma – Cardiovascular diseases – Glucose tolerance tests – Insulin – Support vector machines



Article published in the journal issue: 2019, Issue 12