Classification of Diabetes Data Set from Iraq via Different Machine Learning Techniques

Section: Research Paper
Published: Jun 25, 2025
Pages: 170-189

Abstract

Diabetes has become one of the most prevalent diseases in Iraq and is listed among the leading causes of death. Machine learning can extract useful information from diagnostic medical datasets collected from diabetes patients in Iraq by building predictive models from them. In this study, we applied machine learning classification to compare the performance of classification and regression trees (CART), support vector machines (SVM), random forests (RF), linear discriminant analysis (LDA), and K-nearest neighbors (KNN). We sought to design a model that predicts, as accurately as possible, whether a person has diabetes, is healthy, or is expected to develop diabetes in the future, using two evaluation measures: accuracy and kappa. The training data set comprises the samples used to construct the model, whereas the testing data set is used to evaluate the model's performance. On the training data, the algorithms ranked from highest to lowest as RF, CART, SVM, LDA, and KNN. On the test data the order differed somewhat: SVM, RF, CART, LDA, and KNN. All of the methods were evaluated on a labeled (supervised) diabetes testing dataset, and the approach that achieves the highest accuracy and kappa is regarded as the best option for predicting diabetes mellitus in Iraq. Based on the results, the SVM and RF algorithms predicted diabetes most accurately.
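The two evaluation measures named in the abstract can be illustrated with a minimal sketch. It assumes that "kappa" means Cohen's kappa, that is, observed agreement corrected for the agreement expected by chance, which the abstract does not state explicitly. The function names and the three class labels ("N" for healthy, "P" for expected to develop diabetes, "Y" for diabetic) are hypothetical, and the label vectors are made-up illustrations, not data or results from the study.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e).

    p_o is the observed agreement (the accuracy) and p_e is the agreement
    expected by chance, computed from the marginal class frequencies.
    """
    n = len(y_true)
    p_o = accuracy(y_true, y_pred)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_e == 1.0:
        # Only one class appears in both vectors; kappa is undefined.
        return float("nan")
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical labels for ten patients (illustrative only).
y_true = ["N", "N", "N", "N", "P", "P", "Y", "Y", "Y", "Y"]
y_pred = ["N", "N", "N", "P", "P", "Y", "Y", "Y", "Y", "N"]

print(f"accuracy = {accuracy(y_true, y_pred):.3f}")
print(f"kappa    = {cohen_kappa(y_true, y_pred):.3f}")
```

Because kappa discounts chance agreement, it is lower than accuracy whenever a classifier could score well simply by following the class frequencies, which is one reason to report the two measures together.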

How to Cite

Omar Altalabani, D., دلشاد, & Erdogan, F. (2025). Classification of Diabetes Data Set from Iraq via Different Machine Learning Techniques. IRAQI JOURNAL OF STATISTICAL SCIENCES, 21(1), 170–189. Retrieved from https://rjps.uomosul.edu.iq/index.php/stats/article/view/20585