Combining Cluster Analysis with Multiple Linear Regression Analysis to Create the Most Accurate Prediction Model for Evaporation in the Kurdistan Region of

Section: Research Paper
Published: Jun 25, 2025
Pages: 188-199

Abstract

This study aims to build a prediction model for evaporation in the Kurdistan Region of Iraq from the variables that influence it, combining regression and cluster analysis. Using the two methods together highlights the strengths of each technique and tests whether hierarchical cluster analysis (nearest neighbor, furthest neighbor, and median linkage) can improve the predictive accuracy of regression models. The variables affecting the evaporation rate were classified using weather data from meteorological stations in the Kurdistan Region, Iraq, for the period from January 2020 to December 2022. The adjusted R2, MSE, and RMSE values were used as indicators of model performance. The study found that clustering before regression analysis improves prediction accuracy by grouping homogeneous independent variables into clusters that differ from one another.
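
The three performance indicators named in the abstract can be illustrated with a short sketch. It is not the authors' code: the function name `regression_metrics`, its parameters, and the sample numbers are hypothetical, and it assumes MSE is the residual sum of squares divided by the number of observations n (some regression texts divide by n - p - 1 instead, where p is the number of predictors).

```python
import math


def regression_metrics(y_true, y_pred, n_predictors):
    """Return MSE, RMSE, R2 and adjusted R2 for a fitted regression model.

    Assumption: MSE = SSE / n. The abstract does not state which
    denominator the study used.
    """
    n = len(y_true)
    if n != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if n - n_predictors - 1 <= 0:
        raise ValueError("need more observations than predictors + 1")

    # Residual (error) sum of squares and total sum of squares.
    sse = sum((obs - fit) ** 2 for obs, fit in zip(y_true, y_pred))
    mean_y = sum(y_true) / n
    sst = sum((obs - mean_y) ** 2 for obs in y_true)
    if sst == 0:
        raise ValueError("y_true is constant; R2 is undefined")

    mse = sse / n
    rmse = math.sqrt(mse)
    r2 = 1.0 - sse / sst
    # Adjusted R2 penalises R2 for the number of predictors in the model.
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)
    return {"mse": mse, "rmse": rmse, "r2": r2, "adj_r2": adj_r2}


# Made-up values for illustration only; they are not data from the study.
observed = [2.0, 4.0, 6.0, 8.0, 10.0]
predicted = [2.5, 3.5, 6.0, 8.5, 9.5]
metrics = regression_metrics(observed, predicted, n_predictors=2)
print(metrics)
```

Under these assumptions, a lower MSE and RMSE together with a higher adjusted R2 would indicate the better of two candidate models, for example a regression fitted with and without a prior clustering step.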

How to Cite

Ahmed Hamad, B., & بخشان. (2025). Combining Cluster Analysis with Multiple Linear Regression Analysis to Create the Most Accurate Prediction Model for Evaporation in the Kurdistan Region of. IRAQI JOURNAL OF STATISTICAL SCIENCES, 20(2), 188–199. Retrieved from https://rjps.uomosul.edu.iq/index.php/stats/article/view/20642