Mining Streaming Database: A Review

Section: Review Paper
Published: Jun 25, 2025
Pages: 153-164

Abstract

Background: Tuberculosis (TB) is a deadly infectious disease worldwide, responsible for 10 million new cases and 1.5 million deaths annually. Shorter TB treatment regimens show promise in reducing this burden, and the treatment success rate in South Africa has improved, but retreatment cases remain a concern. An important feature of time-to-event modelling is its ability to estimate transition probabilities for heterogeneous subgroups with different risk profiles, and survival analysis is generally used to estimate these probabilities accurately. This study explored the application of a flexible parametric survival model to censored time-to-event data from TB patients.

Methods: The data were obtained from the TB unit of East London Central Clinic, Eastern Cape, South Africa. In total, 174 patients were included in the analysis. The goodness of fit of the models was assessed using the Akaike information criterion (AIC). We estimated the hazard ratios and baseline cumulative hazards of our model, which are needed to calculate individual transition probabilities, and compared the model with the Cox proportional hazards model and an additive hazards model in terms of survival predictions for TB patients.

Results: The flexible parametric survival model produced hazard ratio and baseline cumulative hazard estimates similar to those obtained with the Cox proportional hazards model. Sex (HR=0.49, 95% CI: 0.38, 0.62), antiretroviral therapy (ART) (HR=0.53, 95% CI: 0.34, 0.78), and diabetes (HR=0.58, 95% CI: 0.41, 0.78) were all statistically significant factors associated with improved treatment survival in TB patients.

Conclusion: Flexible parametric survival models are a powerful tool for modelling time-to-event data and individual transition probabilities. Modelling the baseline hazard explicitly is valuable: it makes it easier to obtain different types of predictions, and it accommodates non-proportional hazards, because time-dependent effects can be modelled as interactions between covariates and time.
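The abstract notes that hazard ratios and baseline cumulative hazards are what is needed to compute individual probabilities. As a minimal sketch, assume proportional hazards, so that the cumulative hazard for a patient with covariates x is H(t | x) = H0(t) · HR(x) and the event-free probability is S(t | x) = exp(-H(t | x)). In a Royston-Parmar flexible parametric model, ln H0(t) is itself a spline in log time; here H0(t) is just a single number. The value 0.8 is hypothetical and not taken from the study, HR = 0.49 is the estimate reported above for sex, and the function names are illustrative only.

```python
import math


def survival_probability(baseline_cum_hazard: float, hazard_ratio: float) -> float:
    """Probability of remaining event-free up to time t.

    Assumes proportional hazards: H(t | x) = H0(t) * HR(x),
    so S(t | x) = exp(-H(t | x)).
    """
    if baseline_cum_hazard < 0 or hazard_ratio <= 0:
        raise ValueError("H0(t) must be >= 0 and HR must be > 0")
    return math.exp(-baseline_cum_hazard * hazard_ratio)


def event_probability(baseline_cum_hazard: float, hazard_ratio: float) -> float:
    """Probability of experiencing the event by time t: 1 - S(t | x)."""
    return 1.0 - survival_probability(baseline_cum_hazard, hazard_ratio)


if __name__ == "__main__":
    h0_t = 0.8  # hypothetical baseline cumulative hazard at some time t
    for label, hr in [("reference group", 1.0), ("sex, HR = 0.49", 0.49)]:
        s = survival_probability(h0_t, hr)
        print(f"{label}: S(t) = {s:.3f}, P(event by t) = {1 - s:.3f}")
```

A hazard ratio below 1 scales down the cumulative hazard and therefore raises the predicted event-free probability at every t. With a non-proportional (time-dependent) effect, the hazard ratio would itself be a function of t rather than a constant.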

How to Cite

Thaher Yaseen Al Abd Alazeez, A., عمار, Muhammed Salih, A., & ازهار. (2025). Mining Streaming Database: A Review. IRAQI JOURNAL OF STATISTICAL SCIENCES, 21(2), 153–164. Retrieved from https://rjps.uomosul.edu.iq/index.php/stats/article/view/21017