Original Research

Open Access

Predicting COVID-19 mortality using statistical, machine learning and fuzzy classification methods: insights from a Portuguese cohort study

  • Cecilia Castro1
  • Víctor Leiva2,*
  • Pedro Cunha3
  • Muhammad Azeem Akbar4

1Centre of Mathematics, Universidade do Minho, 4710-057 Braga, Portugal

2School of Industrial Engineering, Pontificia Universidad Católica de Valparaíso, 2362807 Valparaíso, Chile

3Emergency Department, Hospital Senhora da Hora, 4835-044 Guimarães, Portugal

4Department of Information Technology, Lappeenranta University of Technology, 53850 Lappeenranta, Finland

DOI: 10.22514/sv.2024.152 Vol. 20, Issue 12, December 2024, pp. 10-27

Submitted: 15 July 2024 Accepted: 24 September 2024

Published: 08 December 2024

*Corresponding Author(s): Víctor Leiva E-mail: victor.leiva@pucv.cl; victorleivasanchez@gmail.com

Abstract

Predicting mortality in hospitalized COVID-19 patients from non-invasive, easily accessible measurements remains essential for improving patient outcomes, particularly in fast-paced clinical environments. The present study integrates generalized linear models (GLMs), fuzzy rule-based systems, and advanced machine learning algorithms, such as support vector machines (SVMs), gradient boosting machines (GBMs), and random forests (RFs), to predict COVID-19 mortality. The study used data from a Portuguese hospital, with patient age, length of stay, maximum oxygen administered, and timing of remdesivir (RDV) therapy as key predictors. Logistic regression provided high predictive performance, with an area under the curve (AUC) of 0.908, while the glmnet model achieved an AUC of 0.892. Although RF (AUC = 0.922) and SVM (AUC = 0.952) achieved higher AUC values, logistic regression remained competitive and has the advantage of interpretability. Fuzzy models identified RDV as an important predictor (13.32% contribution), but with ambiguous effects, whereas the logistic regression model found that delayed RDV administration increases mortality risk. These findings underscore the complexity of the impact of RDV on outcomes and highlight the importance of combining statistical models with machine learning techniques to enhance clinical decision-making for COVID-19 patients.
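The models above are compared by AUC, which can be read as the probability that a randomly chosen patient who died receives a higher predicted risk than a randomly chosen patient who survived. The following is a minimal sketch of that pairwise definition in Python. The labels and scores are hypothetical illustration values, not data from the study, and the function name `roc_auc` is an assumption of this sketch rather than the authors' implementation.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case has the higher score; tied pairs count as one half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical data, NOT from the study: 1 = died, 0 = survived.
labels = [0, 0, 0, 0, 1, 0, 1, 1]
# Hypothetical predicted risks from two illustrative models.
model_a = [0.05, 0.10, 0.20, 0.30, 0.35, 0.60, 0.70, 0.90]
model_b = [0.10, 0.15, 0.20, 0.25, 0.55, 0.40, 0.80, 0.85]

auc_a = roc_auc(labels, model_a)
auc_b = roc_auc(labels, model_b)
print(f"model A AUC = {auc_a:.3f}, model B AUC = {auc_b:.3f}")
```

In this toy example, model A misranks one of the 15 died/survived pairs while model B misranks none, so model B has the higher AUC. The pairwise loop is quadratic in sample size and is meant only to make the definition concrete; library routines use a rank-based computation instead.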


Keywords

Advanced predictive analytics; Artificial intelligence; Ensemble methods; Fuzzy rule-based classification; Generalized linear models; Non-invasive clinical predictors; Remdesivir treatment; SARS-CoV-2


Cite and Share

Cecilia Castro, Víctor Leiva, Pedro Cunha, Muhammad Azeem Akbar. Predicting COVID-19 mortality using statistical, machine learning and fuzzy classification methods: insights from a Portuguese cohort study. Signa Vitae. 2024; 20(12): 10-27.
