Enhancing Cardiovascular Disease Detection with SMOTE-Boosted Stacking Ensembles and Hybrid Feature Selection

How to Cite

Shilpa, Thoutireddy, and Priyanka T. 2025. “Enhancing Cardiovascular Disease Detection With SMOTE-Boosted Stacking Ensembles and Hybrid Feature Selection”. Journal of Innovative Image Processing 7 (3): 639-58. https://doi.org/10.36548/jiip.2025.3.004.

Keywords

  • Cardiovascular Disease (CVD) Detection
  • SMOTE
  • Hybrid Feature Selection
  • Ensemble Learning
  • Stacking Ensemble

Abstract

Cardiovascular disease (CVD) is the leading cause of death worldwide, which underscores the need for reliable early detection models. In this study, we introduce an integrated machine learning framework that combines data preprocessing, hybrid feature selection (Chi-square, ANOVA F-test, RFE, and LassoCV), and class balancing with SMOTE, feeding a stacking ensemble classifier built from Random Forest, XGBoost, LightGBM, MLP, and Logistic Regression models. The proposed model was evaluated on three distinct datasets: a large synthetic dataset, a merged public dataset, and clinical data from Indian hospitals. Performance was high on all three, with accuracy of up to 98.86% and ROC AUC of up to 99.9%. The framework addresses class imbalance, non-linear feature interactions, and data heterogeneity while maintaining strong predictive performance across datasets. These findings suggest that ensemble-based hybrid methods are reliable and may serve as an efficient clinical decision support tool for early detection of cardiovascular risk.
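The class-balancing step named in the abstract can be illustrated with a minimal, standard-library sketch of SMOTE-style interpolation. It is not the authors' implementation: the function names `smote_oversample` and `balance`, the two-feature toy rows, and the binary-label assumption are all hypothetical and chosen only for illustration. In practice a maintained implementation such as `SMOTE` from the imbalanced-learn package would normally be used instead.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by SMOTE-style interpolation.

    minority: list of equal-length numeric feature vectors (one class only).
    k: number of nearest minority neighbours considered per sample.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        # Place the new point at a random position on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def balance(X, y, minority_label=1, k=5, seed=0):
    """Append synthetic minority rows until both classes have equal counts.

    Assumes binary labels (hypothetical simplification for this sketch).
    """
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_majority = len(y) - len(minority)
    n_new = max(0, n_majority - len(minority))
    new_rows = smote_oversample(minority, n_new, k, seed)
    return X + new_rows, y + [minority_label] * len(new_rows)


# Toy data: two made-up numeric features per row, 5 negatives and 3 positives.
X = [[63, 145], [45, 120], [50, 130], [58, 150], [39, 118],
     [61, 160], [66, 170], [70, 165]]
y = [0, 0, 0, 0, 0, 1, 1, 1]

X_bal, y_bal = balance(X, y, k=2)
print(y_bal.count(0), y_bal.count(1))  # prints: 5 5
```

Each synthetic row lies on a line segment between two existing minority rows, so oversampling adds variety without duplicating records. Balancing is usually applied to the training split only, so that evaluation data still reflects the original class distribution.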

Dataset 1: https://www.kaggle.com/datasets/mahatiratusher/heart-disease-risk-prediction-dataset

Dataset 2: https://www.kaggle.com/datasets/mfarhaannazirkhan/heart-dataset/data

Dataset 3: https://data.mendeley.com/datasets/dzz48mvjht/1