Volume - 7 | Issue - 3 | September 2025
Published
16 August, 2025
Cardiovascular disease (CVD) is the leading cause of death worldwide, which underscores the need for reliable early detection models. In this study, we introduce an integrated machine learning framework that combines efficient data preprocessing, hybrid feature selection (Chi-square, ANOVA F-test, RFE, and LassoCV), and class balancing with SMOTE, feeding a stacking ensemble classifier composed of Random Forest, XGBoost, LightGBM, MLP, and Logistic Regression models. The proposed model was evaluated on three distinct datasets: a large synthetic dataset, a merged public dataset, and real-world data from Indian hospitals. It performed strongly on each, with accuracy of up to 98.86% and ROC AUC of up to 99.9%. The framework addresses class imbalance, non-linear feature interactions, and data heterogeneity, and its predictive performance generalized across all three datasets. These findings indicate that ensemble-based hybrid methods are reliable and may serve as an effective clinical decision support system for early detection of cardiovascular risk.
Keywords: Cardiovascular Disease (CVD) Detection, SMOTE, Hybrid Feature Selection, Ensemble Learning, Stacking Ensemble
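The class-balancing step named in the abstract, SMOTE, creates synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority neighbours. The following is a minimal standard-library sketch of that idea, not the authors' implementation: the function names `smote` and `balance`, the neighbour count `k=5`, the fixed `seed`, and the choice to oversample until both classes are the same size are all assumptions made for illustration, since the abstract does not state these settings.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic rows interpolated between minority samples.

    Hypothetical helper for illustration; k and seed are assumed defaults.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than samples
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours of `base` (itself excluded).
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment from base to other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def balance(X, y, minority_label=1, k=5, seed=0):
    """Oversample the minority class until both classes have equal size.

    Assumes a binary problem; returns new lists and leaves X and y untouched.
    """
    minority = [row for row, label in zip(X, y) if label == minority_label]
    n_new = (len(X) - len(minority)) - len(minority)
    if n_new <= 0:  # already balanced, or "minority" is actually the majority
        return list(X), list(y)
    new_rows = smote(minority, n_new, k=k, seed=seed)
    return list(X) + new_rows, list(y) + [minority_label] * len(new_rows)
```

Calling `balance(X, y)` on a feature matrix given as a list of rows and a list of 0/1 labels returns the original rows followed by the synthetic ones. In practice, oversampling of this kind is usually applied to the training split only, so that synthetic points do not leak into the evaluation data.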