Volume - 7 | Issue - 4 | December 2025

Published
07 October, 2025
Cardiovascular disease (CVD) is among the most significant conditions affecting human health. Early detection of CVD can help prevent or mitigate its impact, potentially lowering mortality rates. Machine learning models are employed to identify CVD risk factors. To enhance CVD detection, we propose a robust framework that uses a variety of feature selection techniques to identify key predictive traits, applies K-Fold cross-validation to reduce overfitting and support model selection, and employs several novel ensemble classification methodologies. Real-time data were collected from a private hospital in Salem, and combined benchmark datasets were also used for cardiovascular disease detection. A feature-type-based technique is used to handle missing values, and the Z-score technique is used to handle outliers. The SMOTE method is used to balance the class distribution. Three feature selection techniques, namely Pearson Correlation Coefficient, Recursive Feature Elimination, and Random Forest Feature Importance, are used to select the best attributes. Novel ensemble classifiers, namely Bagging-Boosting Stacked Ensemble (BBSE), Heterogeneous Soft Voting Ensemble (HSVE), Feature-Augmented Heterogeneous Stacking (FAHS), Heterogeneous Bootstrap-Ensemble (HBE), and Heterogeneous Sequential Boosting (HSB), are created by combining multiple classifiers. The confusion matrix, accuracy, F1 score, recall, precision, and ROC curve were used to measure performance. On the real-time medical dataset, FAHS achieved the highest accuracy, 92.18%, without feature selection or K-Fold CV. After applying the feature selection methods and K-Fold CV, the FAHS model with the random forest feature importance technique achieved the highest accuracy, 96.09%. On the benchmark dataset, FAHS achieved the highest accuracy, 88.67%, without feature selection or K-Fold CV.
After applying the feature selection approaches and K-Fold CV, the FAHS classifier with the random forest feature importance strategy achieved the highest accuracy, 94.09%. Cardiovascular disease is a major global health problem that requires accurate and early detection. This study assesses different AI models, including FAHS, HSB, and blended architectures, on a real-world medical dataset. The experimental results indicate that the hybrid FAHS model outperforms traditional classifiers, achieving 96.8% validity, 95.5% precision, 96.2% recall, and an AUC of 0.97. These findings highlight the potential of ensemble learning frameworks to enhance predictive interpretability, accuracy, and scalability in CVD detection for practical healthcare implementation. On the real-time dataset, accuracy improved from 92.18% to 96.09%. On the benchmark dataset, accuracy improved from 88.67% to 94.09%. The combination of random forest feature importance and FAHS achieved the highest accuracy on both datasets. The outcomes are reported individually to allow comparison. The outcome analysis suggests that the proposed models provided the highest accuracy among the models evaluated. These models could be beneficial for detecting CVD with high accuracy in the future.
Keywords: Cardiovascular Disease, Feature Selection, K-Fold Cross-validation, Feature-Augmented Stacking, Sequential Boosting
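
The abstract names several generic building blocks: Z-score outlier handling, Pearson-correlation feature ranking, K-Fold splitting, and soft voting. The sketch below illustrates these four ideas using only the Python standard library. It is not the authors' implementation. All function names, the toy column names, and the threshold of 3.0 are assumptions made for illustration. SMOTE, Recursive Feature Elimination, random forest feature importance, and the FAHS, BBSE, HBE, and HSB ensembles are not reproduced here.

```python
import math
import random
from statistics import mean, pstdev


def zscore_filter(rows, col, threshold=3.0):
    """Drop rows whose value in column `col` is more than `threshold`
    population standard deviations from the column mean.
    The threshold of 3.0 is an assumed, commonly used default."""
    values = [r[col] for r in rows]
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:  # constant column: nothing can be an outlier
        return list(rows)
    return [r for r in rows if abs((r[col] - mu) / sigma) <= threshold]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def rank_features(rows, labels, names):
    """Rank feature names by absolute Pearson correlation with the label,
    strongest first."""
    scores = {
        name: abs(pearson([r[i] for r in rows], labels))
        for i, name in enumerate(names)
    }
    return sorted(scores, key=scores.get, reverse=True)


def k_fold_indices(n, k=5, seed=0):
    """Yield (train_indices, test_indices) for each of k shuffled folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test


def soft_vote(prob_lists):
    """Average per-class probabilities from several models and return
    (predicted_class_index, averaged_probabilities)."""
    n_classes = len(prob_lists[0])
    avg = [mean(p[c] for p in prob_lists) for c in range(n_classes)]
    return max(range(n_classes), key=avg.__getitem__), avg
```

In such a pipeline, one would typically filter outliers and rank features on the training portion of each fold only, so that the held-out fold does not influence preprocessing. Whether the study followed that order is not stated in the abstract.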