An Innovative Machine Learning Framework for Cardiovascular Disease Detection

How to Cite

T., Jayasudha, and Uma Rani R. 2025. “An Innovative Machine Learning Framework for Cardiovascular Disease Detection”. Journal of Innovative Image Processing 7 (4): 1078-1107. https://doi.org/10.36548/jiip.2025.4.001.

Keywords

  • Cardiovascular Disease
  • Feature Selection
  • K-Fold Cross-validation
  • Feature-Augmented Stacking
  • Sequential Boosting

Abstract

Cardiovascular disease (CVD) is among the most serious conditions affecting human health. Early detection of CVD can help prevent or mitigate its impact, potentially lowering mortality rates. Machine learning models are employed to identify CVD risk factors. To enhance CVD detection, we propose a robust framework that uses a variety of feature selection techniques to identify key predictive traits, K-Fold cross-validation to limit overfitting and guide model selection, and several novel ensemble classification methods. Real-time data were collected from a private hospital in Salem, and combined benchmark datasets were also used for cardiovascular disease detection. A feature-type-based technique is used to handle missing values, and the Z-score technique is used to handle outliers. The SMOTE method is used to balance the imbalanced classes. Three feature selection techniques, namely Pearson Correlation Coefficient, Recursive Feature Elimination, and Random Forest Feature Importance, are used to select the best attributes. Innovative ensemble classifiers, namely Bagging-Boosting Stacked Ensemble (BBSE), Heterogeneous Soft Voting Ensemble (HSVE), Feature-Augmented Heterogeneous Stacking (FAHS), Heterogeneous Bootstrap-Ensemble (HBE), and Heterogeneous Sequential Boosting (HSB), are created by combining multiple classifiers. The confusion matrix, accuracy, F1 score, recall, precision, and ROC were used to measure performance. On the real-time medical dataset, FAHS achieved the highest accuracy of 92.18% without feature selection or K-Fold CV. After applying the attribute selection methods and K-Fold CV, the FAHS model with the random forest feature importance technique achieved the highest accuracy of 96.09%. On the benchmark dataset, FAHS achieved the highest accuracy of 88.67% without feature selection or K-Fold CV. After applying the feature selection approaches and K-Fold CV, the FAHS classifier with the random forest feature importance strategy achieved the highest accuracy of 94.09%.

Cardiovascular disease is a major global health problem that requires accurate and early detection. This study assesses different AI models, including FAHS, HSB, and blended architectures, on a real-world medical dataset. The experimental results indicate that the hybrid FAHS model outperforms traditional classifiers, achieving 96.8% validity, 95.5% precision, 96.2% recall, and an AUC of 0.97. These findings highlight the potential of ensemble learning frameworks to enhance predictive interpretability, accuracy, and scalability in CVD detection for practical healthcare implementation. On the real-time dataset, accuracy improved from 92.18% to 96.09%; on the benchmark dataset, it improved from 88.67% to 94.09%. The combination of the random forest feature importance method and FAHS achieved the highest accuracy on both datasets. The outcomes are reported individually to allow comparison. The outcome analysis indicates that the proposed models achieved the highest accuracy among the models evaluated. These models may prove beneficial for detecting CVD with high accuracy in the future.
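
Three of the preprocessing and evaluation steps named in the abstract (Z-score outlier handling, Pearson correlation feature ranking, and K-Fold splitting) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy records, the feature names `age`, `chol`, and `noise`, the threshold of 3.0, and all function names are assumptions made purely for illustration. SMOTE, Recursive Feature Elimination, Random Forest Feature Importance, and the ensemble classifiers are not shown.

```python
import math
from statistics import mean, pstdev


def zscore_keep_indices(rows, threshold=3.0):
    """Return indices of rows whose features all lie within `threshold`
    population standard deviations of their column mean."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sigmas = [pstdev(c) for c in cols]
    return [
        i for i, row in enumerate(rows)
        if all(s == 0 or abs(x - m) / s <= threshold
               for x, m, s in zip(row, mus, sigmas))
    ]


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 for a constant input."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def rank_features(rows, labels, names):
    """Order feature names by absolute correlation with the label, highest first."""
    cols = list(zip(*rows))
    scores = {n: abs(pearson(c, labels)) for n, c in zip(names, cols)}
    return sorted(scores, key=scores.get, reverse=True)


def kfold_indices(n, k):
    """Yield (train, test) index lists for k contiguous, near-equal folds."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


# Hypothetical toy records: [age, chol, noise]; the last row has an extreme chol value.
NAMES = ["age", "chol", "noise"]
ROWS = [
    [40, 180, 1], [42, 185, 2], [45, 190, 1], [47, 195, 2],
    [50, 200, 1], [52, 205, 2], [55, 210, 1], [57, 215, 2],
    [60, 220, 1], [62, 225, 2], [65, 230, 1], [50, 900, 2],
]
LABELS = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]

keep = zscore_keep_indices(ROWS, threshold=3.0)
clean_rows = [ROWS[i] for i in keep]
clean_labels = [LABELS[i] for i in keep]

print("rows kept after Z-score filter:", len(clean_rows))
print("features ranked by |Pearson r|:", rank_features(clean_rows, clean_labels, NAMES))
for train, test in kfold_indices(len(clean_rows), k=5):
    print("test fold:", test)
```

In this sketch the outlier filter runs before feature ranking so that a single extreme value does not distort the correlation estimates. In a fuller pipeline, feature selection would typically be fitted inside each training fold rather than on the whole dataset, to avoid leaking information from the held-out fold.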
