Abstract
Breast cancer research relies on analyzing multiple datasets to identify trends and prognostic markers, and the Wisconsin Prognostic Breast Cancer (WPBC) dataset is a valuable source of such data. Outliers, however, can seriously degrade the accuracy of predictive models. This work proposes an adaptive outlier removal method based on the Support Vector Machine (SVM) algorithm to improve the robustness of prediction models trained on the WPBC dataset. To support good SVM performance, the technique includes preprocessing steps such as handling missing data and standardizing features. Outliers are identified dynamically according to how far they deviate from the support of the SVM model, which allows their removal to be tailored to the data. The SVM is then retrained on the outlier-adjusted dataset to improve generalization. Evaluation on a test set shows improved F1-score, recall, and accuracy. For datasets similar to WPBC, this adaptive outlier removal technique offers a useful tool for improving the performance and reliability of breast cancer prognostic models.
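The pipeline described above (impute, standardize, fit an SVM, flag outliers, retrain, evaluate) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact outlier criterion, so the sketch assumes one plausible rule, flagging training points whose signed SVM margin falls more than `k` standard deviations below the mean margin. The synthetic data from `make_classification` stands in for WPBC, and the names `build_model`, `adaptive_outlier_mask`, `run_demo`, and the parameter `k` are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.metrics import accuracy_score, f1_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def build_model():
    """Preprocessing plus SVM: median imputation, standardization, RBF SVC."""
    return make_pipeline(
        SimpleImputer(strategy="median"),
        StandardScaler(),
        SVC(kernel="rbf", C=1.0),
    )


def adaptive_outlier_mask(model, X, y, k=2.0):
    """Return a boolean mask of training points to keep.

    Assumed criterion (not taken from the paper): a point is an outlier
    when its signed margin y * f(x) is below mean - k * std of all margins.
    Labels are assumed to be 0/1, so a positive decision value means class 1.
    """
    signs = np.where(y == 1, 1.0, -1.0)
    margins = signs * model.decision_function(X)
    threshold = margins.mean() - k * margins.std()
    return margins >= threshold


def run_demo(seed=0):
    # Synthetic stand-in for WPBC, with label noise and a few missing values.
    X, y = make_classification(
        n_samples=300, n_features=20, n_informative=8,
        flip_y=0.05, random_state=seed,
    )
    rng = np.random.default_rng(seed)
    X[rng.random(X.shape) < 0.01] = np.nan

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )

    # Fit once, flag low-margin training points, then retrain without them.
    baseline = build_model().fit(X_train, y_train)
    keep = adaptive_outlier_mask(baseline, X_train, y_train)
    retrained = build_model().fit(X_train[keep], y_train[keep])

    scores = {}
    for name, model in (("baseline", baseline), ("retrained", retrained)):
        pred = model.predict(X_test)
        scores[name] = {
            "accuracy": accuracy_score(y_test, pred),
            "recall": recall_score(y_test, pred),
            "f1": f1_score(y_test, pred),
        }
    return keep, scores


if __name__ == "__main__":
    keep, scores = run_demo()
    print(f"kept {keep.sum()} of {keep.size} training points")
    for name, metrics in scores.items():
        print(name, {m: round(v, 3) for m, v in metrics.items()})
```

Because the threshold is derived from the margin distribution of the fitted model, the number of removed points adapts to the data instead of being fixed in advance. Whether retraining improves the test metrics depends on the dataset and on `k`, so both models should be compared on held-out data.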
