Abstract
Microarray gene expression profiling is a technique for monitoring the expression levels of thousands of genes simultaneously under various conditions. Clustering, an unsupervised learning technique, groups data objects into subclasses so that similar genes can be classified or identified. This grouping can reveal patterns that would otherwise remain hidden in large gene datasets and complex biological networks. However, genomic datasets are high-dimensional, with each sample described by thousands of genes, which makes them inherently difficult to process. To address this, the proposed method reduces the dimensionality of microarray gene datasets by combining feature selection with feature projection, thereby improving the performance of clustering algorithms. The datasets are processed in Python, and the method reports the accuracy, as a percentage, of the validated clusters. The method has been validated on several standard datasets.
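The processing chain described in the abstract (feature selection, then feature projection, then clustering, then measuring the accuracy of the resulting clusters) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not name the selection criterion, the projection, or the clustering algorithm, so the sketch assumes variance-based gene filtering for feature selection, PCA (computed via SVD) for feature projection, and k-means with farthest-point initialization for clustering. It also assumes that "accuracy of the validated clusters" means the percentage of samples assigned correctly under the best one-to-one mapping between clusters and known class labels. All function names and parameters (`select_high_variance_genes`, `pca_project`, `kmeans`, `cluster_accuracy`, `n_genes`, `n_components`) are hypothetical, and the demo data are synthetic rather than one of the paper's datasets.

```python
import itertools

import numpy as np


def select_high_variance_genes(X, n_genes):
    """Feature selection (assumed criterion): keep the n_genes highest-variance genes.

    X has shape (n_samples, n_genes_total); columns are genes.
    """
    variances = X.var(axis=0)
    top = np.argsort(variances)[::-1][:n_genes]
    return X[:, np.sort(top)]  # preserve the original gene order


def pca_project(X, n_components):
    """Feature projection (assumed: PCA): project centered data onto the top principal axes."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


def farthest_point_init(X, k, seed=0):
    """Pick one random sample, then repeatedly add the sample farthest from all chosen ones."""
    rng = np.random.default_rng(seed)
    idx = [int(rng.integers(len(X)))]
    for _ in range(1, k):
        # Squared distance from every sample to its nearest already-chosen centroid.
        d = ((X[:, None, :] - X[idx][None, :, :]) ** 2).sum(axis=2).min(axis=1)
        idx.append(int(d.argmax()))
    return X[idx].copy()


def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's k-means: alternate nearest-centroid assignment and centroid update."""
    centroids = farthest_point_init(X, k, seed)
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Keep the previous centroid if a cluster ends up empty.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels


def cluster_accuracy(true_labels, pred_labels):
    """Percentage of samples whose cluster maps to their true class under the best one-to-one mapping."""
    true_labels = np.asarray(true_labels)
    pred_labels = np.asarray(pred_labels)
    clusters = list(np.unique(pred_labels))
    classes = list(np.unique(true_labels))
    # Pad with None so surplus clusters map to "no class" and never count as correct.
    classes += [None] * max(0, len(clusters) - len(classes))
    best = 0
    # Brute force over all cluster-to-class mappings; only practical for a few clusters.
    for perm in itertools.permutations(classes, len(clusters)):
        mapping = dict(zip(clusters, perm))
        hits = sum(mapping[p] == t for p, t in zip(pred_labels, true_labels))
        best = max(best, hits)
    return 100.0 * best / len(true_labels)


if __name__ == "__main__":
    # Synthetic stand-in data: 60 samples x 500 genes, three classes that
    # differ only in the first 30 genes.
    rng = np.random.default_rng(42)
    y = np.repeat([0, 1, 2], 20)
    X = rng.normal(size=(60, 500))
    X[:, :30] += 3.0 * y[:, None]

    Z = pca_project(select_high_variance_genes(X, n_genes=50), n_components=2)
    labels = kmeans(Z, k=3)
    print(f"cluster accuracy: {cluster_accuracy(y, labels):.1f}%")
```

The brute-force search in `cluster_accuracy` examines every cluster-to-class assignment, which grows factorially with the number of clusters. For more than a handful of clusters, the same matching problem can be solved efficiently with the Hungarian algorithm, for example via `scipy.optimize.linear_sum_assignment` on a cluster-by-class contingency table.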
