Fake News Detection using DistilBERT Embeddings with PCA and Genetic Algorithm based Feature Selection

How to Cite

S., Suriya, and Samrrutha R S. 2025. “Fake News Detection Using DistilBERT Embeddings With PCA and Genetic Algorithm Based Feature Selection”. Journal of Ubiquitous Computing and Communication Technologies 7 (3): 240-56. https://doi.org/10.36548/jucct.2025.3.001.

Keywords

— Fake News Detection
— DistilBERT
— Principal Component Analysis (PCA)
— Genetic Algorithm
— Feature Selection
— Supervised Learning
Published: 12-09-2025

Abstract

The widespread dissemination of inaccurate information on digital platforms threatens social trust, public safety, and democratic institutions. This work presents a novel and efficient model for identifying fake news that has three major components: context-aware text embeddings from DistilBERT, Principal Component Analysis (PCA) for dimensionality reduction, and feature selection using a Genetic Algorithm (GA). DistilBERT, a lightweight transformer model, generates 768-dimensional embeddings that capture the contextual and semantic meaning of the text. To address the computational cost and overfitting risk that high-dimensional data poses, PCA retains 95% of the data variance while using significantly fewer features. To maximize accuracy and model interpretability, a GA-based selection procedure then picks the most informative and discriminative features from the reduced feature space. This two-stage optimization (PCA followed by GA) is one of the paper's main contributions, distinguishing it from much of the prior work, which primarily uses full embeddings or simple filters. A Logistic Regression classifier performs the final classification, chosen for precision even at some cost to interpretability. The model attains an accuracy of 98% when tested on a synthetically balanced fake news dataset, and it shows significant improvements in precision, recall, and F1-score compared to other models. By combining a high-quality language model, dimensionality reduction, and evolutionary optimization, the system can identify fake news across digital platforms quickly, at scale, and in real time.
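The two-stage reduction described in the abstract (PCA retaining 95% of the variance, then GA-based selection, then Logistic Regression) can be illustrated with a minimal sketch. This is not the authors' implementation. The data is a synthetic stand-in for 768-dimensional DistilBERT embeddings, and the GA settings (population size, number of generations, mutation rate, tournament selection, uniform crossover, elitism) and the cross-validated-accuracy fitness are assumptions chosen only to keep the example small and runnable with NumPy and scikit-learn.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

rng = np.random.default_rng(0)

# ASSUMPTION: synthetic low-rank data standing in for 768-dim DistilBERT embeddings.
n_samples, dim, latent_dim = 400, 768, 30
y = rng.integers(0, 2, size=n_samples)
Z = rng.normal(size=(n_samples, latent_dim))
Z[:, :5] += 2.0 * y[:, None]  # class signal lives in a few latent directions
W = rng.normal(size=(latent_dim, dim)) / np.sqrt(dim)
X = Z @ W + 0.01 * rng.normal(size=(n_samples, dim))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Stage 1: PCA keeping enough components to explain 95% of the variance.
pca = PCA(n_components=0.95, svd_solver="full").fit(X_train)
P_train, P_test = pca.transform(X_train), pca.transform(X_test)
n_feat = P_train.shape[1]

def fitness(mask):
    """Mean 3-fold CV accuracy of Logistic Regression on the masked features."""
    if not mask.any():
        return 0.0  # an empty feature subset cannot be scored
    clf = LogisticRegression(max_iter=1000)
    return cross_val_score(clf, P_train[:, mask], y_train, cv=3).mean()

def tournament(pop, scores, k=3):
    """Return the fittest of k randomly drawn individuals."""
    idx = rng.choice(len(pop), size=k, replace=False)
    return pop[idx[np.argmax(scores[idx])]]

# Stage 2: GA over boolean masks of PCA components (hypothetical settings).
pop_size, n_generations, mutation_rate = 12, 5, 0.05
pop = rng.random((pop_size, n_feat)) < 0.5
for _ in range(n_generations):
    scores = np.array([fitness(m) for m in pop])
    children = [pop[np.argmax(scores)].copy()]  # elitism: keep the best mask
    while len(children) < pop_size:
        a, b = tournament(pop, scores), tournament(pop, scores)
        child = np.where(rng.random(n_feat) < 0.5, a, b)  # uniform crossover
        child ^= rng.random(n_feat) < mutation_rate       # bit-flip mutation
        children.append(child)
    pop = np.array(children)

scores = np.array([fitness(m) for m in pop])
best_mask = pop[np.argmax(scores)]

# Final classifier trained on the GA-selected PCA components.
clf = LogisticRegression(max_iter=1000).fit(P_train[:, best_mask], y_train)
test_accuracy = clf.score(P_test[:, best_mask], y_test)
print(f"PCA components: {n_feat}, selected: {int(best_mask.sum())}, "
      f"test accuracy: {test_accuracy:.3f}")
```

With real text, `X` would be replaced by sentence-level DistilBERT embeddings. The accuracy printed here reflects only the synthetic data and says nothing about the 98% reported in the abstract.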
