Fake News Detection using DistilBERT Embeddings with PCA and Genetic Algorithm based Feature Selection

Suriya S.; Samrrutha R S.


Volume - 7 | Issue - 3 | September 2025

Open Access

Pages: 240-256

Cite this article

S., Suriya, and Samrrutha R S. "Fake News Detection using DistilBERT Embeddings with PCA and Genetic Algorithm based Feature Selection." Journal of Ubiquitous Computing and Communication Technologies 7, no. 3 (2025): 240-256.

DOI

10.36548/jucct.2025.3.001

Published

12 September, 2025

Abstract

The widespread dissemination of inaccurate information on digital platforms threatens social trust, public safety, and democratic institutions. This work presents a novel and efficient fake news detection model with three major components: context-aware text embeddings from DistilBERT, Principal Component Analysis (PCA) for dimensionality reduction, and feature selection using a Genetic Algorithm (GA). DistilBERT, a lightweight transformer model, generates 768-dimensional embeddings that capture the contextual and semantic meaning of the text. To limit the computational cost and overfitting risk of high-dimensional data, PCA retains 95% of the data variance while using significantly fewer features. To improve accuracy and model interpretability, GA-based feature selection then picks the most informative and discriminative attributes from the reduced feature space. This two-stage optimization (PCA followed by GA) is one of the paper's main contributions, distinguishing it from much of the prior work, which primarily uses full embeddings or simple filters. A Logistic Regression classifier performs the final classification; it is chosen for precision, even at some cost to interpretability. The model attains 98% accuracy when tested on a synthetically balanced fake news dataset. It also shows significant improvements in precision, recall, and F1-score compared with other models. By combining a high-quality language model, dimensionality reduction, and evolutionary optimization, the system can identify fake news on various digital platforms quickly, scalably, and in real time.
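The pipeline described in the abstract (embeddings, then PCA retaining 95% of the variance, then GA feature selection, then Logistic Regression) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the 768-dimensional vectors are synthetic stand-ins rather than real DistilBERT embeddings, and the GA settings (population size, number of generations, mutation rate, truncation selection, single-point crossover) are assumptions chosen only to keep the example short. The sketch assumes NumPy and scikit-learn are installed.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

rng = np.random.default_rng(0)

# Stand-in for DistilBERT output: synthetic 768-dim vectors (NOT real embeddings).
n_samples, n_dims = 400, 768
y = rng.integers(0, 2, size=n_samples)
X = rng.normal(size=(n_samples, n_dims))
X[:, :20] += 1.5 * y[:, None]  # assumption: a few dimensions carry the class signal

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Stage 1: PCA keeping enough components to explain 95% of the variance.
pca = PCA(n_components=0.95, svd_solver="full").fit(X_train)
Z_train, Z_test = pca.transform(X_train), pca.transform(X_test)


def fitness(mask):
    """Mean 3-fold CV accuracy of Logistic Regression on the selected components."""
    if not mask.any():
        return 0.0
    clf = LogisticRegression(max_iter=500)
    return cross_val_score(clf, Z_train[:, mask], y_train, cv=3).mean()


def evolve(n_features, pop_size=12, generations=5, mutation_rate=0.02):
    """Tiny GA over boolean feature masks (hypothetical settings)."""
    pop = rng.random((pop_size, n_features)) < 0.5
    for _ in range(generations):
        scores = np.array([fitness(ind) for ind in pop])
        order = np.argsort(scores)[::-1]
        parents = pop[order[: pop_size // 2]]  # truncation selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = parents[rng.choice(len(parents), size=2, replace=False)]
            cut = rng.integers(1, n_features)  # single-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            child ^= rng.random(n_features) < mutation_rate  # bit-flip mutation
            children.append(child)
        pop = np.vstack([parents, np.array(children)])
    scores = np.array([fitness(ind) for ind in pop])
    return pop[np.argmax(scores)]


# Stage 2: GA selects a subset of the principal components.
best_mask = evolve(Z_train.shape[1])

# Final classifier trained on the GA-selected components only.
clf = LogisticRegression(max_iter=500).fit(Z_train[:, best_mask], y_train)
test_accuracy = clf.score(Z_test[:, best_mask], y_test)

print(f"PCA components kept: {Z_train.shape[1]} of {n_dims}")
print(f"GA-selected components: {int(best_mask.sum())}")
print(f"Held-out accuracy on synthetic data: {test_accuracy:.3f}")
```

Because the data here is synthetic, the printed accuracy says nothing about the 98% figure reported in the abstract; the sketch only shows how the stages fit together. With real text, the synthetic matrix `X` would be replaced by one embedding vector per article produced by a DistilBERT encoder.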

Keywords

Fake News Detection; DistilBERT; Principal Component Analysis (PCA); Genetic Algorithm; Feature Selection; Supervised Learning

Category	Fee
Article Access Charge	15 USD
Open Access Fee	Nil
Annual Subscription Fee	200 USD

e-ISSN: 2582-337X
4 issues per year
DOI: https://doi.org/10.36548/jucct

Indexing
GoogleScholar | Crossref | MicrosoftAcademic | ScienceGate | J-Gate

Publisher

Inventive Research Organization

Publication Charges: Nil
