Volume 7 | Issue 3 | September 2025

Fake News Detection using DistilBERT Embeddings with PCA and Genetic Algorithm based Feature Selection
Open Access
Suriya S., Samrrutha R S.
Pages: 240-256
Cite this article
S., Suriya, and Samrrutha R S. "Fake News Detection using DistilBERT Embeddings with PCA and Genetic Algorithm based Feature Selection." Journal of Ubiquitous Computing and Communication Technologies 7, no. 3 (2025): 240-256.
Published
12 September 2025
Abstract

The widespread dissemination of inaccurate information on digital platforms threatens social trust, public safety, and democratic institutions. This work presents a novel and efficient fake news detection model with three major components: context-aware text embeddings from DistilBERT, Principal Component Analysis (PCA) for dimensionality reduction, and feature selection using a Genetic Algorithm (GA). DistilBERT, a lightweight transformer, generates 768-dimensional embeddings that capture the contextual and semantic meaning of the text. Because high-dimensional data raises computational cost and the risk of overfitting, PCA is applied to retain 95% of the data variance with significantly fewer features. To maximize accuracy and interpretability, a GA-based selection procedure then picks the most informative and discriminative features from the reduced feature space. This two-stage optimization (PCA followed by GA) is one of the paper's main contributions and distinguishes it from much of the prior work, which primarily uses full embeddings or simple filters. A Logistic Regression classifier performs the final classification, prioritizing precision while retaining interpretability. The model attains 98% accuracy on a synthetically balanced fake news dataset and shows significant improvements in precision, recall, and F1-score over the compared models. By combining a high-quality language model, dimensionality reduction, and evolutionary optimization, the system can identify fake news across digital platforms quickly, in real time, and at scale.
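The three stages named in the abstract (embeddings, PCA to 95% variance, GA feature selection, then Logistic Regression) can be illustrated with a small from-scratch NumPy sketch. It is not the authors' implementation. Real DistilBERT embeddings are replaced by a hypothetical synthetic stand-in (`make_stand_in_embeddings`: 768-dimensional low-rank signal plus noise), and all GA settings (population 20, 15 generations, 5% bit-flip mutation, a 0.002 per-feature penalty, tournament selection, uniform crossover, elitism) are illustrative assumptions, since the abstract does not specify them. In practice, scikit-learn's `PCA(n_components=0.95, svd_solver="full")` and `LogisticRegression` would replace the hand-rolled helpers.

```python
import numpy as np

rng = np.random.default_rng(0)


def make_stand_in_embeddings(n=500, d=768, latent=16):
    """Hypothetical stand-in for 768-dim DistilBERT embeddings:
    a low-rank signal plus noise, with a binary real/fake label."""
    z = rng.normal(size=(n, latent))
    x = z @ rng.normal(size=(latent, d)) + 0.5 * rng.normal(size=(n, d))
    y = (z[:, 0] + 0.5 * z[:, 1] > 0).astype(float)
    return x, y


def fit_pca(x, variance=0.95):
    """Return the training mean and the fewest principal axes whose
    cumulative explained-variance ratio reaches `variance`."""
    mean = x.mean(axis=0)
    _, s, vt = np.linalg.svd(x - mean, full_matrices=False)
    ratio = s**2 / np.sum(s**2)
    k = int(np.searchsorted(np.cumsum(ratio), variance)) + 1
    return mean, vt[:k]


def train_logreg(x, y, lr=0.1, epochs=300, l2=1e-3):
    """Plain batch-gradient-descent logistic regression with a small L2 term."""
    w, b = np.zeros(x.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-np.clip(x @ w + b, -30, 30)))
        err = p - y
        w -= lr * (x.T @ err / len(y) + l2 * w)
        b -= lr * err.mean()
    return w, b


def accuracy(w, b, x, y):
    return float(np.mean(((x @ w + b) > 0) == (y == 1)))


def ga_select(x_tr, y_tr, x_va, y_va, pop=20, gens=15, p_mut=0.05, penalty=0.002):
    """Evolve boolean feature masks; fitness = validation accuracy minus a
    small per-feature penalty that favors compact subsets."""
    n = x_tr.shape[1]

    def fitness(mask):
        if not mask.any():
            return 0.0
        w, b = train_logreg(x_tr[:, mask], y_tr)
        return accuracy(w, b, x_va[:, mask], y_va) - penalty * mask.sum()

    def tournament(population, scores):
        i, j = rng.choice(len(population), size=2, replace=False)
        return population[i] if scores[i] >= scores[j] else population[j]

    population = rng.random((pop, n)) < 0.5
    for _ in range(gens):
        scores = np.array([fitness(m) for m in population])
        children = [population[np.argmax(scores)].copy()]  # elitism
        while len(children) < pop:
            p1, p2 = tournament(population, scores), tournament(population, scores)
            child = np.where(rng.random(n) < 0.5, p1, p2)  # uniform crossover
            child ^= rng.random(n) < p_mut                 # bit-flip mutation
            children.append(child)
        population = np.array(children)
    scores = np.array([fitness(m) for m in population])
    return population[np.argmax(scores)]


# 60/20/20 split into train / GA-validation / held-out test.
x, y = make_stand_in_embeddings()
idx = rng.permutation(len(y))
tr, va, te = idx[:300], idx[300:400], idx[400:]

# Stage 1: PCA fitted on the training split only, then standardized.
mean, comps = fit_pca(x[tr])
k = comps.shape[0]
proj = (x - mean) @ comps.T
proj /= proj[tr].std(axis=0)

# Stage 2: GA feature selection over the PCA components.
mask = ga_select(proj[tr], y[tr], proj[va], y[va])

# Final Logistic Regression on the selected components only.
w, b = train_logreg(proj[tr][:, mask], y[tr])
test_acc = accuracy(w, b, proj[te][:, mask], y[te])
print(f"PCA kept {k} of {x.shape[1]} dims; GA kept {mask.sum()}; test accuracy {test_acc:.2f}")
```

The GA scores candidate masks on a validation split, while the reported accuracy comes from a separate test split, so the search cannot inflate the number it reports. The per-feature penalty is one simple way to push the GA toward smaller, more interpretable subsets; the paper's actual fitness function may differ.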

Keywords

Fake News Detection, DistilBERT, Principal Component Analysis (PCA), Genetic Algorithm, Feature Selection, Supervised Learning
