Volume - 7 | Issue - 3 | September 2025
Published
12 September, 2025
The widespread dissemination of inaccurate information on digital platforms threatens social trust, public safety, and democratic institutions. This work presents a novel and efficient fake news detection model with three major components: context-aware text embeddings from DistilBERT, Principal Component Analysis (PCA) for dimensionality reduction, and feature selection using a Genetic Algorithm (GA). DistilBERT, a lightweight transformer model, generates 768-dimensional embeddings that capture the deep contextual and semantic meaning of the text. To address the computational cost and overfitting risk of high-dimensional data, PCA retains 95% of the data variance while using significantly fewer features. To maximize accuracy and model interpretability, a GA-based feature selection procedure then selects the most informative and discriminative features from the reduced feature space. This two-stage optimization (PCA followed by GA) is one of the paper's main contributions, distinguishing it from much of the prior work, which primarily uses full embeddings or simple filters. A Logistic Regression classifier performs the final classification, prioritizing precision even at some cost to interpretability. The model attains an accuracy of 98% when tested on a synthetically balanced fake news dataset. It also shows significant improvements in precision, recall, and F1-score compared with other models. By combining a high-quality language model, dimensionality reduction, and evolutionary optimization, the system can identify fake news across digital platforms quickly, scalably, and in real time.
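The two-stage reduction described in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation. Everything in it is an assumption made for illustration: the toy data stands in for PCA-reduced DistilBERT embeddings, a nearest-centroid classifier stands in for Logistic Regression as the GA fitness function, training accuracy stands in for a held-out or cross-validated score, and the names `components_for_variance`, `fitness`, `select_features`, and `make_toy_data`, along with all hyperparameters, are hypothetical.

```python
import random


def components_for_variance(eigenvalues, target=0.95):
    """Smallest number of leading components whose variance share reaches target."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for k, value in enumerate(ordered, start=1):
        running += value
        if running / total >= target:
            return k
    return len(ordered)


def fitness(mask, X, y):
    """Training accuracy of a nearest-centroid classifier on the masked features.

    Assumption: a stand-in for the Logistic Regression score used in the paper.
    """
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0  # an empty feature set cannot classify anything
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    correct = 0
    for x, t in zip(X, y):
        pred = min(
            centroids,
            key=lambda c: sum((x[j] - m) ** 2 for j, m in zip(cols, centroids[c])),
        )
        correct += pred == t
    return correct / len(X)


def select_features(X, y, pop_size=20, generations=15, mutation_rate=0.05, seed=0):
    """Evolve a boolean feature mask with a simple elitist genetic algorithm."""
    rng = random.Random(seed)
    n = len(X[0])
    population = [[rng.random() < 0.5 for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=lambda m: fitness(m, X, y), reverse=True)
        parents = ranked[: pop_size // 2]  # truncation selection keeps the best half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)  # single-point crossover
            child = a[:cut] + b[cut:]
            # bit-flip mutation
            child = [(not g) if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = parents + children
    return max(population, key=lambda m: fitness(m, X, y))


def make_toy_data(n_samples=60, n_features=8, seed=1):
    """Synthetic stand-in for PCA-reduced embeddings; only features 0 and 1 carry signal."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.gauss(0, 1) for _ in range(n_features)]
        row[0] += 3 * label
        row[1] -= 3 * label
        X.append(row)
        y.append(label)
    return X, y


# Stage 1 (PCA): how many components are needed to keep 95% of the variance?
k = components_for_variance([8.0, 1.0, 0.6, 0.4], target=0.95)

# Stage 2 (GA): search for an informative subset of the reduced features.
X, y = make_toy_data()
best_mask = select_features(X, y)
best_score = fitness(best_mask, X, y)
print("components kept:", k)
print("selected features:", [j for j, keep in enumerate(best_mask) if keep])
print("fitness:", round(best_score, 3))
```

In a fuller pipeline, the fitness function would more plausibly be a cross-validated classifier score, so that the GA does not simply reward masks that overfit the training set.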
Keywords: Fake News Detection, DistilBERT, Principal Component Analysis (PCA), Genetic Algorithm, Feature Selection, Supervised Learning