AI Based Novel Model for Prediction of Cyber Security Attacks Using KNN Algorithm & XDR

Keywords

Cyber Security
Computer System Logs
Log Analysis
Artificial Intelligence
XDR (Extended Detection and Response)
XDR Log Analysis

How to Cite

D., Jadhav S, and Bombade B R. 2025. “AI Based Novel Model for Prediction of Cyber Security Attacks Using KNN Algorithm & XDR”. Journal of Trends in Computer Science and Smart Technology 7 (3): 357-75. https://doi.org/10.36548/jtcsst.2025.3.004.

Abstract

Cyberspace is treated as a fourth dimension of modern-day warfare, alongside land, air and sea. Solutions are developed to provide cybersecurity to computer systems, but attackers keep devising new methods that overcome these security systems. Log Analysis solutions are part of this set of tools, and it is well established that Log Analysis helps to predict and prevent cybersecurity attacks. However, very few research attempts have been made to apply Artificial Intelligence to Log Analysis, especially Extended Detection and Response (XDR) Log Analysis. Therefore, in this paper we propose and implement a preventive and predictive system based on the K-Nearest Neighbors (KNN) algorithm, a non-parametric supervised learning algorithm. XDR is a modern solution that can collect and process data from the various sub-systems connected in a given network, which makes it an information goldmine from the cybersecurity audit perspective. We propose to apply the KNN algorithm to XDR logs. The proposed novel model includes the following steps: input the data; check for “missing values” and “duplicate entries”; identify the available “classes” and consolidate them into two or three Major Classes; perform “label encoding”; and create the “correlation value-based matrix”. Next, we identify the “Positive” and “Negative correlation values”, discard the remaining values, and select the features that have the highest correlation values. We then apply the scikit-learn StandardScaler class to scale the features so that the data is centred around a mean of 0 with a standard deviation of 1. Finally, we apply the KNN classifier, using Optuna to select K, the number of nearest neighbors. This generates the final output, which states whether a given log entry belongs to the “Suspicious Class” or the “Not Suspicious Class”.
The Suspicious Class XDR log entries are dealt with separately, as they might indicate a potential risk or an indicator of compromise (IOC). The proposed model has been tested on the standard Microsoft GUIDE dataset and on a locally generated in-lab dataset. The Microsoft GUIDE XDR data contains 13 million pieces of evidence across 33 entity types, 1.6 million alerts, and 1 million well-annotated incidents collected from 6,100 organizations. In both cases, our experiments achieved 93.85% accuracy in predicting cybersecurity attacks.
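The steps listed in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the data is synthetic, the column names (`failed_logins`, `session_minutes`, `noise`, `category`) and helper functions are hypothetical, and a plain cross-validated loop over candidate K values stands in for the Optuna search to keep dependencies small. Feature selection is done on the training split only, which is an assumption made here to avoid leakage.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import LabelEncoder, StandardScaler


def make_toy_logs(n=400, seed=0):
    """Synthetic stand-in for XDR log rows (hypothetical columns, not GUIDE)."""
    rng = np.random.default_rng(seed)
    label = rng.integers(0, 2, size=n)  # 1 = suspicious, 0 = not suspicious
    return pd.DataFrame({
        "failed_logins": rng.normal(2, 1, n) + 4 * label,      # positive correlation
        "session_minutes": rng.normal(30, 5, n) - 10 * label,  # negative correlation
        "noise": rng.normal(0, 1, n),                          # near-zero correlation
        "category": np.where(label == 1, "Suspicious", "NotSuspicious"),
    })


def select_by_correlation(df, target, top_n=2):
    """Return the top_n numeric features ranked by |correlation| with target."""
    corr = df.select_dtypes("number").corr()[target].drop(target)
    return corr.abs().sort_values(ascending=False).head(top_n).index.tolist()


def run_pipeline(df, k_candidates=range(1, 16, 2), seed=0):
    # Drop rows with missing values and duplicate entries.
    df = df.dropna().drop_duplicates().copy()
    # Label-encode the class column (NotSuspicious -> 0, Suspicious -> 1).
    df["target"] = LabelEncoder().fit_transform(df["category"])
    train, test = train_test_split(
        df, test_size=0.25, random_state=seed, stratify=df["target"]
    )
    # Keep the features most strongly correlated with the label.
    features = select_by_correlation(train, "target")

    def cv_score(k):
        # Scaling sits inside the pipeline so each fold is scaled on its own fit.
        model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
        return cross_val_score(model, train[features], train["target"], cv=5).mean()

    # Simple search over K; the paper uses Optuna for this step.
    best_k = max(k_candidates, key=cv_score)
    model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=best_k))
    model.fit(train[features], train["target"])
    return features, best_k, model.score(test[features], test["target"])


if __name__ == "__main__":
    features, best_k, accuracy = run_pipeline(make_toy_logs())
    print(features, best_k, round(accuracy, 3))
```

On real XDR logs the class consolidation step (merging many classes into two or three Major Classes) would come before label encoding; the toy data already has two classes, so that step is omitted here.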
