Volume - 7 | Issue - 3 | September 2025
Published
26 August, 2025
Cyberspace is treated as a fourth dimension of modern-day warfare, alongside land, air, and sea. Solutions are developed to provide cybersecurity to computer systems, but attackers keep trying new methodologies and overcoming these security systems. Log Analysis solutions are one such set of tools, and Log Analysis is known to help predict and prevent cybersecurity attacks. However, very few research attempts have been made to apply Artificial Intelligence to Log Analysis, especially Extended Detection and Response (XDR) Log Analysis. XDR is a modern solution that collects and processes data from the various sub-systems connected in a given network, which makes it an information goldmine from a cybersecurity audit perspective. In this paper, we therefore propose and implement a preventive and predictive system that applies the K-Nearest Neighbors (KNN) algorithm, a non-parametric supervised learning algorithm, to XDR logs. The proposed model comprises the following steps: input the data; check for “missing values” and “duplicate entries”; identify the available “classes” and reduce them to two or three major classes; perform “label encoding”; and create the correlation value-based matrix. Next, we identify the positive and negative correlation values, discard the remaining values, and select the features that have the highest correlation values. We then apply the scikit-learn StandardScaler class to scale the features so that the data is centred on a mean of 0 with a standard deviation of 1. Finally, we apply the KNN classifier, with Optuna used to identify the number of nearest neighbors, K. This generates the final output, which determines whether a given log entry belongs to the “Suspicious Class” or the “Not Suspicious Class”.
The Suspicious Class XDR log entries are dealt with separately, as they might indicate a potential risk or an indicator of compromise (IOC). The proposed approach has been tested on the standard Microsoft GUIDE dataset and on a locally generated in-lab dataset. The Microsoft GUIDE XDR data contains 13 million pieces of evidence across 33 entity types, 1.6 million alerts, and 1 million well-annotated incidents collected from 6,100 organizations. In both cases, our experiments achieved 93.85% accuracy in predicting cybersecurity attacks.
Keywords: Cyber Security, Computer System Logs, Log Analysis, Artificial Intelligence, XDR (Extended Detection and Response), XDR Log Analysis
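The pipeline outlined in the abstract (cleaning, label encoding, correlation-based feature selection, standard scaling, and KNN with a search over K) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a standard-library-only setting, so the scaling is done by hand in place of scikit-learn's StandardScaler, and a simple hold-out loop over candidate K values stands in for the Optuna search. The function names, the synthetic `make_logs` data, and the 80/20 split are all hypothetical choices made for illustration.

```python
import math
import random
from collections import Counter


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0


def clean(rows):
    """Drop rows with missing values, then exact duplicates (order kept)."""
    seen, out = set(), []
    for row in rows:
        if None in row or row in seen:
            continue
        seen.add(row)
        out.append(row)
    return out


def select_features(X, y, top_n):
    """Indices of the top_n features ranked by |correlation| with the label."""
    scores = [abs(pearson([r[j] for r in X], y)) for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:top_n]


def fit_scaler(X):
    """Per-feature mean and population standard deviation (0 replaced by 1)."""
    n = len(X)
    means = [sum(col) / n for col in zip(*X)]
    stds = [math.sqrt(sum((v - m) ** 2 for v in col) / n) or 1.0
            for col, m in zip(zip(*X), means)]
    return means, stds


def scale(X, means, stds):
    """Centre each feature on 0 with unit standard deviation."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]


def knn_predict(X_train, y_train, x, k):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    nearest = sorted(zip(X_train, y_train), key=lambda p: math.dist(p[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(X_train, y_train, X_val, y_val, k):
    hits = sum(knn_predict(X_train, y_train, x, k) == t for x, t in zip(X_val, y_val))
    return hits / len(y_val)


def make_logs(n, seed=0):
    """Synthetic stand-in for XDR rows: (f0, f1, noise, label). Not real data."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        suspicious = rng.random() < 0.5
        centre = 3.0 if suspicious else 0.0
        rows.append((rng.gauss(centre, 1.0), rng.gauss(centre, 1.0),
                     rng.gauss(0.0, 1.0),
                     "Suspicious" if suspicious else "Not Suspicious"))
    return rows


def run_pipeline(rows, top_n=2, k_values=(1, 3, 5, 7, 9)):
    rows = clean(rows)
    classes = sorted({r[-1] for r in rows})
    encode = {c: i for i, c in enumerate(classes)}  # label encoding
    X = [list(r[:-1]) for r in rows]
    y = [encode[r[-1]] for r in rows]
    split = int(0.8 * len(X))  # hold-out split, an assumption of this sketch
    keep = select_features(X[:split], y[:split], top_n)
    X = [[row[j] for j in keep] for row in X]
    means, stds = fit_scaler(X[:split])  # fit on the training part only
    X_train, X_val = scale(X[:split], means, stds), scale(X[split:], means, stds)
    y_train, y_val = y[:split], y[split:]
    # Plain search over candidate K values, standing in for an Optuna study.
    best_k = max(k_values, key=lambda k: accuracy(X_train, y_train, X_val, y_val, k))
    return {"classes": classes, "features": sorted(keep), "best_k": best_k,
            "accuracy": accuracy(X_train, y_train, X_val, y_val, best_k)}


if __name__ == "__main__":
    print(run_pipeline(make_logs(300)))
```

Fitting the feature selection and the scaler on the training portion only keeps the held-out rows from influencing preprocessing. In a setting closer to the one described, the search loop would presumably be replaced by an Optuna objective that suggests K and returns validation accuracy.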