Abstract
As cybersecurity threats escalate, the rapid and widespread proliferation of masked and phishing URLs poses a significant risk to online users. Detecting these malicious URLs is essential to safeguarding sensitive information and preventing unauthorized access. This study examines the application of machine learning algorithms to the accurate identification of masked and phishing URLs. Specifically, Decision Tree, Random Forest, and XGBoost algorithms are used to build predictive models that distinguish legitimate from malicious URLs. The research involves collecting a comprehensive dataset comprising both legitimate URLs and various forms of malicious URLs. Feature engineering techniques are applied to extract relevant information from the URLs, transforming them into numerical representations suitable for machine learning. The three algorithms are individually trained and fine-tuned on the dataset, exploiting their respective strengths in recognizing patterns indicative of phishing attempts and masked URLs. The performance of each model is evaluated using metrics such as accuracy, precision, recall, and web traffic. Comparing the results of the three algorithms yields a predictive model capable of distinguishing between legitimate and malicious URLs. Experimental results show promising accuracy and the potential to contribute to online security efforts. The implications of this research extend to advanced cybersecurity systems, offering enhanced protection against evolving threats in the digital domain.
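The feature engineering step described above can be illustrated with a minimal sketch. The abstract does not list the study's actual features, so the function name `extract_url_features` and every feature below are assumptions chosen for illustration. They are common lexical URL features and are not claimed to be the authors' feature set.

```python
import re
from urllib.parse import urlsplit


def extract_url_features(url: str) -> dict:
    """Map a URL string to a dict of numeric lexical features.

    Hypothetical feature set for illustration only; the study's own
    features are not specified in the abstract.
    """
    # urlsplit only recognizes the host when a scheme is present,
    # so a default scheme is assumed for bare URLs.
    parts = urlsplit(url if "://" in url else "http://" + url)
    host = parts.hostname or ""
    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots_in_host": host.count("."),
        "num_hyphens_in_host": host.count("-"),
        "num_digits": sum(ch.isdigit() for ch in url),
        # An "@" can hide the real host behind a familiar-looking prefix.
        "has_at_symbol": int("@" in url),
        # A raw IPv4 address in place of a domain name.
        "has_ip_host": int(bool(re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host))),
        "uses_https": int(parts.scheme == "https"),
        "path_depth": len([seg for seg in parts.path.split("/") if seg]),
    }


if __name__ == "__main__":
    features = extract_url_features("http://paypal.com@evil-site.example/login")
    print(features)
```

Each dictionary can be turned into a fixed-order numeric vector, for example with `list(features.values())`, which is the kind of input tree-based classifiers such as those named above typically expect.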
