Design of Associate Content Based Classifier for Malicious URL Prediction by Rule Generation Algorithm

Vivekanadam Balasubramaniam

doi:10.36548/jitdw.2021.1.005

Open Access

Vol. 3, No. 1 (2021)

Published: 18 May, 2021

Pages: 44-56

Vivekanadam Balasubramaniam

Faculty of Computer Science and Multimedia, Lincoln University College, Kota Bharu

How to Cite

Balasubramaniam, Vivekanadam. 2021. “Design of Associate Content Based Classifier for Malicious URL Prediction by Rule Generation Algorithm”. Journal of Information Technology and Digital World 3 (1): 44-56. https://doi.org/10.36548/jitdw.2021.1.005.

Keywords

Deep learning

malicious website

Abstract

The internet has become one of the most effective tools for interacting with societies abroad, especially during the COVID-19 pandemic. Digital platforms are expanding in many developing countries and, at the same time, the risk of fraud is increasing day by day. In the digital world, phishing is emerging as the most common type of social engineering attack. Many websites now aim to acquire confidential data that is stored on websites. Classification techniques have recently been employed to detect phishing websites, alongside existing anti-phishing tools such as blacklists and antivirus software. Confidential data entered in a fake environment falls into the category of leaked data as a result of the attackers' actions. In this scenario, machine learning is observed to be very effective at classifying web Uniform Resource Locators (URLs) as phishing or non-phishing. Such classification, however, struggles with the content-based challenge of leaked data. This research article therefore integrates content-based classification with a rule-based generator algorithm to improve the overall performance of the system. An updated dataset from the public online Mendeley repository is used in the proposed work. The proposed algorithm is applied to content data from 7k phishing and real websites for feature extraction, and the extracted features are then analyzed with the proposed algorithm to improve prediction accuracy. The work concludes that the associate algorithm achieves better accuracy than other existing methods.
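The abstract does not spell out how the rule generator works. As a rough illustration only, the sketch below assumes a generic associative-classification scheme: class association rules of the form "set of content features implies label" are mined under support and confidence thresholds, and a page is labeled by the highest-ranked rule it matches. The feature names, thresholds, toy data, and the functions `mine_rules` and `classify` are all hypothetical and are not taken from the article.

```python
from collections import Counter
from itertools import combinations


def mine_rules(rows, labels, min_support=0.3, min_confidence=0.9, max_len=2):
    """Mine class association rules (itemset -> label) from sets of content features.

    Hypothetical sketch: thresholds and rule ranking are assumptions,
    not the article's actual algorithm.
    """
    n = len(rows)
    itemset_counts = Counter()  # how often each feature combination occurs
    rule_counts = Counter()     # how often it occurs together with a label
    for features, label in zip(rows, labels):
        items = sorted(features)
        for k in range(1, max_len + 1):
            for itemset in combinations(items, k):
                itemset_counts[itemset] += 1
                rule_counts[(itemset, label)] += 1

    rules = []
    for (itemset, label), hits in rule_counts.items():
        support = hits / n
        confidence = hits / itemset_counts[itemset]
        if support >= min_support and confidence >= min_confidence:
            rules.append((itemset, label, support, confidence))
    # Rank by confidence, then support, then prefer shorter rules.
    rules.sort(key=lambda r: (-r[3], -r[2], len(r[0])))
    return rules


def classify(features, rules, default="legitimate"):
    """Return the label of the first (highest-ranked) rule the page satisfies."""
    for itemset, label, _, _ in rules:
        if set(itemset) <= features:
            return label
    return default  # assumed fallback when no rule fires


# Toy content features per page (entirely made up for illustration).
rows = [
    {"password_form", "external_form_action"},
    {"password_form", "external_form_action", "hidden_iframe"},
    {"external_form_action", "hidden_iframe"},
    {"password_form", "valid_https"},
    {"valid_https"},
    {"valid_https", "password_form"},
]
labels = ["phishing", "phishing", "phishing",
          "legitimate", "legitimate", "legitimate"]

rules = mine_rules(rows, labels)
print(classify({"password_form", "external_form_action"}, rules))
print(classify({"password_form", "valid_https"}, rules))
```

In this toy data a password form alone does not produce a rule, because it appears equally often on phishing and legitimate pages; only feature combinations that meet both thresholds are kept.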

