Abstract
The rapid development of internet technology has opened the door to a range of illegal activities targeting users. Because these malicious activities are usually carried out by anonymous individuals or organizations, identifying and tracking the perpetrators is difficult. To address this problem, a novel method for classifying illicit dark web content is proposed: the Lyrebird Green Anaconda Optimization-based Bayesian Hierarchical Neural Attention Harmonic Network (LGAO_BHNAHN). First, textual content extraction is performed to find and extract pertinent information. The extracted content is then passed to GPT-NeoX, which processes it and generates text passages from it. Next, the Bayesian Hierarchical Neural Attention Harmonic Network (BHNAHN) classifies the illicit dark web content. BHNAHN is formed by combining a Bayesian Neural Network (BNN) with a Hierarchical Neural Attention classifier that uses Forward Harmonic analysis. BHNAHN is trained with Lyrebird Green Anaconda Optimization (LGAO), which combines the Lyrebird Optimization Algorithm (LOA) and Green Anaconda Optimization (GAO). Finally, the LGAO-trained BHNAHN is used to classify drug and arms types. By combining GPT-NeoX with the LGAO-optimized BHNAHN, the proposed framework targets secure, real-time threat detection. In classifying illicit dark web content, it achieves an accuracy of 93.30%, a True Positive Rate (TPR) of 92.85%, and a False Positive Rate (FPR) of 5.62%.
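The reported figures follow the standard confusion-matrix definitions of accuracy, TPR, and FPR. The Python sketch below shows how they are computed, assuming a binary labelling in which "positive" means illicit. The class `ConfusionCounts`, the helper `accuracy_from_rates`, and the counts in the example are hypothetical illustrations and are not taken from the paper's experiments.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts (assumed labelling: positive = illicit)."""
    tp: int  # illicit pages correctly flagged as illicit
    fp: int  # licit pages wrongly flagged as illicit
    tn: int  # licit pages correctly left unflagged
    fn: int  # illicit pages the classifier missed


def accuracy(c: ConfusionCounts) -> float:
    """Fraction of all pages that were classified correctly."""
    return (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)


def true_positive_rate(c: ConfusionCounts) -> float:
    """TPR (recall, sensitivity): share of illicit pages that were caught."""
    return c.tp / (c.tp + c.fn)


def false_positive_rate(c: ConfusionCounts) -> float:
    """FPR: share of licit pages that were wrongly flagged as illicit."""
    return c.fp / (c.fp + c.tn)


def accuracy_from_rates(tpr: float, fpr: float, positive_share: float) -> float:
    """Accuracy as the class-weighted mix of TPR and specificity (1 - FPR)."""
    return positive_share * tpr + (1.0 - positive_share) * (1.0 - fpr)


if __name__ == "__main__":
    # Hypothetical counts for illustration only; NOT the paper's results.
    counts = ConfusionCounts(tp=90, fp=5, tn=95, fn=10)
    tpr = true_positive_rate(counts)
    fpr = false_positive_rate(counts)
    total = counts.tp + counts.fp + counts.tn + counts.fn
    share = (counts.tp + counts.fn) / total  # share of illicit pages in the test set
    print(f"accuracy = {accuracy(counts):.2%}")
    print(f"TPR      = {tpr:.2%}")
    print(f"FPR      = {fpr:.2%}")
    print(f"accuracy from rates = {accuracy_from_rates(tpr, fpr, share):.2%}")
```

For the made-up counts, the script prints an accuracy of 92.50%, a TPR of 90.00%, and an FPR of 5.00%. The last line illustrates why accuracy cannot be derived from TPR and FPR alone: it equals p·TPR + (1 − p)·(1 − FPR), where p is the share of illicit pages in the test set, so the same TPR and FPR yield different accuracies under different class balances.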
