Machine Learning Approach for Adaptive Data Protection in Document Database Systems

How to Cite

Belhaj, Abdelilah, Soumia Ziti, Khalil Ladrham, Souad Najoua Lagmiri, and Karim El Bouchti. 2026. “Machine Learning Approach for Adaptive Data Protection in Document Database Systems”. Journal of Trends in Computer Science and Smart Technology 8 (2): 219-42. https://doi.org/10.36548/jtcsst.2026.2.002.

Keywords

LightGBM
NoSQL Security
Machine Learning
Adaptive Encryption
Noise
Real-time

Abstract

As cloud applications and enterprise organizations increasingly depend on NoSQL databases, data must be protected without compromising efficiency or performance. Traditional static encryption systems satisfy regulatory requirements but cannot react to changing contexts in dynamic, zero-trust environments, which limits their ability to address abnormal behaviour, malicious insiders, and advanced attackers. This work proposes AdaptiCrypt-ML, a lightweight machine-learning-based proxy that implements domain-level adaptive encryption in NoSQL database security systems. The framework uses a LightGBM model to classify 14 contextual, behavioural, and data-sensitivity features and select, in real time, the most appropriate encryption level from four security categories. When data is written, the encryption level is determined dynamically according to the risk level; when data is retrieved, a risk-based decryption policy controls how much of it is revealed. Empirical results on a statistically validated synthetic dataset of 50,000 examples show strong predictive performance, with an overall accuracy of 99.1%, an F1-macro score of 0.963, and a low generalization gap of 0.0018. The average inference time ranged between 0.5 and 0.8 milliseconds, and the total response time stabilized at 3.25 milliseconds (P95 = 4.10 milliseconds), with an average throughput of 3,120 queries per second. A robustness test with 5% noise confirmed that performance stabilized at 96%. These findings indicate that context-aware adaptive encryption can be integrated into NoSQL frameworks without sacrificing real-time requirements.
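
The write-time and read-time decisions described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the level names, the `stand_in_classifier` rule (which takes the place of the trained LightGBM model), the feature names, and the masking policy in `reveal_on_read` are all hypothetical and chosen only to make the control flow concrete.

```python
from enum import IntEnum
from typing import Callable, Mapping, Optional


class Level(IntEnum):
    """Four security categories; the names are placeholders, not from the paper."""
    LEVEL_1 = 0
    LEVEL_2 = 1
    LEVEL_3 = 2
    LEVEL_4 = 3


Features = Mapping[str, float]
Classifier = Callable[[Features], Level]


def stand_in_classifier(features: Features) -> Level:
    """Hypothetical stand-in for the trained LightGBM model.

    Assumes every feature is already scaled to [0, 1] and simply
    buckets the mean value into four equal-width bands.
    """
    score = sum(features.values()) / len(features)
    score = max(0.0, min(1.0, score))
    return Level(min(3, int(score * 4)))


def choose_write_level(features: Features,
                       classify: Classifier = stand_in_classifier) -> Level:
    """Write path: pick the encryption level from the request context."""
    return classify(features)


def reveal_on_read(plaintext: str, request_features: Features,
                   classify: Classifier = stand_in_classifier) -> Optional[str]:
    """Read path: an assumed risk-based reveal policy.

    LEVEL_1 risk -> full value, LEVEL_2 -> last four characters only,
    LEVEL_3 -> fully masked, LEVEL_4 -> nothing is returned.
    """
    risk = classify(request_features)
    if risk is Level.LEVEL_1:
        return plaintext
    if risk is Level.LEVEL_2:
        visible = plaintext[-4:]
        return "*" * (len(plaintext) - len(visible)) + visible
    if risk is Level.LEVEL_3:
        return "*" * len(plaintext)
    return None


# Illustrative feature vectors (two features instead of the paper's 14).
low_risk = {"off_hours": 0.1, "field_sensitivity": 0.2}
mid_risk = {"off_hours": 0.3, "field_sensitivity": 0.4}
high_risk = {"off_hours": 0.9, "field_sensitivity": 1.0}
```

In a deployment following the abstract, `classify` would wrap the trained model's prediction over all 14 features, and the level returned by `choose_write_level` would select the actual cipher configuration applied before the document reaches the database.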
