Abstract
Real-time data is growing rapidly due to the widespread use of web applications, IoT devices, social media, and network-based applications. This online streaming data, characterized by its high volume and velocity, is a form of big data. Although such data is valuable for business analytics and research, using it can compromise individual privacy. Traditional approaches such as k-anonymity, l-diversity, and t-closeness have been developed to safeguard individual privacy. k-anonymity, for example, generalizes or suppresses quasi-identifying attributes so that each record is indistinguishable from at least k-1 other records. The dynamic nature of real-time stream data makes these methods difficult to apply directly. However, research shows that adapted versions of these methods can effectively protect individual privacy in streaming data. This paper presents a comprehensive review of k-anonymity-based techniques that adopt sliding window models, clustering approaches, and other variations to protect data privacy efficiently while maintaining k-anonymity and preserving as much data utility as possible. The review discusses the challenges of protecting stream data privacy and concludes with research directions for enhancing these methods toward adaptive and scalable privacy-preserving mechanisms for streaming data.
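To make the k-anonymity condition and the sliding window idea concrete, the following is a minimal Python sketch, not a method from any of the reviewed papers. It assumes a count-based sliding window and hypothetical quasi-identifiers (`age` and `zip`) with hypothetical generalization rules (10-year age bands, 3-digit ZIP prefixes). It only checks whether the current window is k-anonymous. Actual stream anonymization algorithms additionally buffer, cluster, and generalize tuples before releasing them under delay constraints.

```python
from collections import Counter, deque
from typing import Iterable

# Hypothetical generalization: age -> 10-year band, ZIP code -> 3-digit prefix.
# These hierarchies are illustrative assumptions only.
def generalize(record: dict) -> tuple:
    low = (record["age"] // 10) * 10
    return (f"{low}-{low + 9}", record["zip"][:3] + "**")

def is_k_anonymous(qi_tuples: Iterable[tuple], k: int) -> bool:
    # Every quasi-identifier combination must occur at least k times,
    # so each record is indistinguishable from at least k-1 others.
    counts = Counter(qi_tuples)
    return all(c >= k for c in counts.values())

def stream_check(stream: Iterable[dict], window_size: int, k: int) -> list[bool]:
    # Count-based sliding window: the deque keeps only the most recent
    # window_size generalized tuples; older tuples expire automatically.
    window: deque = deque(maxlen=window_size)
    results = []
    for record in stream:
        window.append(generalize(record))
        results.append(is_k_anonymous(window, k))
    return results

if __name__ == "__main__":
    stream = [
        {"age": 34, "zip": "47677"},
        {"age": 36, "zip": "47602"},
        {"age": 52, "zip": "47905"},
        {"age": 31, "zip": "47678"},
        {"age": 58, "zip": "47909"},
    ]
    print(stream_check(stream, window_size=4, k=2))
    # prints [False, True, False, False, True]
```

The last check passes only because the first tuple has expired from the window, which illustrates why the window size and tuple expiry matter when k-anonymity is enforced on a stream rather than on a static table.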
