Role of Synthetic Data for Improved AI Accuracy

Keywords

AI
Synthetic Data
Privacy
Security
Bias
Fairness

How to Cite

Chaitanya, Ketha Dhana Veera, and Manas Kumar Yogi. 2023. “Role of Synthetic Data for Improved AI Accuracy”. Journal of Artificial Intelligence and Capsule Networks 5 (3): 330-45. https://doi.org/10.36548/jaicn.2023.3.008.

Abstract

Artificial Intelligence (AI) has become a transformative technology across many industries, enabling applications such as image recognition, natural language processing, and autonomous systems. The performance of an AI model depends critically on the quality and quantity of the data used to train it. However, acquiring and labeling large training datasets can be resource-intensive and time-consuming, and can raise privacy concerns. Synthetic data, meaning artificially generated data that mimics the distribution and characteristics of real-world data, is a promising way to address these challenges. This study explores the role of synthetic data in improving AI accuracy. By leveraging techniques from computer graphics, data augmentation, and generative modeling, researchers and practitioners can create diverse and representative synthetic datasets that supplement or replace traditional training data.
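
As a rough illustration of what "mimicking the distribution of real-world data" can mean in the simplest case, the sketch below fits a multivariate Gaussian to a small numeric dataset and samples new rows from it. This is not the method used in the study; the toy "real" data, the function names `fit_gaussian` and `sample_synthetic`, and the choice of a Gaussian model are all assumptions made for illustration. Practical generators (for example GAN-based or copula-based ones) are considerably more elaborate.

```python
import numpy as np


def fit_gaussian(real):
    """Estimate the mean vector and covariance matrix of the real data."""
    mean = real.mean(axis=0)
    cov = np.cov(real, rowvar=False)  # columns are features
    return mean, cov


def sample_synthetic(mean, cov, n_samples, seed=0):
    """Draw synthetic rows from the fitted Gaussian (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    return rng.multivariate_normal(mean, cov, size=n_samples)


# Toy stand-in for a real dataset: two correlated numeric features.
rng = np.random.default_rng(42)
real = rng.multivariate_normal(
    mean=[0.0, 5.0],
    cov=[[1.0, 0.8], [0.8, 2.0]],
    size=2000,
)

mean, cov = fit_gaussian(real)
synthetic = sample_synthetic(mean, cov, n_samples=2000)

# Compare summary statistics of the real and synthetic samples.
print("real mean:     ", real.mean(axis=0))
print("synthetic mean:", synthetic.mean(axis=0))
```

The synthetic rows share the first- and second-order statistics of the original sample without copying any individual record, which is the basic property that supplementing or replacing training data relies on. A Gaussian fit captures nothing beyond means and covariances, so it is only a starting point.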
