Abstract
The research explores the integration of generative AI into multimedia content production, using a fine-tuned Llama 2 model for text generation and the Stable Diffusion algorithm for image synthesis. It analyses how well the fine-tuned Llama-2-7b-chat model adapts to specific content-generation contexts. The adaptation is supported by a custom dataset and by QLoRA (Quantized Low-Rank Adaptation), a parameter-efficient fine-tuning method. Fine-tuning substantially reduced the training loss and yielded more nuanced generated content. On evaluation, the model reached a perplexity of 1.49, indicating strong predictive performance. The research also demonstrates Stable Diffusion's ability to transform textual descriptions into intricate images, highlighting its potential for AI-mediated content creation. The experiments and qualitative analyses reveal improvements in efficiency and creativity, and they emphasize the potential of combining these models to transform multidisciplinary content generation. The research underscores the impact of fine-tuned generative models on content creation and offers insights into the broader implications for future AI research, while acknowledging the critical need for ethical considerations when deploying such technologies.
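To put the reported perplexity in context, here is a minimal sketch of how perplexity relates to per-token loss. It assumes the usual convention that perplexity is the exponential of the mean per-token cross-entropy in nats, which is exp(eval loss) for a causal language model. The abstract does not state how its figure was computed. Under that assumption, a perplexity of 1.49 corresponds to a mean loss of ln 1.49 ≈ 0.40 nats per token. The function names and the example losses below are hypothetical and for illustration only. They are not taken from the study.

```python
import math


def perplexity_from_token_nlls(token_nlls):
    """Perplexity = exp(mean per-token negative log-likelihood, in nats).

    Assumption: this is the standard definition. The abstract does not say
    how its perplexity of 1.49 was computed.
    """
    if not token_nlls:
        raise ValueError("need at least one token loss")
    mean_nll = sum(token_nlls) / len(token_nlls)
    return math.exp(mean_nll)


def mean_nll_from_perplexity(ppl):
    """Inverse mapping: the mean per-token cross-entropy (nats) implied by a perplexity."""
    if ppl < 1.0:
        raise ValueError("perplexity is always >= 1")
    return math.log(ppl)


if __name__ == "__main__":
    # Hypothetical per-token losses, for illustration only (not from the study).
    example_losses = [0.2, 0.5, 0.35, 0.45]
    print("perplexity of example losses:", perplexity_from_token_nlls(example_losses))
    # Mean loss implied by the reported perplexity of 1.49.
    print("mean loss implied by ppl 1.49:", mean_nll_from_perplexity(1.49))
```

A perplexity of k means the model is, on average, as uncertain as a uniform choice among k tokens. A value close to 1 therefore indicates that the model assigns high probability to the evaluated text. It does not by itself measure output quality on unseen prompts.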
