Abstract
Unit testing plays a crucial role in application software development by validating module functionality in isolation before system integration. Manually writing and reviewing unit test cases is time-consuming and error-prone: complex logic and boundary conditions often go under-tested, driving up rework costs. Automated test generation using Large Language Models (LLMs) reduces development effort but faces challenges such as ensuring meaningful test coverage, handling invalid inputs, and resolving missing imports. This study leverages LLMs in combination with the AutoGen agentic AI framework to generate high-quality Python unit tests: prompting the model effectively, validating generated tests through execution, repairing failing test cases, analyzing the results, and iteratively improving code coverage and mutation score. In experiments on an Insurance Management Application, branch coverage improved from 98% to 99%, and the mutation score improved from 83.9% to 95.8%. The proposed approach substantially reduces manual effort while improving test-suite effectiveness and software quality.
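The generate–validate–repair loop described above can be illustrated with a minimal, self-contained sketch. This is not the paper's actual AutoGen implementation: the `generate_tests` function is a hypothetical stand-in for the LLM agent call (here it returns canned test code, with the first attempt deliberately failing so the repair iteration is exercised), and tests are executed in-process rather than through a real test runner.

```python
def run_tests(module_src, test_src):
    """Execute the module and its tests in a shared namespace.

    Returns (passed, feedback): feedback carries the failure message
    that would be fed back to the LLM on the next repair iteration.
    """
    ns = {}
    try:
        exec(module_src, ns)
        exec(test_src, ns)
        for name, fn in list(ns.items()):
            if name.startswith("test_") and callable(fn):
                fn()  # a failing assertion raises AssertionError
        return True, None
    except Exception as exc:
        return False, repr(exc)

def generate_tests(module_src, feedback=None):
    """Hypothetical stand-in for the LLM/agent call.

    A real system would prompt the model with the source code and,
    on repair iterations, include the failure feedback.
    """
    if feedback is None:
        # First attempt: a deliberately buggy test (wrong expected value).
        return "def test_add():\n    assert add(2, 3) == 6\n"
    # Repair attempt: corrected expectation.
    return "def test_add():\n    assert add(2, 3) == 5\n"

MODULE_SRC = "def add(a, b):\n    return a + b\n"

def generate_and_repair(module_src, max_iters=3):
    """Generate tests, run them, and retry with feedback until they pass."""
    feedback = None
    for _ in range(max_iters):
        tests = generate_tests(module_src, feedback)
        passed, feedback = run_tests(module_src, tests)
        if passed:
            return tests
    return None  # give up after max_iters repair attempts
```

In the full pipeline, coverage and mutation tooling would be run on the passing suite, and their reports would drive further prompting rounds; this sketch shows only the execution-feedback core of that loop.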
