A Deep Learning Framework for Kannada-English Text Recognition and Language Identification in Natural Scene Images

How to Cite

Hangarage, Venkata B, and Gururaj Mukarambi. 2025. “A Deep Learning Framework for Kannada-English Text Recognition and Language Identification in Natural Scene Images”. Journal of Innovative Image Processing 7 (3): 976-90. https://doi.org/10.36548/jiip.2025.3.021.

Keywords

  • YOLOv5
  • SPPF
  • Deep Learning
  • Computer Vision
  • Image Processing

Abstract

Natural scene text detection and language identification is a challenging problem in computer vision, with applications in autonomous video surveillance and the design of OCR systems for natural scene images. Autonomous video surveillance and monolingual OCR systems do not work efficiently on natural scene images, where text appears in different orientations, backgrounds, and lighting conditions and in multilingual scripts. Hence, we propose a deep learning model, a fine-tuned YOLOv5, for text detection and language identification in bilingual scene images. No standard ground-truth database for testing the proposed (fine-tuned) model exists in the literature. Therefore, we created our own real-time natural scene dataset from the Kalaburagi and Bidar districts in the state of Karnataka. The proposed (fine-tuned) model is obtained by training YOLOv5 on this real-time dataset; it works with a genetic approach and produces the anchor boxes for the objects present in the natural scene image. To test the performance of the fine-tuned YOLOv5 model, we employed evaluation metrics such as precision, recall, and accuracy. The experiments demonstrate the robustness of the fine-tuned YOLOv5 model for text detection and language identification. We obtained an optimized precision of 86.8%, a recall of 83.4%, an F1 score of 85%, and an accuracy of 94.4%. The experiment used 80% of the data for training and 20% for testing. A comparative analysis with existing methods in the literature shows that the fine-tuned YOLOv5 model performs better. The novelty of the paper lies in the fine-tuned YOLOv5 model and a dataset built from a mixture of low-resolution and complex-background images.
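
As a rough illustration of two quantities mentioned in the abstract, the sketch below recomputes the F1 score as the harmonic mean of the reported precision and recall, and shows one simple way an 80/20 train/test split could be made. The function names, the random seed, and the placeholder file names are assumptions introduced for this example; the abstract does not describe how the authors actually partitioned their data.

```python
import random


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given in the same units)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def split_80_20(items, seed=0):
    """Shuffle a copy of `items` and split it 80% train / 20% test.

    The seed and the shuffle-then-cut strategy are assumptions for this sketch.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 8 // 10  # integer arithmetic avoids float rounding
    return shuffled[:cut], shuffled[cut:]


# Precision and recall as reported in the abstract, in percent.
reported_f1 = f1_score(86.8, 83.4)
print(f"F1 from reported precision/recall: {reported_f1:.1f}%")

# Hypothetical image list standing in for the real dataset.
images = [f"img_{i:03d}.jpg" for i in range(100)]
train, test = split_80_20(images)
print(len(train), len(test))
```

The recomputed value rounds to the 85% F1 score stated in the abstract, so the three reported figures are mutually consistent.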
