Image Captioning Generator and Comparison Study

How to Cite

Thirumahal, R., Harshitha Prabakaran, G. N. Swetha, S. Sushmitha, S. Swathi, and Chandhini Balasubramaniam. 2023. “Image Captioning Generator and Comparison Study”. Journal of Innovative Image Processing 4 (4): 328-37. https://doi.org/10.36548/jiip.2022.4.009.

Keywords

  • Image retrieval
  • Caption generator
  • Artificial intelligence
  • Generative adversarial network
  • Image analysis

Abstract

Caption generation has long been of interest to researchers in the field of artificial intelligence. The ability to train a system to accurately describe an image or environment has broad applications in robotic vision, management, and many other areas. The purpose of this study is to analyze multiple transfer learning strategies and create a unique system for improving caption accuracy. To increase object relevance, image feature vectors are constructed using multiple state-of-the-art models and fed into an attention-based encoder/decoder transformer network. The model is evaluated on benchmark datasets such as MS-COCO using metrics such as the Bilingual Evaluation Understudy (BLEU) score.
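The abstract names the Bilingual Evaluation Understudy (BLEU) metric as the evaluation criterion. As a minimal illustration only (not the authors' implementation, and independent of any captioning library), the sketch below computes sentence-level BLEU from scratch: clipped n-gram precisions against multiple reference captions, combined by a geometric mean with a brevity penalty. The example tokens and captions are hypothetical.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, references, max_n=4):
    """Sentence-level BLEU with uniform weights over 1..max_n-grams."""
    precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(candidate, n))
        # Clip each candidate n-gram count by its maximum count in any reference
        max_ref = Counter()
        for ref in references:
            for g, c in Counter(ngrams(ref, n)).items():
                max_ref[g] = max(max_ref[g], c)
        clipped = sum(min(c, max_ref[g]) for g, c in cand_counts.items())
        precisions.append(clipped / max(sum(cand_counts.values()), 1))
    if min(precisions) == 0:
        return 0.0  # any empty precision zeroes the geometric mean
    # Brevity penalty: penalize candidates shorter than the closest reference
    ref_len = min((len(r) for r in references),
                  key=lambda l: (abs(l - len(candidate)), l))
    bp = 1.0 if len(candidate) > ref_len \
        else math.exp(1 - ref_len / max(len(candidate), 1))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

# Hypothetical generated caption and two human reference captions
cand = "a dog is running on the grass".split()
refs = ["a dog runs across the grass".split(),
        "the dog is running on grass".split()]
print(round(bleu(cand, refs), 4))
```

In practice, corpus-level BLEU (aggregating clipped counts over all captions before taking the geometric mean) is the standard reported figure for MS-COCO; established implementations such as NLTK's `sentence_bleu` behave equivalently to the scheme above up to smoothing choices.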
