Volume - 6 | Issue - 1 | March 2024

Nepali Image Captioning: Generating Coherent Paragraph-Length Descriptions Using Transformer
Open Access
Nabaraj Subedi, Nirajan Paudel, Manish Chhetri, Sudarshan Acharya, Nabin Lamichhane
Pages: 70-84
Cite this article
Subedi, Nabaraj, Nirajan Paudel, Manish Chhetri, Sudarshan Acharya, and Nabin Lamichhane. "Nepali Image Captioning: Generating Coherent Paragraph-Length Descriptions Using Transformer." Journal of Soft Computing Paradigm 6, no. 1 (2024): 70-84
Published
30 April, 2024
Abstract

Deep neural networks have made image captioning, the task of generating text that describes the different parts of an image, far more feasible. Most work on this task has been done for English, while very little effort has gone into other languages, Nepali in particular. Research on Nepali captioning is harder still because of the language's difficult grammatical structure and vast language domain. Moreover, the little existing work in Nepali generates only a single sentence per image, whereas the proposed work emphasizes generating coherent, paragraph-length descriptions. The proposed work uses the Stanford human genome dataset, translated into Nepali with the Google Translate API, together with a manually curated dataset of 800 images of the cultural sites of Nepal and their Nepali captions. The two datasets were combined to train the deep learning model, which is based on the transformer architecture. Image features were extracted using a pretrained Inception V3 model and, after positional encoding, fed into the encoder. In parallel, the embedded caption tokens were fed into the decoder. The generated captions were evaluated using BLEU scores, which showed higher accuracy and BLEU scores on the test images.
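The abstract names BLEU as the evaluation metric but does not say which variant, tokenization, or smoothing was used. As a rough illustration only, the sketch below computes an unsmoothed sentence-level BLEU with uniform weights over 1- to 4-grams and the standard brevity penalty. The function names (`ngrams`, `sentence_bleu`), the whitespace tokenization, and the example caption are assumptions made for this sketch, not details taken from the paper.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, references, max_n=4):
    """Unsmoothed sentence-level BLEU with uniform n-gram weights.

    candidate:  list of tokens produced by the captioning model
    references: list of reference token lists
    """
    if not candidate:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        total = sum(cand_counts.values())
        if total == 0:  # candidate is shorter than n tokens
            return 0.0
        # Clip each candidate n-gram count by its maximum count in any reference.
        max_ref_counts = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref_counts[gram] = max(max_ref_counts[gram], count)
        clipped = sum(min(count, max_ref_counts[gram])
                      for gram, count in cand_counts.items())
        if clipped == 0:  # no smoothing: one zero precision gives BLEU = 0
            return 0.0
        log_precision_sum += math.log(clipped / total) / max_n
    # Brevity penalty uses the reference length closest to the candidate length.
    c = len(candidate)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_precision_sum)


if __name__ == "__main__":
    # Hypothetical Nepali caption, split on whitespace (an assumption of this sketch).
    reference = "मन्दिर अगाडि धेरै मानिसहरू उभिएका छन्".split()
    generated = "मन्दिर अगाडि धेरै मानिसहरू बसेका छन्".split()
    print(sentence_bleu(generated, [reference]))
```

For paragraph-length captions, scores are usually aggregated at corpus level and often smoothed, so values from a sentence-level sketch like this one are not directly comparable to figures reported in a paper.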

Keywords

BLEU, Inception V3, Nepali Captions, Transformer
