Volume - 7 | Issue - 4 | December 2025
Published: 21 November 2025
The aim of remote sensing image captioning (RSIC) is to produce informative, detailed textual descriptions of satellite and aerial images. Traditional methods struggle to do so because variations in scale, viewpoint, and scene complexity limit their contextual awareness. In this paper, we propose the Multiscale Region-Aware Captioning Network (MSR-CapNet), which generates relevant and semantically accurate descriptions of remote sensing scenes. MSR-CapNet integrates Feature Pyramid Encoding, which represents both local and global visual characteristics; Adaptive Attention, which dynamically prioritizes relevant image regions; and Topic-Sensitive Embeddings, which keep the generated captions semantically consistent. We train and evaluate the method on the RSICD and UCM caption datasets and compare it against recent transformer- and graph-based baselines using the BLEU-4, METEOR, and CIDEr metrics, on which it shows consistent improvements.
Keywords: Remote Sensing Image Captioning (RSIC), Attention Mechanism, Topic-Sensitive Word Embeddings, Satellite Images
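
To make the interplay of the three components concrete, the sketch below shows one way they could be wired together in PyTorch. It is a minimal illustration, not the authors' implementation: every module name, layer shape, and hyperparameter (e.g. a 512-dimensional feature space and 30 topics) is an assumption, and the sentinel-gated attention follows the general "adaptive attention" pattern from the captioning literature rather than the paper's exact formulation.

```python
# Illustrative sketch only (not the authors' code): one plausible wiring of
# feature-pyramid encoding, adaptive attention, and topic-sensitive
# embeddings. All names, dimensions, and hyperparameters are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeaturePyramidEncoder(nn.Module):
    """Projects multiscale CNN feature maps into a shared region-feature space."""
    def __init__(self, in_channels=(512, 1024, 2048), dim=512):
        super().__init__()
        self.proj = nn.ModuleList([nn.Conv2d(c, dim, 1) for c in in_channels])

    def forward(self, feature_maps):
        # feature_maps: list of (B, C_i, H_i, W_i) tensors from a CNN backbone
        regions = [p(f).flatten(2).transpose(1, 2)  # (B, H_i*W_i, dim)
                   for p, f in zip(self.proj, feature_maps)]
        return torch.cat(regions, dim=1)            # (B, N_regions, dim)

class AdaptiveAttention(nn.Module):
    """Additive attention with a learned 'sentinel' key, letting the decoder
    down-weight visual input when the language context is sufficient."""
    def __init__(self, dim=512):
        super().__init__()
        self.w_v = nn.Linear(dim, dim)   # projects region features
        self.w_h = nn.Linear(dim, dim)   # projects the decoder hidden state
        self.w_s = nn.Linear(dim, dim)   # projects the sentinel vector
        self.score = nn.Linear(dim, 1)

    def forward(self, regions, hidden, sentinel):
        # regions: (B, N, dim); hidden, sentinel: (B, dim)
        keys = torch.cat([self.w_v(regions),
                          self.w_s(sentinel).unsqueeze(1)], dim=1)   # (B, N+1, dim)
        proj = torch.tanh(keys + self.w_h(hidden).unsqueeze(1))
        alpha = F.softmax(self.score(proj).squeeze(-1), dim=1)       # (B, N+1)
        values = torch.cat([regions, sentinel.unsqueeze(1)], dim=1)
        return (alpha.unsqueeze(-1) * values).sum(1)                 # (B, dim)

class TopicSensitiveDecoder(nn.Module):
    """LSTM decoder whose word embeddings are fused with a scene-topic vector."""
    def __init__(self, vocab_size, n_topics=30, dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.topic = nn.Linear(n_topics, dim)       # topic distribution -> dim
        self.lstm = nn.LSTMCell(2 * dim, dim)
        self.sentinel_gate = nn.Linear(2 * dim, dim)
        self.attend = AdaptiveAttention(dim)
        self.out = nn.Linear(2 * dim, vocab_size)

    def step(self, word_ids, topic_dist, regions, state):
        # word_ids: (B,) previous token; topic_dist: (B, n_topics)
        h, c = state
        x = torch.cat([self.embed(word_ids) + self.topic(topic_dist),
                       regions.mean(1)], dim=-1)                     # (B, 2*dim)
        sentinel = torch.sigmoid(self.sentinel_gate(x)) * torch.tanh(c)
        h, c = self.lstm(x, (h, c))
        ctx = self.attend(regions, h, sentinel)
        logits = self.out(torch.cat([h, ctx], dim=-1))               # (B, vocab)
        return logits, (h, c)
```

The design choice the sketch captures is that the sentinel competes with the image regions inside the same softmax, so at each decoding step the model can lean on linguistic context for function words and on pyramid-level visual features for content words.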