Volume - 7 | Issue - 3 | September 2025
Published
29 August, 2025
Surveillance systems are essential for public safety, and video surveillance is the most widely used means of maintaining safety in public and private areas. Detecting and recognizing abnormal activity in video is difficult because of complex environments, variable video quality, and varying noise levels. To address these accuracy and video-processing challenges, the proposed study uses a cross-attention network with feature fusion to improve the recognition of abnormal activity in complex scenarios. Cross-attention helps to capture contextual information from different videos. The proposed model combines cross-attention and feed-forward attention with fusion based on a latent space representation, aiming to improve accuracy. The simulation uses two benchmark datasets, UCF and UCSD, and achieves 97.1% and 91.31% accuracy, respectively. The study also presents a comparative analysis against different convolution and attention networks for anomaly detection. Overall, it proposes an effective video processing scheme with wide practical potential and provides a new perspective and methodological basis for future research and applications in related fields.
Keywords: Anomaly Detection, Computer Vision, Video Surveillance, Multimodal Learning, Attention Network, Feature Fusion
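
The abstract names cross-attention followed by feature fusion but gives no implementation detail. As a rough illustration only, the sketch below shows generic scaled dot-product cross-attention between two feature streams, followed by a simple fusion step. Everything specific here is an assumption rather than the authors' method: the function names (`cross_attention`, `fuse`), the absence of learned projection matrices, and the use of mean pooling plus concatenation in place of the latent-space fusion described in the study.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product cross-attention.

    Queries come from one feature stream; keys and values come from another.
    Each output row is a weighted average of the value vectors.
    """
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        dim = len(values[0])
        out.append([sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)])
    return out


def mean_pool(rows):
    """Average a list of vectors into a single vector."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]


def fuse(stream_a, stream_b):
    """Hypothetical fusion: each stream attends to the other, then the
    pooled results are concatenated (an assumption, not the paper's scheme)."""
    a_ctx = cross_attention(stream_a, stream_b, stream_b)
    b_ctx = cross_attention(stream_b, stream_a, stream_a)
    return mean_pool(a_ctx) + mean_pool(b_ctx)


# Toy features: two tokens in stream A, three tokens in stream B.
stream_a = [[1.0, 0.0], [0.0, 1.0]]
stream_b = [[1.0, 1.0], [0.5, -0.5], [0.0, 2.0]]
fused = fuse(stream_a, stream_b)
```

In a real model the queries, keys, and values would first pass through learned linear projections, and the fused vector would feed a classifier that labels the clip as normal or abnormal.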