Enhancing Rail Surface Defect Detection Using a Hybrid YOLOv11–Vision Transformer Framework

How to Cite

Markapudi, Baburao, Lahari Gundiga, Venkata Ramesh Babu Kondeti, Sravanthi Kanumuri, and Yamini Maddala. 2026. “Enhancing Rail Surface Defect Detection Using a Hybrid YOLOv11–Vision Transformer Framework”. Journal of Trends in Computer Science and Smart Technology 8 (2): 291-303. https://doi.org/10.36548/jtcsst.2026.2.005.

Keywords

Rail Surface Defect Detection
YOLOv11
Vision Transformer
Hybrid Detection Framework
Global Context Modelling

Abstract

Railway track surface defects pose serious risks to the safety of people, trains, and transported goods. Modern YOLO-based detectors can detect rail surface defects efficiently in real time; however, they rely mainly on local features and tend to confuse defects that look similar. Vision Transformers, on the other hand, model global context well through self-attention, but when used alone they localize objects poorly. To address these issues, this study introduces a hybrid YOLOv11 and Vision Transformer (YOLOv11–ViT) framework for enhanced rail surface defect detection. The approach integrates a lightweight Vision Transformer module into the YOLOv11 backbone so that the model both learns detailed local features and captures global dependencies. Experiments are conducted on a merged public railway surface defect computer vision dataset comprising 8,177 labeled images across five defect categories. The proposed method achieves an mAP@0.5 of 0.951, with a precision of 0.899 and a recall of 0.921, outperforming the baseline YOLO model. The largest gains are in detecting elongated defects such as cracks and light bands. The framework also maintains real-time performance, making it practical for railway safety inspections.
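The abstract does not describe the internals of the lightweight ViT module, so the following is only a rough illustration of the general idea, not the authors' implementation. It flattens an H × W × C backbone feature map into H·W tokens, applies single-head scaled dot-product self-attention, adds a residual connection, and reshapes the result back to H × W × C. The function names (`global_context_block`, `self_attention`) and the projection matrices `wq`, `wk`, `wv` are hypothetical. A real module would use learned projections, multiple heads, positional encodings, and a tensor framework rather than plain Python lists.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b (rows of a times columns of b)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of attention scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    return [e / s for e in exps]


def self_attention(tokens: Matrix, wq: Matrix, wk: Matrix, wv: Matrix) -> Matrix:
    """Single-head scaled dot-product self-attention: softmax(Q K^T / sqrt(d)) V."""
    q, k, v = matmul(tokens, wq), matmul(tokens, wk), matmul(tokens, wv)
    d = len(q[0])
    k_t = [list(col) for col in zip(*k)]  # transpose of K
    scores = [[x / math.sqrt(d) for x in row] for row in matmul(q, k_t)]
    weights = [softmax(row) for row in scores]  # every token attends to every token
    return matmul(weights, v)


def global_context_block(fmap, wq: Matrix, wk: Matrix, wv: Matrix):
    """Hypothetical hybrid step: flatten an H x W x C map, attend globally,
    add the original features back (residual), and restore the H x W layout."""
    h, w = len(fmap), len(fmap[0])
    tokens = [fmap[i][j] for i in range(h) for j in range(w)]
    attended = self_attention(tokens, wq, wk, wv)
    out_tokens = [[t + a for t, a in zip(tok, att)] for tok, att in zip(tokens, attended)]
    return [out_tokens[i * w:(i + 1) * w] for i in range(h)]


# Usage: identity projections, and a single strong response at the far right
# of a 1 x 4 feature map with C = 2 channels.
identity = [[1.0, 0.0], [0.0, 1.0]]
fmap = [[[0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [5.0, 0.0]]]
print(global_context_block(fmap, identity, identity, identity)[0][0])  # → [1.25, 0.0]
```

In the example, the leftmost cell has no local response of its own, yet after attention it receives a quarter of the strong response three cells away. A single small-kernel convolution cannot propagate information that far in one step, which is the kind of long-range dependency the hybrid design aims to capture for elongated defects.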
