Hybrid CNN-Transformer for Knee Osteoarthritis Severity Grading

How to Cite

Patel, Unnati, Sanskruti Patel, Dharmendra Patel, Niky Jain, Suchita Patel, and Ronesh Gangavani. 2026. “Hybrid CNN-Transformer for Knee Osteoarthritis Severity Grading”. Journal of Innovative Image Processing 8 (2): 617-43. https://doi.org/10.36548/jiip.2026.2.010.

Keywords

Computer-Aided Diagnosis
Deep Learning
Hybrid CNN-Transformer Architecture
Knee Osteoarthritis
KL Grading
Medical Image Analysis
X-ray Radiography

Abstract

Accurately grading the severity of knee osteoarthritis (KOA) from plain radiographs is difficult because adjacent grades are hard to distinguish, imaging conditions vary, and the Kellgren-Lawrence (KL) scoring system is ordinal. In this paper, we use plain radiography (anteroposterior knee X-rays) as the main imaging modality for KOA analysis. To address these challenges, we introduce a hybrid CNN-Transformer architecture with adaptive feature integration and ordinal-aware dual-head learning for KOA severity grading. The architecture uses a CBAM-ResNeXt-50 backbone for texture extraction, along with a lightweight transformer-based encoder that models the overall anatomical structure. A learnable adaptive feature fusion module integrates local and global semantics at the image level, producing stage-aware attention across KOA grades. Furthermore, we develop an ordinal-aware dual-head learning paradigm that jointly performs KL grade classification and continuous KOA severity regression. In experiments, the model achieves 96.84% accuracy, a macro F1-score of 0.96, an MAE of 0.21, and a macro AUC-ROC of 0.959, with fewer adjacent-grade confusions.
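
The abstract names two mechanisms without giving their formulas: adaptive fusion of local and global features, and a dual-head objective that pairs grade classification with severity regression. The sketch below is one plausible reading, not the authors' implementation. It assumes a per-channel sigmoid gate for the fusion, and a loss that adds an absolute-error term on the severity head to a cross-entropy term on the grade head. All function names, the `reg_weight` value of 0.5, and the choice of absolute error are illustrative assumptions. The sketch uses only the Python standard library and operates on plain lists rather than network tensors.

```python
import math

NUM_GRADES = 5  # KL grades 0 to 4


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(local_feat, global_feat, gate_logits):
    """Blend CNN (local) and transformer (global) features per channel.

    Assumed form: fused = g * local + (1 - g) * global, with g = sigmoid(logit).
    In a trained model the gate logits would be predicted from the features.
    """
    fused = []
    for loc, glob, logit in zip(local_feat, global_feat, gate_logits):
        g = sigmoid(logit)
        fused.append(g * loc + (1.0 - g) * glob)
    return fused


def softmax(logits):
    # Subtract the maximum for numerical stability.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dual_head_loss(class_logits, severity_pred, kl_grade, reg_weight=0.5):
    """Cross-entropy on the grade head plus absolute error on the severity head.

    The regression term grows with the distance from the true grade, which is
    what makes this (hypothetical) objective sensitive to grade order.
    """
    probs = softmax(class_logits)
    ce = -math.log(probs[kl_grade])
    reg = abs(severity_pred - float(kl_grade))
    return ce + reg_weight * reg


def mean_absolute_error(pred_grades, true_grades):
    """MAE between predicted and true grades, as reported in the abstract."""
    errors = [abs(p - t) for p, t in zip(pred_grades, true_grades)]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    print(gated_fusion([1.0, 2.0], [3.0, 4.0], [0.0, 0.0]))
    print(dual_head_loss([0.0] * NUM_GRADES, severity_pred=1.0, kl_grade=0))
    print(mean_absolute_error([0, 1, 2, 4], [0, 2, 2, 3]))
```

With a gate logit of zero the two branches are averaged equally. Under this loss, a severity prediction that lands far from the true grade costs more than one that lands on a neighbouring grade, which is the behaviour an ordinal-aware objective is meant to encourage.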
