Instance Segmentation of Oral Cancer Images with Fusion of Swin Transformer and Mask RCNN

Kavyashree C.; Vimala H S.

doi:10.36548/jiip.2025.3.007


Open Access


Vol. 7, No. 3 (2025)

Published: 29 August 2025

Pages: 695-706

Kavyashree C.

Department of Computer Science & Engineering, University Visvesvaraya College of Engineering, Bangalore University, Bangalore, India

Vimala H S.

Department of Computer Science & Engineering, University Visvesvaraya College of Engineering, Bangalore University, Bangalore, India


How to Cite

Kavyashree C., and Vimala H S. 2025. “Instance Segmentation of Oral Cancer Images With Fusion of Swin Transformer and Mask RCNN”. Journal of Innovative Image Processing 7 (3): 695-706. https://doi.org/10.36548/jiip.2025.3.007.

Keywords

Oral Cancer

Mask RCNN

Swin Transformer

Object Detection

Instance Segmentation

Abstract

Oral cancer is among the most preventable cancers when it is diagnosed at an early stage, and artificial intelligence can greatly assist in its detection. Deep learning architectures are particularly useful in medical image analysis because they can identify patterns in images and predict clinically useful insights. This study proposes a deep learning methodology based on Mask RCNN (Mask Region-Based Convolutional Neural Network) for the precise detection and segmentation of oral lesions in photographic images. Using the Swin Transformer as the backbone helps the model extract features more effectively, which supports precise detection; its ability to capture relationships among different parts of an image is particularly useful for locating the smallest lesions. Precise annotation of the dataset helped the model generate accurate segmentation masks. The model attains a mean average precision (mAP) of 99.5%, a precision of 92.7%, and a recall of 96.6%. This performance suggests that the model can serve the medical community as a tool for the early detection of oral cancer.
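The abstract reports mAP, precision, and recall but does not state how predicted lesion masks are matched to the annotated ones. As an illustration only, the following minimal sketch shows one common instance-segmentation protocol, assuming a mask IoU threshold of 0.5, greedy matching in descending confidence order (as in PASCAL VOC and COCO evaluation), and all-point interpolated AP. It is not the authors' evaluation code; all function names (`mask_iou`, `match_predictions`, `precision_recall`, `average_precision`, `box_mask`) are hypothetical. mAP is then the mean of this AP over classes (and, in COCO style, over IoU thresholds 0.50 to 0.95).

```python
from typing import List, Set, Tuple

Pixel = Tuple[int, int]
Mask = Set[Pixel]  # binary mask stored as the set of its foreground pixel coordinates


def mask_iou(a: Mask, b: Mask) -> float:
    """Intersection over union of two binary masks."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def match_predictions(preds: List[Tuple[float, Mask]], gts: List[Mask],
                      iou_thr: float = 0.5) -> List[bool]:
    """Each prediction, in descending score order, claims the unmatched
    ground-truth mask it overlaps most. It is a true positive (True) if that
    IoU reaches iou_thr, otherwise a false positive (False)."""
    matched = [False] * len(gts)
    flags: List[bool] = []
    for _, pmask in sorted(preds, key=lambda p: p[0], reverse=True):
        best_iou, best_j = 0.0, -1
        for j, gmask in enumerate(gts):
            if not matched[j]:
                iou = mask_iou(pmask, gmask)
                if iou > best_iou:
                    best_iou, best_j = iou, j
        is_tp = best_j >= 0 and best_iou >= iou_thr
        if is_tp:
            matched[best_j] = True
        flags.append(is_tp)
    return flags


def precision_recall(flags: List[bool], n_gt: int) -> Tuple[float, float]:
    """Precision = TP / (TP + FP), recall = TP / (TP + FN)."""
    tp = sum(flags)
    fn = n_gt - tp
    precision = tp / len(flags) if flags else 0.0
    recall = tp / (tp + fn) if n_gt else 0.0
    return precision, recall


def average_precision(flags: List[bool], n_gt: int) -> float:
    """All-point interpolated AP: area under the precision-recall curve
    after making precision monotonically non-increasing."""
    if n_gt == 0:
        return 0.0
    precisions: List[float] = []
    recalls: List[float] = []
    tp = 0
    for i, is_tp in enumerate(flags, start=1):
        tp += is_tp
        precisions.append(tp / i)
        recalls.append(tp / n_gt)
    # Sweep right to left so each precision is the max of all later ones.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_r = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_r) * p
        prev_r = r
    return ap


def box_mask(r0: int, c0: int, r1: int, c1: int) -> Mask:
    """Hypothetical helper: rectangular mask covering rows r0..r1-1, cols c0..c1-1."""
    return {(r, c) for r in range(r0, r1) for c in range(c0, c1)}


if __name__ == "__main__":
    gts = [box_mask(0, 0, 10, 10), box_mask(20, 20, 30, 30)]
    preds = [(0.9, box_mask(0, 0, 10, 9)),      # overlaps lesion 1, IoU 0.9
             (0.8, box_mask(50, 50, 55, 55)),   # overlaps no lesion
             (0.7, box_mask(21, 20, 30, 30))]   # overlaps lesion 2, IoU 0.9
    flags = match_predictions(preds, gts)
    p, r = precision_recall(flags, len(gts))
    print(flags)                                          # [True, False, True]
    print(round(p, 3), r)                                 # 0.667 1.0
    print(round(average_precision(flags, len(gts)), 3))  # 0.833
```

In this toy case the second prediction overlaps no lesion, so it is counted as a false positive: precision drops to 2/3 while recall stays at 1.0. Changing `iou_thr`, or averaging over several thresholds as COCO does, can shift the reported mAP considerably, so the evaluation protocol matters when comparing figures such as the 99.5% mAP reported in the abstract.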

