A Hybrid ConvNeXtV2–MaxViT Framework with CNN-based Feature Refinement for Skin Lesion Classification

How to Cite

P.R., Bipin, Anoop V., Ramu R., Santhi K., Upendra Kumar, and Sai Kiran Oruganti. 2026. “A Hybrid ConvNeXtV2–MaxViT Framework With CNN-Based Feature Refinement for Skin Lesion Classification”. Journal of Innovative Image Processing 8 (3): 747-64. https://doi.org/10.36548/jiip.2026.3.001.

Keywords

Skin Disease
Convolutional Neural Network
Vision Transformer
ConvNeXtV2
MaxViT

Abstract

Accurate and prompt detection of skin disorders, particularly malignant skin cancers such as melanoma, is crucial for effective treatment and improved clinical outcomes. Even experienced dermatologists find it difficult to distinguish between visually similar skin lesions. Deep neural architectures such as Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) can automate the classification of abnormalities in dermoscopic images, but each has a weakness: ViTs require large amounts of data to generalize well, while CNNs are less effective at capturing global patterns. This work proposes a hybrid model that addresses both limitations. A CNN-based feature refinement module first enhances lesion-focused features while suppressing irrelevant background information. A dual-path classifier then uses ConvNeXtV2 for efficient local feature extraction and MaxViT to model broader contextual relationships. The architecture was evaluated on the HAM10000 dataset and validated on the ISIC dataset. In these experiments, the proposed model outperforms standalone CNN and ViT models as well as classical CNN-ViT combinations. It achieves an accuracy of 96.8%, an AUC of 0.978, and a balanced F1-score of 0.965 on HAM10000, and shows competitive performance on ISIC. The effect of the CNN-based feature refinement module was also studied. These results indicate that combining CNN-based feature refinement with multi-scale feature extraction is effective for building robust and accurate skin disease classification systems.
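
As an illustration of how two of the reported metrics relate to per-class predictions, the following minimal sketch computes accuracy and a macro-averaged F1-score on toy labels. It assumes that "balanced F1-score" means the unweighted mean of per-class F1 scores; the abstract does not define the term. The function names and the toy labels are hypothetical and are not taken from the paper.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (assumed reading of 'balanced F1')."""
    classes = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1          # correct prediction for class t
        else:
            fp[p] += 1          # p was predicted but is wrong
            fn[t] += 1          # t was the true class but was missed
    scores = []
    for c in classes:
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy labels for three lesion classes (not data from the paper).
y_true = ["mel", "nv", "nv", "bcc", "mel", "nv"]
y_pred = ["mel", "nv", "mel", "bcc", "nv", "nv"]

print(f"accuracy = {accuracy(y_true, y_pred):.3f}")
print(f"macro F1 = {macro_f1(y_true, y_pred):.3f}")
```

Macro averaging gives every class equal weight regardless of how many samples it has, so a rare class that is classified poorly lowers the score as much as a common one would.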
