Abstract
Architectures based on Convolutional Neural Networks (CNNs), such as U-Net, have shown strong performance in the segmentation of renal medical images. However, because convolution operations are local and have limited receptive fields, they often struggle to capture long-range dependencies. To address this issue, recent work has improved global context modeling by incorporating transformer modules into U-Net variants. These transformer-based methods, however, risk losing important local spatial information during global fusion. This research introduces Multi-Scale MCPA, a novel architecture designed specifically for the segmentation of 2D renal medical images. MCPA has three main parts: an encoder, a decoder, and a Cross Perceptron module. To provide rich multi-scale feature interaction, the Cross Perceptron first uses several Multi-Scale Cross Perceptron (MCP) modules to capture local dependencies. To model long-range dependencies efficiently, the resulting features are spatially unfolded, concatenated, and processed by a Global Perceptron component. A Progressive Dual-Branch Structure (PDBS) is added to improve segmentation performance, particularly for fine-grained structures. During training, this component guides the network to shift its attention progressively from coarse structural elements to fine pixel-level representations. The proposed method targets 2D medical image segmentation, given the clinical significance of 2D imaging and the high computational demands of 3D models.
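The unfold-and-concatenate step described above can be illustrated with a minimal NumPy sketch. It is not the MCPA implementation: the function names are hypothetical, all scales are assumed to share the same channel width, and a plain single-head self-attention without learned weights stands in for the Global Perceptron, whose internals the abstract does not specify.

```python
import numpy as np

def unfold_and_concat(feature_maps):
    """Flatten each (C, H, W) map into (H*W, C) tokens and stack all scales.

    Assumption: every scale has the same channel count C.
    """
    tokens = [fm.reshape(fm.shape[0], -1).T for fm in feature_maps]
    return np.concatenate(tokens, axis=0)

def global_self_attention(tokens):
    """Single-head scaled dot-product self-attention with no learned weights.

    Hypothetical stand-in for the Global Perceptron: every token attends to
    every other token, regardless of the scale it came from.
    """
    d = tokens.shape[1]
    scores = tokens @ tokens.T / np.sqrt(d)
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)  # row-wise softmax
    return weights @ tokens, weights

# Three illustrative feature maps with 8 channels at 16x16, 8x8, and 4x4.
rng = np.random.default_rng(0)
scales = [rng.standard_normal((8, s, s)) for s in (16, 8, 4)]

tokens = unfold_and_concat(scales)          # 256 + 64 + 16 = 336 tokens
mixed, attn = global_self_attention(tokens)
```

Because tokens from all scales sit in one sequence, each output token can draw on positions from any resolution, which is the long-range interaction that purely local convolutions lack.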
Experiments on multiple publicly available datasets spanning different imaging tasks and modalities, including OCTA (ROSE), fundus images (DRIVE, CHASE_DB1, HRF), MRI (ACDC), and CT (Synapse), show that the proposed method consistently outperforms state-of-the-art segmentation methods, with improvements of +2.1% Dice score on Synapse CT, +2.6% on ACDC MRI, and up to +3.4% on the retinal fundus datasets. These results support both the effectiveness and the generalizability of MCPA.
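For reference, the Dice score used to report these gains measures the overlap between a predicted mask P and a ground-truth mask T as 2|P ∩ T| / (|P| + |T|). The sketch below computes it for flat binary masks; the function name and the convention of returning 1.0 when both masks are empty are assumptions made for illustration, not details taken from the paper's evaluation code.

```python
def dice_score(pred, target):
    """Dice coefficient 2|P ∩ T| / (|P| + |T|) for two flat binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if total == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    return 2.0 * intersection / total

# Toy example: 3 predicted pixels, 3 true pixels, 2 in common.
pred = [1, 1, 0, 0, 1, 0]
target = [1, 0, 0, 0, 1, 1]
score = dice_score(pred, target)  # 2 * 2 / (3 + 3) = 0.666...
```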
