Volume 7 | Issue 4 | December 2025
Published: 22 October 2025
The lack of pixel-level expert annotations has traditionally hindered the development of robust object detection models for medical diagnosis. This article proposes a weakly supervised approach that generates accurate bounding box labels from image-level classification labels with minimal user interaction. By converting cheaper, more readily available class-level labels into high-value spatial annotations, the approach addresses the annotation bottleneck. The proposed two-stage method first trains a classifier on diagnostic labels and then applies Gradient-weighted Class Activation Mapping (Grad-CAM) to generate high-quality pseudo-labels. These machine-generated annotations are then used to train a state-of-the-art YOLOv8s detector for the final diagnosis task. On cataract detection from fundus images, the system achieved a mean Average Precision (mAP@50) of 99% and a stricter mAP@50-95 of 96.9%. Recall for the cataract class reached 97.1%, meaning that fewer than 3% of cataract cases were missed. These results are competitive with fully supervised methods that require extensive manual annotation, supporting our method as a data-efficient, highly scalable, and robust aid for accelerating the development of medical AI tools.
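The pseudo-labelling step of the two-stage method can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and it rests on several stated assumptions. The convolutional activations and class-score gradients are taken as given (in practice they would be captured from the trained classifier, for example with framework hooks, which is not shown). The heatmap threshold of 0.5 is a hypothetical choice. Only one box is produced per image, taken as the extent of all above-threshold cells rather than the largest connected region. The class id 0 is a hypothetical cataract label. Box coordinates are normalized to the YOLO label layout `class x_center y_center width height`. Because those coordinates are relative, computing them on the low-resolution heatmap gives the same values as on the full image, provided the heatmap is uniformly aligned with the image.

```python
import numpy as np

def grad_cam(activations: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Grad-CAM heatmap from a conv layer's activations and the gradients of the
    target class score w.r.t. those activations, both shaped (K, H, W)."""
    # Channel importance weights: global-average-pool the gradients over H and W.
    weights = gradients.mean(axis=(1, 2))  # shape (K,)
    # Weighted sum of the K feature maps, then ReLU to keep positive evidence only.
    cam = np.maximum(np.tensordot(weights, activations, axes=1), 0.0)  # shape (H, W)
    # Normalize to [0, 1]; an all-zero map is returned unchanged.
    peak = cam.max()
    return cam / peak if peak > 0 else cam

def heatmap_to_box(cam: np.ndarray, threshold: float = 0.5):
    """Box (x_min, y_min, x_max, y_max), in heatmap cells, enclosing every cell whose
    value is >= threshold (hypothetical default 0.5). Returns None if no cell qualifies."""
    ys, xs = np.nonzero(cam >= threshold)
    if ys.size == 0:
        return None
    # Max coordinates are exclusive, so add 1 to cover the last active cell.
    return int(xs.min()), int(ys.min()), int(xs.max()) + 1, int(ys.max()) + 1

def to_yolo_label(box, class_id: int, map_w: int, map_h: int) -> str:
    """Format a box as a YOLO label line with coordinates normalized to [0, 1]."""
    x0, y0, x1, y1 = box
    xc = (x0 + x1) / 2 / map_w
    yc = (y0 + y1) / 2 / map_h
    w = (x1 - x0) / map_w
    h = (y1 - y0) / map_h
    return f"{class_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"

# Toy example with synthetic data: 2 channels on an 8x8 feature map.
acts = np.zeros((2, 8, 8))
acts[0, 2:5, 3:6] = 1.0  # channel 0 fires on a hypothetical lens-opacity region
acts[1, 6:8, 0:2] = 1.0  # channel 1 fires elsewhere
# Channel 0 raises the class score (positive gradient), channel 1 lowers it.
grads = np.stack([np.full((8, 8), 1.0), np.full((8, 8), -1.0)])

cam = grad_cam(acts, grads)
box = heatmap_to_box(cam, threshold=0.5)
print(box)  # (3, 2, 6, 5)
print(to_yolo_label(box, class_id=0, map_w=8, map_h=8))  # 0 0.562500 0.437500 0.375000 0.375000
```

In this toy case the negatively weighted channel is suppressed by the ReLU, so only the region supporting the class survives thresholding. The resulting label line is what a detector such as YOLOv8s would be trained on in place of a manual annotation.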
Keywords: Weakly Supervised Learning; Medical Image Analysis; Cataract Detection; Grad-CAM; YOLOv8