Image-Text Co-Decomposition for Precise Text-Supervised Semantic Segmentation
The proposed Image-Text Co-Decomposition (CoDe) framework jointly decomposes image-text pairs into corresponding regions and word segments, enabling direct region-word alignment and alleviating the discrepancy between training and testing for text-supervised semantic segmentation.
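To make the idea of direct region-word alignment concrete, the following is a minimal sketch of a symmetric contrastive objective over matched region and word-segment embeddings. It is an illustration under stated assumptions, not the CoDe implementation: the function name `region_word_contrastive_loss`, the InfoNCE-style formulation, the temperature value, and the assumption that row `i` of each array forms a matched pair are all hypothetical choices made here. How the regions and word segments are obtained from the image-text pair is not covered.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to (approximately) unit length along `axis`."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def log_softmax(logits, axis):
    """Numerically stable log-softmax along `axis`."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def region_word_contrastive_loss(region_emb, word_emb, temperature=0.07):
    """Symmetric InfoNCE-style loss (hypothetical, for illustration only).

    region_emb: (n, d) array, one embedding per image region.
    word_emb:   (n, d) array, one embedding per word segment.
    Assumption: row i of region_emb is the positive match of row i of word_emb;
    every other row in the batch acts as a negative.
    """
    r = l2_normalize(region_emb)
    w = l2_normalize(word_emb)
    logits = r @ w.T / temperature  # (n, n) scaled cosine similarities
    idx = np.arange(len(r))
    # Region-to-word: each row should put its mass on the diagonal entry.
    loss_r2w = -log_softmax(logits, axis=1)[idx, idx].mean()
    # Word-to-region: each column should put its mass on the diagonal entry.
    loss_w2r = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_r2w + loss_w2r)


# Toy usage: orthogonal embeddings, first correctly paired, then mispaired.
regions = np.eye(4)
loss_aligned = region_word_contrastive_loss(regions, np.eye(4))
loss_shuffled = region_word_contrastive_loss(regions, np.roll(np.eye(4), 1, axis=0))
```

In this toy setup the loss is close to zero when each region is paired with its own word segment and large when the pairing is shifted, which is the behavior a region-word alignment objective is meant to reward.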