The article introduces Language-Driven Visual Consensus (LDVC), an approach to zero-shot semantic segmentation that aligns visual features with class embeddings. LDVC uses class embeddings as anchors and introduces route attention to improve semantic consistency within objects. Combined with a vision-language prompting strategy, it substantially improves how well segmentation models generalize to unseen classes. Experiments on benchmark datasets show mIoU gains over state-of-the-art methods.
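The summary does not spell out the architecture. As a rough illustration of what "class embeddings as anchors" means in zero-shot segmentation generally, the sketch below scores each pixel feature against every class text embedding by cosine similarity and assigns the pixel to the closest anchor. This is an assumed generic baseline, not LDVC itself: it omits route attention and vision-language prompting. The function names, the temperature value, and the toy feature arrays are all hypothetical.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors along `axis` to unit L2 norm."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def segment_with_class_anchors(pixel_feats, class_embeds, temperature=0.07):
    """Hypothetical helper: assign each pixel to its most similar class anchor.

    pixel_feats:  (H, W, D) visual features, one D-dim vector per pixel.
    class_embeds: (C, D) text embeddings, one per class name (seen or unseen).
    Returns (labels, probs): (H, W) class indices and (H, W, C) softmax scores.
    """
    p = l2_normalize(pixel_feats)                  # unit-norm pixel features
    t = l2_normalize(class_embeds)                 # unit-norm class anchors
    logits = p @ t.T / temperature                 # (H, W, C) scaled cosine similarity
    logits -= logits.max(axis=-1, keepdims=True)   # subtract max for numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=-1, keepdims=True)     # softmax over classes
    return probs.argmax(axis=-1), probs

# Toy example (made-up data): 2 class anchors in a 3-D feature space, 2x2 "image".
anchors = np.array([[1.0, 0.0, 0.0],   # class 0
                    [0.0, 1.0, 0.0]])  # class 1
feats = np.array([[[0.9, 0.1, 0.0], [0.1, 0.8, 0.0]],
                  [[0.0, 1.0, 0.2], [1.0, 0.0, 0.1]]])
labels, probs = segment_with_class_anchors(feats, anchors)
print(labels)  # [[0 1]
               #  [1 0]]
```

Because the class set enters only through `class_embeds`, new class names can be scored at inference time without retraining. Methods such as LDVC aim to make the visual features themselves align better with these anchors.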
Key insights drawn from arxiv.org
by Zicheng Zhan... at arxiv.org, 03-14-2024
https://arxiv.org/pdf/2403.08426.pdf