The article introduces Language-Driven Visual Consensus (LDVC), an approach that improves zero-shot semantic segmentation by aligning visual features with class embeddings. The method uses class embeddings as anchors and introduces route attention to enhance semantic consistency within objects. A vision-language prompting strategy significantly boosts the generalization capacity of segmentation models on unseen classes. Experimental results on benchmark datasets show mIoU gains over state-of-the-art methods.
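To make the idea of class embeddings acting as anchors concrete, the following is a minimal sketch of the general technique of labeling each pixel by its cosine similarity to a set of class embeddings. It is not the LDVC implementation: it does not reproduce route attention or the vision-language prompting strategy, and the function names, feature dimensions, and toy vectors are all hypothetical choices made for illustration.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def segment_by_class_anchors(pixel_features, class_embeddings):
    """Label each pixel with the index of its most similar class embedding.

    pixel_features: H x W grid of D-dimensional feature vectors (nested lists).
    class_embeddings: list of K D-dimensional embeddings, one per class name.
    Both are assumed to live in the same embedding space.
    """
    anchors = [l2_normalize(e) for e in class_embeddings]
    label_map = []
    for row in pixel_features:
        label_row = []
        for feat in row:
            f = l2_normalize(feat)
            # Cosine similarity between this pixel and every class anchor.
            sims = [sum(a * b for a, b in zip(f, anchor)) for anchor in anchors]
            label_row.append(max(range(len(sims)), key=sims.__getitem__))
        label_map.append(label_row)
    return label_map


# Toy data (hypothetical): two classes in a 2-D embedding space, 2 x 2 image.
class_embeddings = [[1.0, 0.0], [0.0, 1.0]]
pixel_features = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.7, 0.3], [0.1, 2.0]],
]
labels = segment_by_class_anchors(pixel_features, class_embeddings)
print(labels)
```

Because the class embeddings come from text, an unseen class can in principle be added at inference time by appending its embedding to `class_embeddings`, which is what makes this kind of matching suitable for the zero-shot setting described above.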
Key insights from the paper by Zicheng Zhan... at arxiv.org, 03-14-2024
https://arxiv.org/pdf/2403.08426.pdf