Neighbour-Aware CLIP: A Training-Free Approach for Open-Vocabulary Semantic Segmentation
A straightforward adaptation of CLIP that enforces patch localization in its self-attention, significantly improving performance on open-vocabulary semantic segmentation without requiring additional data, auxiliary pre-trained networks, or extensive hyperparameter tuning.
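
To make the idea of localizing patches in self-attention concrete, the following is a minimal sketch in plain Python. It assumes one possible mechanism: adding a Gaussian bias, based on the spatial distance between patches, to the attention logits before the softmax. The function names, the `sigma` parameter, and the choice of a Gaussian are illustrative assumptions and are not taken from the text above.

```python
import math

def gaussian_locality_bias(h, w, sigma=1.0):
    """Hypothetical helper: bias[i][j] is larger when patches i and j are spatially close."""
    coords = [(r, c) for r in range(h) for c in range(w)]
    return [
        [math.exp(-((ri - rj) ** 2 + (ci - cj) ** 2) / (2 * sigma ** 2))
         for (rj, cj) in coords]
        for (ri, ci) in coords
    ]

def softmax(row):
    """Numerically stable softmax over one row of logits."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def localized_attention(logits, h, w, sigma=1.0):
    """Add the locality bias to the attention logits, then normalize each row.

    `logits` is an (h*w) x (h*w) list of lists of patch-to-patch similarities.
    This is an assumed formulation for illustration only.
    """
    bias = gaussian_locality_bias(h, w, sigma)
    return [
        softmax([l + b for l, b in zip(logit_row, bias_row)])
        for logit_row, bias_row in zip(logits, bias)
    ]

# Toy example: a 3x3 patch grid with uniform (all-zero) similarity logits,
# so any structure in the output comes purely from the locality bias.
h, w = 3, 3
uniform_logits = [[0.0] * (h * w) for _ in range(h * w)]
attn = localized_attention(uniform_logits, h, w, sigma=1.0)
```

With uniform logits, each patch in this sketch places its largest attention weight on itself and progressively less on more distant patches, which is the localization behaviour the summary describes. No weights are trained or updated, consistent with a training-free setting.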