Enhancing Semantic Grounding in Vision-Language Models through Iterative Feedback
Vision-language models (VLMs) can improve their semantic grounding performance by receiving and generating feedback, without in-domain data, fine-tuning, or changes to the network architecture.
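
As a rough illustration of what an iterative feedback loop around a frozen model might look like, consider the minimal sketch below. It is not the authors' implementation: the function `refine_with_feedback`, the callables `predict` and `give_feedback`, and the `max_rounds` cap are all hypothetical names introduced here, and the stopping rule (stop when no feedback is returned) is an assumption.

```python
from typing import Callable, Optional


def refine_with_feedback(
    predict: Callable[[str, Optional[str]], str],
    give_feedback: Callable[[str, str], Optional[str]],
    query: str,
    max_rounds: int = 3,
) -> str:
    """Hypothetical sketch: revise a prediction using generated feedback.

    `predict(query, feedback)` stands in for a frozen VLM call and
    `give_feedback(query, answer)` for a feedback generator (possibly the
    same VLM). Both are assumptions; no weights are updated anywhere.
    """
    answer = predict(query, None)  # initial prediction, no feedback yet
    for _ in range(max_rounds):
        feedback = give_feedback(query, answer)
        if feedback is None:  # assumed convention: None means "accept"
            break
        answer = predict(query, feedback)  # revise conditioned on feedback
    return answer
```

The loop only changes what the model is prompted with between rounds, which is consistent with the claim that no in-domain data, fine-tuning, or architectural changes are needed.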