Contrastive Region Guidance (CRG) improves visual grounding in vision-language models by guiding them to focus on specific image regions, without any additional training.