EVCAP: Retrieval-Augmented Image Captioning with External Visual-Name Memory for Open-World Comprehension
EVCAP is a retrieval-augmented image captioning method that prompts a large language model with object names retrieved from an external visual-name memory, enabling open-world comprehension.
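
The retrieve-then-prompt idea in the summary above can be illustrated with a small sketch. This is not the authors' implementation: the toy 3-dimensional embeddings, the `MEMORY` layout (pairs of visual embedding and object name), the helpers `retrieve_names` and `build_prompt`, the cosine-similarity ranking, and the prompt wording are all assumptions made for illustration. A real system would presumably obtain embeddings from an image encoder and pass the resulting prompt to a language model.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_names(query, memory, k=2):
    """Return up to k distinct object names whose stored embeddings
    are most similar to the query embedding (hypothetical helper)."""
    ranked = sorted(memory, key=lambda entry: cosine(query, entry[0]), reverse=True)
    names = []
    for _, name in ranked:
        if len(names) >= k:
            break
        if name not in names:  # skip duplicate names from near-identical entries
            names.append(name)
    return names


def build_prompt(names):
    """Format retrieved names into a text prompt (wording is an assumption)."""
    return "Objects in the image: " + ", ".join(names) + ". Describe the image:"


# Toy external visual-name memory: (visual embedding, object name) pairs.
MEMORY = [
    ([1.0, 0.0, 0.0], "dog"),
    ([0.9, 0.1, 0.0], "dog"),
    ([0.0, 1.0, 0.0], "frisbee"),
    ([0.0, 0.0, 1.0], "car"),
]

query = [0.8, 0.6, 0.0]  # stand-in for an image embedding
names = retrieve_names(query, MEMORY, k=2)
prompt = build_prompt(names)
print(prompt)  # Objects in the image: dog, frisbee. Describe the image:
```

Because the memory in this sketch is a plain list, covering a new object category only requires appending new (embedding, name) pairs. No retraining step is involved here, which is the property that makes an external memory a natural fit for open-world settings.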