Dall’Asen, N., Wang, Y., Fini, E., & Ricci, E. (2024). Retrieval-enriched zero-shot image classification in low-resource domains. arXiv preprint arXiv:2411.00988.
This paper addresses the challenge of zero-shot image classification in low-resource domains, where data scarcity hinders traditional training-based methods. The authors propose a novel training-free approach called CORE (Combination of Retrieval Enrichment) to improve classification accuracy by leveraging retrieval-based enrichment techniques.
CORE utilizes a pre-trained Vision-Language Model (VLM) and a large web-crawled text-image database. It enriches both the query image and class prototype representations with textual information retrieved from the database. For the query image, CORE performs image-to-text retrieval using the VLM's image encoder and combines the retrieved captions' embeddings with the original image embedding. For class prototypes, CORE retrieves relevant captions based on the class prompt text and combines their embeddings with the original class prototype embedding. Finally, classification is performed by calculating cosine similarities between the enriched image and class prototype representations.
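The pipeline described above can be condensed into a short sketch. The snippet below is an illustrative approximation, not the authors' code: the mixing weight `alpha`, the mean-pooling of retrieved captions, and the `enrich`/`classify` helper names are assumptions, and the embeddings are presumed to come from a frozen CLIP-style VLM and to be L2-normalised.

```python
# Minimal sketch of retrieval-enriched zero-shot classification in the spirit of CORE.
# The database interface and the mixing scheme below are illustrative placeholders.
import torch
import torch.nn.functional as F

def enrich(anchor: torch.Tensor,
           db_text_embeds: torch.Tensor,
           k: int = 16,
           alpha: float = 0.5) -> torch.Tensor:
    """Enrich one embedding with its top-k retrieved caption embeddings.

    anchor:          (d,)  L2-normalised query (image or class-prompt) embedding
    db_text_embeds:  (N, d) L2-normalised caption embeddings from the web-crawled database
    alpha:           weight balancing the original embedding and the retrieved mean
                     (a hypothetical mixing scheme, assumed for illustration)
    """
    sims = db_text_embeds @ anchor                     # cosine similarity to every caption
    topk = sims.topk(k).indices                        # indices of the k closest captions
    retrieved_mean = db_text_embeds[topk].mean(dim=0)  # aggregate the retrieved captions
    enriched = alpha * anchor + (1 - alpha) * retrieved_mean
    return F.normalize(enriched, dim=-1)

def classify(image_embed: torch.Tensor,
             class_prompt_embeds: torch.Tensor,
             db_text_embeds: torch.Tensor) -> int:
    """Zero-shot classification with both sides enriched, scored by cosine similarity."""
    q = enrich(image_embed, db_text_embeds)             # enrich the query image embedding
    protos = torch.stack([enrich(p, db_text_embeds)     # enrich each class prototype
                          for p in class_prompt_embeds])
    return (protos @ q).argmax().item()                 # highest cosine similarity wins
```

In practice, the query and class-prompt embeddings would be produced by the VLM's image and text encoders, and `db_text_embeds` would be a pre-computed index over the web-crawled caption database (for scale, typically an approximate nearest-neighbour index rather than the dense matrix shown here).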
The authors conclude that CORE offers a promising solution for zero-shot image classification in low-resource domains. Its training-free nature and reliance on readily available web-crawled data make it a practical and effective approach for real-world applications.
This research contributes to computer vision by introducing a novel and effective method for low-resource image classification; its training-free design and use of readily available data make CORE a valuable tool for researchers and practitioners working with limited data.
CORE's performance is limited by how well a domain is represented in the external database. Future research could explore methods to improve retrieval accuracy and coverage for rare domains, and investigate how different VLMs and retrieval databases affect CORE's performance.