Key Concepts
A training-free method for zero-shot composed image retrieval that uses local concept reranking to improve retrieval performance.
Summary
This summary covers the challenges of composed image retrieval and a training-free approach to the zero-shot setting. It outlines the methodology, the experiments on four benchmarks, the comparisons with state-of-the-art methods, and the ablation studies. The proposed method performs comparably to state-of-the-art methods, with significant improvements in some cases.
Introduction
- Composed image retrieval aims to retrieve a target image from a composed query, typically a reference image paired with text describing the desired modification.
- Challenges arise from ambiguous requirements and modality gaps between images and text.
Training-Free Approach
- Introduces a training-free method for zero-shot composed image retrieval.
- Utilizes global retrieval baseline and local concept reranking for improved performance.
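The two-stage idea above (retrieve globally, then rerank a short list) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the composed query has already been converted to text and embedded, and that a per-image local concept score in [0, 1] is available. The names `global_retrieve` and `rerank_top_k`, the `weight` parameter, and the toy vectors are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def global_retrieve(query_vec, gallery):
    """Stage 1: rank every gallery image by similarity to the query embedding."""
    scored = [(img_id, cosine(query_vec, vec)) for img_id, vec in gallery.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

def rerank_top_k(ranked, concept_scores, k, weight=0.5):
    """Stage 2: re-order only the first k candidates; the tail keeps its order.

    The combined score is a weighted sum of the global similarity and a
    local concept score (an assumed fusion rule, chosen for illustration).
    """
    head, tail = ranked[:k], ranked[k:]
    rescored = [
        (img_id, (1 - weight) * score + weight * concept_scores.get(img_id, 0.0))
        for img_id, score in head
    ]
    rescored.sort(key=lambda pair: pair[1], reverse=True)
    return rescored + tail

# Toy data: 2-D "embeddings" and made-up concept scores.
gallery = {"a": [1.0, 0.0], "b": [0.8, 0.6], "c": [0.0, 1.0]}
query = [1.0, 0.0]
concept_scores = {"a": 0.1, "b": 0.9}

ranked = global_retrieve(query, gallery)
reranked = rerank_top_k(ranked, concept_scores, k=2)
print([img_id for img_id, _ in ranked])    # ['a', 'b', 'c']
print([img_id for img_id, _ in reranked])  # ['b', 'a', 'c']
```

In this toy run, image "b" overtakes "a" after reranking because its local concept score is much higher, while "c" lies outside the top K and keeps its position.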
Experiments and Results
- Conducted experiments on CIRR, FashionIQ, CIRCO, and COCO datasets.
- Achieved performance comparable to state-of-the-art methods, with significant improvements in some cases.
Ablation Studies
- Evaluated variants of the captioner, large language model, prompts, and baseline, as well as the number K of top-ranked candidates that are re-ranked.
- Identified the impact of different components on the model's performance.
Statistics
To avoid difficult-to-obtain labeled triplet training data, zero-shot composed image retrieval (ZS-CIR) has been introduced.
Extensive experiments show that the proposed method achieves performance comparable to state-of-the-art triplet-training-based methods.
Our model can generate human-understandable explicit attributes in the training-free framework.
Quotes
"Our method is designed to convert the composed query into explicit human-understandable text."
"Extensive experiments on four ZS-CIR benchmarks show that our method achieves comparable performances."