Fine-Grained Image Retrieval with Sketch and Text Duet
Core Concepts
Combining sketches and text for precise image retrieval.
Abstract
This work explores the synergy between sketches and text for fine-grained image retrieval. It questions the reliance on sketches alone and introduces a compositionality framework built on pre-trained CLIP models. The system further extends to composite image retrieval, domain attribute transfer, and fine-grained image generation.
Introduction:
Sketches vs. Text in Image Retrieval.
Related Works:
Evolution of Sketch-Based Image Retrieval (SBIR).
Revisiting CLIP:
Description of CLIP model components.
Sketch-Based Composed Image Retrieval:
Motivation behind combining structural cues from sketch with textual descriptions.
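The core idea of composing a sketch's structural cue with a textual description can be illustrated with a minimal sketch in NumPy. This is a hypothetical simplification: the function names are illustrative, and the convex combination of embeddings stands in for the paper's actual CLIP-based fusion module, which differs in detail.

```python
import numpy as np

def l2_normalize(x, axis=-1):
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def compose_query(sketch_emb, text_emb, alpha=0.5):
    # Hypothetical composition: a convex combination of the sketch
    # and text embeddings; the paper's learned fusion module differs.
    return l2_normalize(alpha * sketch_emb + (1 - alpha) * text_emb)

def rank_gallery(query, gallery):
    # With unit-norm vectors, cosine similarity is a dot product.
    sims = l2_normalize(gallery) @ query
    return np.argsort(-sims)  # gallery indices, best match first

# Toy run with random stand-ins for CLIP features (dim=512 as in ViT-B/32).
rng = np.random.default_rng(0)
dim = 512
sketch = l2_normalize(rng.normal(size=dim))
text = l2_normalize(rng.normal(size=dim))
gallery = rng.normal(size=(100, dim))

order = rank_gallery(compose_query(sketch, text), gallery)
```

The sketch contributes fine-grained structure, the text contributes attributes the sketch cannot express, and the composed query is matched against the photo gallery in a single shared embedding space.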
Experiments:
Evaluation on various datasets for object-level and scene-level composed retrieval.
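The retrieval accuracy metric quoted below (Acc.@5) can be sketched as follows; the function name is illustrative, not from the paper's code.

```python
import numpy as np

def acc_at_k(ranked_indices, true_index, k=5):
    # Acc.@k: 1 if the ground-truth image appears among the top-k
    # retrieved results for a query, else 0; averaged over all
    # queries when reporting dataset-level numbers.
    return int(true_index in ranked_indices[:k])

# Toy example: ground truth ranked 3rd counts as a hit at k=5.
ranking = np.array([7, 2, 9, 4, 1, 0])
hit = acc_at_k(ranking, true_index=9, k=5)
```

Averaging this per-query indicator over a test set yields the Acc.@5 figures reported for datasets such as FS-COCO.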
Conclusion and Future Works:
Summary of findings and potential future directions.
Stats
Two primary input modalities prevail in image retrieval: sketch and text.
Sketches have long been treated as the modality of choice for fine-grained image retrieval, owing to their ability to capture intricate visual details.
Our method delivers significant gains in fine-grained composed retrieval over baselines and state-of-the-art methods across a range of datasets.
Quotes
"Our method outperforms baselines significantly on all datasets."
"Our system extends its utility to diverse domains such as sketch+text-based fine-grained image generation."
"Our method surpasses other baseline methods with an average Acc.@5 gain of 10.9 on FS-COCO."