This paper proposes Weakly-Supervised Deep Hyperspherical Quantization (WSDHQ), the first work to address the problem of learning deep quantization from weakly-tagged images without using ground-truth labels.
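As a rough illustration of the quantization step on the unit hypersphere, the sketch below assigns L2-normalized embeddings to their nearest codewords by cosine similarity. All names, shapes, and the random codebook are hypothetical; this is not the WSDHQ training procedure, which learns the codebook and embeddings from weak tags.

```python
# Minimal, illustrative sketch of quantization on the unit hypersphere
# (hypothetical names; not the WSDHQ implementation).
import numpy as np

def hyperspherical_quantize(embeddings, codebook):
    """Assign each L2-normalized embedding to its most similar codeword."""
    # Project both embeddings and codewords onto the unit hypersphere.
    x = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    c = codebook / np.linalg.norm(codebook, axis=1, keepdims=True)
    # After normalization, cosine similarity reduces to a dot product.
    sims = x @ c.T                      # (n_samples, n_codewords)
    assignments = sims.argmax(axis=1)   # index of the nearest codeword
    return c[assignments], assignments

rng = np.random.default_rng(0)
emb = rng.normal(size=(8, 64))          # e.g., features from an image encoder
codebook = rng.normal(size=(16, 64))    # 16 codewords (learned in practice)
quantized, idx = hyperspherical_quantize(emb, codebook)
```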
Removing class overlap between the training set (Google Landmarks v2 clean) and the evaluation sets (Revisited Oxford and Paris) leads to a dramatic drop in performance across state-of-the-art image retrieval methods, highlighting the critical importance of avoiding such overlap. We introduce CiDeR, a single-stage, end-to-end pipeline that detects objects of interest and extracts a global image representation without requiring location supervision; it outperforms previous state-of-the-art methods on both the original dataset and its new, overlap-free revision.
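To make the "detect, then aggregate" idea concrete, here is a minimal attention-weighted pooling sketch in PyTorch, where a learned attention map plays the role of an object-of-interest detector trained without box annotations. It is an assumption-laden illustration only, not the CiDeR architecture.

```python
# Sketch of attention-weighted global pooling over a backbone feature map
# (illustrative only; not the CiDeR architecture).
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionPooling(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # A 1x1 conv produces a single-channel spatial attention map,
        # learned end-to-end without any location (box) supervision.
        self.attn = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (B, C, H, W) feature map from a CNN/ViT backbone
        weights = torch.softmax(self.attn(feats).flatten(2), dim=-1)  # (B, 1, H*W)
        pooled = (feats.flatten(2) * weights).sum(dim=-1)             # (B, C)
        return F.normalize(pooled, dim=-1)  # unit-norm global descriptor

feats = torch.randn(2, 512, 14, 14)
descriptor = AttentionPooling(512)(feats)  # (2, 512)
```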
We propose a novel caption-matching method for cross-domain image retrieval built on multimodal language-vision architectures.
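A hedged sketch of the caption-matching idea follows, assuming a CLIP-style text encoder from Hugging Face transformers is used to embed and rank captions; the paper's actual captioning and matching pipeline may differ, and the captions here are invented examples.

```python
# Illustrative caption ranking with a CLIP text encoder
# (a stand-in for whatever language-vision model the method uses).
import torch
from transformers import CLIPModel, CLIPTokenizer

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")

query_caption = "a gothic cathedral with two towers at dusk"
gallery_captions = [
    "a pencil sketch of a cathedral facade",
    "a beach at sunset with palm trees",
    "an oil painting of a church with twin spires",
]

with torch.no_grad():
    inputs = tokenizer([query_caption] + gallery_captions,
                       padding=True, return_tensors="pt")
    feats = model.get_text_features(**inputs)
    feats = feats / feats.norm(dim=-1, keepdim=True)

# Rank gallery captions by cosine similarity to the query caption.
scores = feats[0] @ feats[1:].T
ranking = scores.argsort(descending=True)
```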
We combine sketches and text to enable precise image retrieval.
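One naive way to combine the two modalities is late fusion of normalized sketch and text embeddings into a single query vector, sketched below with CLIP as a stand-in encoder and a blank placeholder sketch; this is an assumption for illustration, not the fusion proposed in the paper.

```python
# Naive late-fusion of a sketch embedding and a text embedding into one query
# (illustrative only; the paper's fusion mechanism is likely different).
import torch
import torch.nn.functional as F
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

sketch = Image.new("RGB", (224, 224), "white")   # placeholder for a user sketch
text = "a red vintage car parked near a lake"

with torch.no_grad():
    inputs = processor(text=[text], images=[sketch],
                       return_tensors="pt", padding=True)
    img_feat = model.get_image_features(pixel_values=inputs["pixel_values"])
    txt_feat = model.get_text_features(input_ids=inputs["input_ids"],
                                       attention_mask=inputs["attention_mask"])

# Average the normalized modality embeddings into a single query vector,
# then re-normalize it for cosine-similarity retrieval against a gallery.
query = F.normalize(F.normalize(img_feat, dim=-1) + F.normalize(txt_feat, dim=-1),
                    dim=-1)
```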