FineR is a novel approach that makes fine-grained visual recognition accessible to non-experts: it uses large language models (LLMs) to reason about category names, so no expert annotations are required. The method reportedly outperforms existing approaches on multiple benchmarks and remains effective on challenging tasks.
Identifying subordinate-level categories in images is important but difficult, because the differences among species are subtle. Existing methods rely heavily on expert annotations, which become a bottleneck in new domains. FineR proposes a training-free alternative that uses LLMs to reason about fine-grained categories.
The proposed system translates visual attributes into text so that an LLM can reason over them, and it achieves strong results across various datasets. A human study shows that FineR narrows the gap between laypersons and experts in fine-grained visual recognition (FGVR). The system also performs well on a novel Pokemon dataset, where it outperforms the other methods by a significant margin.
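The attribute-to-text step described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the prompt wording, the example attributes, and the `fake_llm` stand-in are all hypothetical, chosen only to show how visual attributes might be serialized into a prompt and how an LLM reply might be parsed into candidate category names.

```python
from typing import Callable, Dict, List


def attributes_to_prompt(attributes: Dict[str, str], super_category: str) -> str:
    """Serialize attribute/value pairs into a plain-text question for an LLM.

    The prompt wording is an assumption made for this sketch.
    """
    described = "; ".join(f"{name}: {value}" for name, value in attributes.items())
    return (
        f"A photo shows a {super_category} with these attributes: {described}. "
        f"List three likely {super_category} categories, separated by commas."
    )


def propose_names(
    attributes: Dict[str, str],
    super_category: str,
    llm: Callable[[str], str],
) -> List[str]:
    """Ask the LLM for candidate names and split its comma-separated reply."""
    reply = llm(attributes_to_prompt(attributes, super_category))
    return [name.strip() for name in reply.split(",") if name.strip()]


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM call; always returns a canned reply."""
    return "Northern Cardinal, Summer Tanager, Vermilion Flycatcher"


# Hypothetical attributes, as a vision model might describe them in text.
attrs = {
    "plumage color": "bright red",
    "beak shape": "short and conical",
    "crest": "present",
}
candidates = propose_names(attrs, "bird", fake_llm)
print(candidates)
```

In a real pipeline, `fake_llm` would be replaced by a call to an actual language model, and the candidate names would then be matched against the image by a separate vision-language component; both steps are outside the scope of this sketch.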
Key insights from the paper by Mingxuan Liu... at arxiv.org, 03-12-2024
https://arxiv.org/pdf/2401.13837.pdf