Key Concepts
Bongard-OpenWorld introduces a challenging benchmark for few-shot reasoning in machine vision, emphasizing real-world visual concepts.
Summary
Bongard-OpenWorld is a new benchmark for evaluating few-shot reasoning in machine vision. It focuses on real-world visual concepts and poses a significant challenge to current algorithms. It builds on the classical Bongard Problems but adds open-world, free-form concepts and real-world images. In each problem, the solver must identify the visual concept depicted exclusively by the positive images (that is, absent from the negative images) and then make a binary prediction for each query image: does it depict that concept or not? Various approaches, including Large Language Models (LLMs) and Vision-Language Models (VLMs), have been tested, but none has closed the gap between human and machine performance. The dataset's visual concepts are extracted from Conceptual Captions and supplemented with challenging concepts collected through crowdsourcing. Each problem consists of a positive set and a negative set, with distractors included to increase difficulty. Dataset statistics show a wide range of concept lengths and a long-tailed word distribution. Of the models evaluated, SNAIL shows promising results but still falls short of human performance.
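To make the task format concrete, here is a minimal Python sketch of one problem and a simple baseline. It assumes each image has already been encoded as a fixed-length embedding vector by some image encoder (not specified here), and it uses a nearest-prototype rule: the query is labeled positive if it lies closer to the mean positive embedding than to the mean negative embedding. The names `BongardProblem`, `predict`, and `accuracy` are hypothetical, and this rule is an illustrative stand-in, not one of the methods evaluated on the benchmark.

```python
import math
from dataclasses import dataclass

# Hypothetical representation: each image is a precomputed embedding vector.
Vector = list[float]


@dataclass
class BongardProblem:
    """Hypothetical container for one problem (embeddings, not raw pixels)."""
    positives: list[Vector]  # images that all depict the hidden concept
    negatives: list[Vector]  # images (including distractors) that do not
    query: Vector            # image whose membership must be predicted


def mean(vectors: list[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def predict(problem: BongardProblem) -> bool:
    """Binary prediction: True if the query is closer to the positive prototype."""
    pos_prototype = mean(problem.positives)
    neg_prototype = mean(problem.negatives)
    # math.dist computes the Euclidean distance between two points.
    return math.dist(problem.query, pos_prototype) < math.dist(problem.query, neg_prototype)


def accuracy(problems: list[BongardProblem], labels: list[bool]) -> float:
    """Fraction of problems whose binary prediction matches the ground-truth label."""
    correct = sum(predict(p) == y for p, y in zip(problems, labels))
    return correct / len(problems)


# Toy 2-D embeddings standing in for real image features.
toy = BongardProblem(
    positives=[[1.0, 0.9], [0.9, 1.1], [1.1, 1.0]],
    negatives=[[-1.0, -0.8], [-0.9, -1.2], [-1.1, -1.0]],
    query=[0.8, 1.0],
)
print(predict(toy))  # True
```

A prototype rule like this can only succeed when the embedding space already separates the concept; distractors that share most of their features with the positive images are exactly the cases where such a rule is likely to fail.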
Statistics
The best learner evaluated on Bongard-OpenWorld achieves 64% accuracy.
Human participants easily reach 91% accuracy.
Quotes
"We hope Bongard-OpenWorld can help us better understand the limitations of current visual intelligence."
"Bongard-OpenWorld imposes a significant challenge to current few-shot reasoning algorithms."