Leveraging Synthetic Captions and Hyperbolic Learning for Robust Open-World Object Detection
The core message of this paper is that synthetic captions generated by pre-trained vision-language models, combined with a novel hyperbolic vision-language learning approach, improve the performance of open-world object detection, the task of detecting and localizing any object specified by either a class label or free-form text.