InstructDET introduces a data-centric method for referring object detection (ROD) that localizes target objects based on user instructions. The method leverages foundation models to produce human-like instructions that encompass common user intentions related to object detection. The dataset, InDET, is developed from existing REC datasets and object detection datasets, allowing for the incorporation of images with object bounding boxes. By using InDET, a conventional ROD model surpasses existing methods on standard REC datasets and the InDET test set. InstructDET directs a promising field where ROD can be diversified to execute common object detection instructions effectively.
Sang ngôn ngữ khác
từ nội dung nguồn
arxiv.org
Thông tin chi tiết chính được chắt lọc từ
by Ronghao Dang... lúc arxiv.org 03-12-2024
https://arxiv.org/pdf/2310.05136.pdfYêu cầu sâu hơn