InstructDET is a data-centric method for referring object detection (ROD), which localizes target objects based on user instructions. The method leverages foundation models to produce human-like instructions that cover common user intentions related to object detection. Its dataset, InDET, is built from existing referring expression comprehension (REC) datasets and object detection datasets, so images with object bounding boxes can be incorporated. When trained on InDET, a conventional ROD model surpasses existing methods on standard REC datasets and on the InDET test set. InstructDET points toward a promising direction in which ROD is diversified to execute common object detection instructions effectively.
Key insights extracted from
by Ronghao Dang... at arxiv.org 03-12-2024
https://arxiv.org/pdf/2310.05136.pdf
Deeper Questions