InstructDET is a data-centric method for referring object detection (ROD), the task of localizing target objects based on user instructions. The method leverages foundation models to generate human-like instructions that cover common user intentions related to object detection. Its dataset, InDET, is built from existing referring expression comprehension (REC) datasets and object detection datasets, and any image with object bounding boxes can be incorporated into it. Trained on InDET, a conventional ROD model surpasses existing methods on standard REC datasets and on the InDET test set. InstructDET points to a promising direction in which ROD is diversified to carry out common object detection instructions effectively.
Key insights extracted from
by Ronghao Dang... at arxiv.org 03-12-2024
https://arxiv.org/pdf/2310.05136.pdf