InstructDET is a data-centric method for referring object detection (ROD), which localizes target objects according to user instructions. The method uses foundation models to generate human-like instructions that cover common user intentions in object detection. Its dataset, InDET, is built from existing referring expression comprehension (REC) datasets and object detection datasets, which allows images with object bounding boxes to be incorporated. Using InDET, a conventional ROD model surpasses existing methods on standard REC datasets and on the InDET test set. InstructDET thus points to a promising direction in which ROD is diversified to carry out common object detection instructions.
Key Insights Distilled From
by Ronghao Dang... at arxiv.org 03-12-2024
https://arxiv.org/pdf/2310.05136.pdf
Deeper Questions