Key Concepts
DE-ViT establishes new state-of-the-art results on few-shot object detection benchmarks.
Summary
The paper introduces DE-ViT, a few-shot object detector that eliminates the need for finetuning. It proposes a region-propagation-based localization architecture, a spatial integral layer for mask-to-box transformation, and a feature subspace projection to reduce overfitting on base classes. DE-ViT outperforms existing methods on COCO, Pascal VOC, and LVIS datasets, achieving significant improvements in accuracy.
Introduction
Few-shot object detection aims to detect novel object classes from only a handful of labeled examples.
Recent methods rely on finetuning for each set of novel classes, which limits their practicality.
Method
DE-ViT introduces a novel region-propagation mechanism.
A spatial integral layer transforms masks into bounding boxes.
Feature subspace projection reduces overfitting.
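The mask-to-box step mentioned above can be illustrated with a small sketch. This summary does not describe the internals of DE-ViT's spatial integral layer, so the code below is not the paper's layer; it is only one hypothetical way to turn a soft mask into a box using nothing but sums (discrete integrals) over the spatial axes. The function name `soft_mask_to_box` and the moment-matching rule are assumptions made for illustration.

```python
import numpy as np


def soft_mask_to_box(mask):
    """Estimate (x_min, y_min, x_max, y_max) from a non-negative H x W mask.

    Illustrative assumption, not DE-ViT's actual layer: the box is recovered
    from the mean and variance of the mask's marginal distributions, which
    are computed with plain sums (no thresholding, no argmax).
    """
    mask = np.asarray(mask, dtype=np.float64)
    total = mask.sum()
    if total <= 0:
        raise ValueError("mask has no mass")

    h, w = mask.shape
    px = mask.sum(axis=0) / total  # marginal over columns (x axis)
    py = mask.sum(axis=1) / total  # marginal over rows (y axis)

    def extent(p, coords):
        mean = (p * coords).sum()
        var = (p * (coords - mean) ** 2).sum()
        # A uniform distribution over n consecutive pixels has
        # variance (n^2 - 1) / 12, so invert that to estimate n.
        n = np.sqrt(12.0 * var + 1.0)
        half = (n - 1.0) / 2.0
        return mean - half, mean + half

    x_min, x_max = extent(px, np.arange(w))
    y_min, y_max = extent(py, np.arange(h))
    return x_min, y_min, x_max, y_max


# Usage: a rectangular mask covering rows 2..4 and columns 3..8.
mask = np.zeros((8, 10))
mask[2:5, 3:9] = 1.0
box = soft_mask_to_box(mask)
```

For a uniform rectangular mask this recovers the rectangle's edge pixels exactly; for irregular or noisy masks it gives only an approximate extent. Because every step is a sum or a square root, the mapping stays differentiable with respect to the mask values, which is the general appeal of integral-style formulations over hard thresholding.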
Experiments
DE-ViT surpasses existing methods on COCO, Pascal VOC, and LVIS.
Ablation studies show the effectiveness of proposed techniques.
Discussion and Conclusion
DE-ViT's techniques can be extended beyond few-shot object detection.
Feature subspace projection introduces inference overhead.
The work aims to benefit downstream tasks and inspire further research.
Statistics
DE-ViT exceeds the previous SoTA on COCO by 15 mAP in the 10-shot setting and 7.2 mAP in the 30-shot setting, and surpasses the previous SoTA on LVIS by 20 box APr.
Quotes
"Our method DE-ViT establishes a new state-of-the-art on the few-shot object detection benchmarks."