Key Concepts
SimPLR is a transformer-based detector that simplifies object detection with scale-aware attention, performing on par with or better than multi-scale alternatives.
Abstract
The article introduces SimPLR, a transformer-based object detector that eliminates the need for handcrafted multi-scale feature maps. It demonstrates competitive performance on object detection, instance segmentation, and panoptic segmentation on the COCO dataset. Scale-aware attention allows SimPLR to efficiently capture objects of various sizes without relying on multi-scale features. The study highlights the potential of transformer-based architectures for simplifying neural network designs for dense vision tasks.
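The scale-aware idea is easiest to see in code. Below is a minimal, hypothetical PyTorch sketch, not the paper's implementation: each attention head reads the same single-scale feature map average-pooled to a different stride, so individual heads specialize to different object sizes without handcrafted multi-scale maps. The class name ScaleAwareAttention, the fixed scale set, and the pooling scheme are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ScaleAwareAttention(nn.Module):
    """Toy scale-aware attention: head h attends over the single
    backbone feature map pooled by stride scales[h], giving each
    head a different effective receptive-field scale."""

    def __init__(self, dim: int, scales=(1, 2, 4, 8)):
        super().__init__()
        self.scales = scales
        self.num_heads = len(scales)
        assert dim % self.num_heads == 0
        self.head_dim = dim // self.num_heads
        self.q = nn.Linear(dim, dim)
        self.kv = nn.Linear(dim, 2 * dim)
        self.proj = nn.Linear(dim, dim)

    def forward(self, queries: torch.Tensor, feat: torch.Tensor) -> torch.Tensor:
        # queries: (B, N, C) object queries; feat: (B, C, H, W) single-scale map
        B, N, _ = queries.shape
        q = self.q(queries).view(B, N, self.num_heads, self.head_dim)
        outs = []
        for h, s in enumerate(self.scales):
            # Pool the one feature map to stride s for this head only.
            f = F.avg_pool2d(feat, kernel_size=s) if s > 1 else feat
            tokens = f.flatten(2).transpose(1, 2)               # (B, L_s, C)
            k, v = self.kv(tokens).chunk(2, dim=-1)
            kh = k.view(B, -1, self.num_heads, self.head_dim)[:, :, h]
            vh = v.view(B, -1, self.num_heads, self.head_dim)[:, :, h]
            attn = (q[:, :, h] @ kh.transpose(1, 2)) / self.head_dim ** 0.5
            outs.append(attn.softmax(dim=-1) @ vh)              # (B, N, head_dim)
        return self.proj(torch.cat(outs, dim=-1))               # (B, N, C)

# With dim=256 and a 64x64 feature map, head 0 attends over 4096
# tokens (fine detail) while head 3 attends over only 64 (coarse).
attn = ScaleAwareAttention(dim=256)
out = attn(torch.randn(2, 100, 256), torch.randn(2, 256, 64, 64))
print(out.shape)  # torch.Size([2, 100, 256])
```

Note that every head still consumes a single backbone feature map; the scale diversity comes from the attention mechanism itself, which is the point the paper makes against FPN-style multi-scale pipelines.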
Directory:
Abstract
Introduction
Object Detection Architectures (Plain vs Hierarchical)
Scale-Aware Attention Mechanism
Network Architecture and Training Details
Experiments and Results (Object Detection, Instance Segmentation, Panoptic Segmentation)
Additional Results (Panoptic Segmentation Comparison)
Qualitative Results (Object Detection, Instance Segmentation, Panoptic Segmentation)
Statistics
SimPLR shows competitive performance with 55.7 APᵇ (box AP).
A plain ViT backbone pre-trained with MAE is used.
Larger models use a ViT-H backbone.
Quotes
"SimPLR simplifies object detection by eliminating the need for handcrafted multi-scale feature maps."
"Scale-aware attention enables SimPLR to efficiently capture objects at various sizes."