Core Concepts
PEM (Prototype-based Efficient MaskFormer) introduces a transformer-based architecture for image segmentation that delivers strong accuracy at a fraction of the computational cost of comparable models.
Abstract
PEM proposes a novel prototype-based cross-attention mechanism that improves efficiency across multiple segmentation tasks. It pairs this with an efficient multi-scale feature pyramid network that combines deformable convolutions and context-based self-modulation. On the Cityscapes and ADE20K benchmarks, PEM matches or outperforms task-specific models while running faster than competing architectures.
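The core efficiency idea behind prototype-based cross-attention is that each object query attends to a small set of selected feature tokens ("prototypes") rather than to every pixel in the feature map. The sketch below is a hypothetical, simplified illustration of that principle in NumPy, not PEM's actual implementation; the function name, the top-k selection rule, and the parameter k are assumptions for demonstration.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def prototype_cross_attention(queries, features, k=4):
    """Illustrative prototype-style cross-attention (simplified sketch).

    queries:  (Q, D) object queries
    features: (N, D) flattened pixel features
    Instead of attending over all N pixel tokens, each query attends
    only to its k most similar tokens, which caps the attention cost.
    """
    d = queries.shape[1]
    scores = queries @ features.T                   # (Q, N) similarity
    top_idx = np.argsort(-scores, axis=1)[:, :k]    # (Q, k) prototype indices
    out = np.empty_like(queries)
    for q in range(queries.shape[0]):
        protos = features[top_idx[q]]               # (k, D) selected prototypes
        attn = softmax(queries[q] @ protos.T / np.sqrt(d))
        out[q] = attn @ protos                      # weighted prototype mix
    return out
```

With N pixel tokens and Q queries, attention over only k selected prototypes per query replaces the dense Q-by-N interaction, which is where the speedup of this family of methods comes from.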
Stats
PEM reports strong results on the Cityscapes and ADE20K benchmarks.
On Cityscapes, the model reaches 61.1 PQ (panoptic quality) at 13.8 FPS.
On ADE20K, it attains 38.5 PQ at 35.7 FPS.
Quotes
"PEM demonstrates outstanding performance, showcasing similar or superior results compared to the computationally expensive baselines."
"PEM represents a significant step forward in the ongoing pursuit of efficient image segmentation methodologies."
"The architecture outperforms task-specific models on challenging benchmarks."