Core Concepts
PSALM extends the capabilities of Large Multi-Modal Models to address image segmentation tasks, demonstrating superior performance and task generalization.
Abstract
Introduction:
PSALM enhances Large Multi-Modal Models (LMM) for image segmentation.
Overcomes the limitation that LMMs produce only text output, which is insufficient for pixel-level understanding.
Methodology:
PSALM incorporates a mask decoder and a flexible input schema for various segmentation tasks.
Input schema includes images, task instructions, conditional prompts, and mask tokens.
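The four parts of the input schema listed above can be pictured as a single record that is flattened into one sequence. The sketch below is illustrative only: the class name `SegmentationInput`, the function `build_sequence`, the field names, and the placeholder token strings are all assumptions made for this example, not PSALM's actual implementation or tokenization.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SegmentationInput:
    """Hypothetical container for the four input parts named in the summary."""
    image: str                      # path or identifier of the input image
    instruction: str                # task instruction text
    conditions: List[str] = field(default_factory=list)  # conditional prompts
    num_mask_tokens: int = 4        # number of mask tokens (assumed value)


def build_sequence(sample: SegmentationInput) -> List[str]:
    """Flatten one sample into an ordered list of placeholder tokens.

    The order (image, instruction, conditions, mask tokens) follows the
    order in which the summary lists the parts; the token strings are
    invented for illustration.
    """
    seq = [f"<image:{sample.image}>", sample.instruction]
    seq.extend(f"<cond:{c}>" for c in sample.conditions)
    seq.extend(f"<mask_{i}>" for i in range(sample.num_mask_tokens))
    return seq


sample = SegmentationInput(
    image="example.jpg",
    instruction="Segment all objects by category.",
    conditions=["person", "dog"],
    num_mask_tokens=3,
)
print(build_sequence(sample))
```

Under these assumptions, switching tasks would only change the instruction and the conditional prompts (for example, category names versus a referring sentence), while the image and mask-token parts keep the same shape.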
Results:
Achieves superior results on benchmarks like COCO Panoptic Segmentation and RefCOCO.
Demonstrates zero-shot capabilities on unseen tasks like open-vocabulary segmentation and video object segmentation.
Experiments:
Joint training across multiple datasets improves performance significantly.
Outperforms other LLM-based methods on referring segmentation tasks.
Stats
PSALM achieved superior results on benchmarks such as COCO Panoptic Segmentation.