Core Concepts
A generalized framework for object counting that combines SAM and CLIP to achieve state-of-the-art performance in few-shot and zero-shot object counting and detection.
Abstract
The paper introduces PseCo, a framework for object counting that combines SAM and CLIP. It addresses two challenges, efficiency overhead and small, crowded objects, with a three-step approach: point, segment, and count. The framework achieves state-of-the-art performance in few-shot and zero-shot object counting and detection. The paper consists of an abstract, introduction, related work, proposed approach, experiments, results, and conclusion.
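The point, segment, and count steps can be pictured as a small pipeline. The sketch below is only an illustration under stated assumptions, not the authors' implementation: the function point_segment_count, its three callables, and the threshold parameter are hypothetical names, and the toy stand-ins replace the real models. In the actual framework, the segment step would presumably be handled by SAM and the scoring step by a CLIP-based classifier; that mapping is inferred from this summary.

```python
from typing import Any, Callable, List, Tuple

Point = Tuple[int, int]


def point_segment_count(
    image: Any,
    propose_points: Callable[[Any], List[Point]],
    segment_at: Callable[[Any, Point], Any],
    score_proposal: Callable[[Any, Any], float],
    threshold: float = 0.5,
) -> Tuple[int, List[Any]]:
    """Hypothetical three-step pipeline: point, segment, count."""
    # Step 1 (point): pick candidate locations that may contain an object.
    points = propose_points(image)
    # Step 2 (segment): turn each point prompt into a mask proposal.
    proposals = [segment_at(image, p) for p in points]
    # Step 3 (count): keep proposals whose score passes the threshold.
    kept = [m for m in proposals if score_proposal(image, m) >= threshold]
    return len(kept), kept


# Toy stand-ins (assumptions for illustration only, not real models).
def toy_points(image: List[List[int]]) -> List[Point]:
    # Every non-zero cell is treated as a candidate point.
    return [(r, c) for r, row in enumerate(image) for c, v in enumerate(row) if v > 0]


def toy_segment(image: List[List[int]], point: Point) -> Point:
    # The "mask" is just the point itself.
    return point


def toy_score(image: List[List[int]], mask: Point) -> float:
    # The cell value, scaled to [0, 1], plays the role of a classifier score.
    r, c = mask
    return image[r][c] / 10.0


toy_image = [
    [0, 9, 0],
    [3, 0, 8],
    [0, 0, 7],
]
count, kept = point_segment_count(toy_image, toy_points, toy_segment, toy_score)
print(count, kept)
```

Passing the three steps as callables keeps the sketch independent of any particular model library; real point proposers, segmenters, and classifiers could be substituted without changing the counting logic.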
Stats
PseCo achieves state-of-the-art performance in few-shot and zero-shot object counting and detection on FSC-147, COCO, and LVIS.
PseCo is trained for 50k iterations with a mini-batch size of 32.
ViT-H is used as the default SAM in the framework.
PseCo selects an average of 378 and 388 candidate points per image on the FSC-147 test and val sets, respectively.
Quotes
"Our framework combines the superior advantages of two foundation models without compromising their zero-shot capability."
"Extensive experimental results on FSC-147, COCO, and LVIS demonstrate that PseCo achieves state-of-the-art performance in both few-shot/zero-shot object counting/detection."