Open-World Video Instance Segmentation and Captioning: Detecting, Tracking, and Describing Previously Unseen Objects in Videos
OW-VISCap simultaneously detects, segments, tracks, and generates rich object-centric captions for both previously seen and unseen objects in videos, without requiring additional user inputs or prompts.