InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding
InternVideo2 is a video foundation model, trained with a progressive training paradigm, that excels in action recognition, video-text tasks, and video-centric dialogue.