Efficient Long Video Understanding via Large Language Models
LongVLM, a straightforward yet powerful VideoLLM, decomposes long videos into multiple short-term segments, encodes local features for each segment, and integrates global semantics to enable comprehensive understanding of long-term video content.
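The three-stage pipeline described above can be illustrated with a small, dependency-free sketch. It is not the LongVLM implementation. All names are hypothetical: `split_into_segments`, `encode_segment`, `global_token`, and `build_video_tokens`. Several simplifications are assumptions:

* Local encoding is plain average pooling over contiguous token groups, standing in for a learned encoder.
* The "global semantics" are a single mean-pooled video-level vector.
* That global vector is prepended to the local tokens. The actual model may fuse global and local information differently.

```python
from typing import List

Vector = List[float]
Frame = List[Vector]  # one frame = a list of patch-token embeddings


def mean_vectors(vectors: List[Vector]) -> Vector:
    """Element-wise mean of a non-empty list of equally sized vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def split_into_segments(frames: List[Frame], seg_len: int) -> List[List[Frame]]:
    """Decompose the video into consecutive short-term segments of seg_len frames."""
    return [frames[i:i + seg_len] for i in range(0, len(frames), seg_len)]


def encode_segment(segment: List[Frame], tokens_per_segment: int) -> List[Vector]:
    """Compress one segment into at most `tokens_per_segment` local tokens.

    Hypothetical stand-in for a learned local encoder: flatten the segment's
    patch tokens in temporal order, then average contiguous groups of them.
    """
    tokens = [tok for frame in segment for tok in frame]
    group = -(-len(tokens) // tokens_per_segment)  # ceiling division
    return [mean_vectors(tokens[i:i + group]) for i in range(0, len(tokens), group)]


def global_token(frames: List[Frame]) -> Vector:
    """Video-level summary (an assumption): mean over frames of each frame's mean token."""
    return mean_vectors([mean_vectors(frame) for frame in frames])


def build_video_tokens(frames: List[Frame], seg_len: int, tokens_per_segment: int) -> List[Vector]:
    """Concatenate local tokens in temporal order and prepend the global token."""
    local: List[Vector] = []
    for segment in split_into_segments(frames, seg_len):
        local.extend(encode_segment(segment, tokens_per_segment))
    return [global_token(frames)] + local


if __name__ == "__main__":
    # 8 toy frames, 4 patch tokens each, 2-D embeddings: token = [frame index, patch index]
    frames = [[[float(t), float(p)] for p in range(4)] for t in range(8)]
    print(build_video_tokens(frames, seg_len=4, tokens_per_segment=2))
    # → [[3.5, 1.5], [0.5, 1.5], [2.5, 1.5], [4.5, 1.5], [6.5, 1.5]]
```

The per-segment token budget bounds how many visual tokens each short-term segment contributes. The resulting sequence length therefore grows with the number of segments rather than with the raw number of patch tokens. Keeping segments in temporal order preserves their sequence, and the global token gives every position access to a whole-video summary. In a real VideoLLM these tokens would come from a visual encoder and be projected into the language model's embedding space. This toy example omits that step.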