VideoAgent uses a large language model as an agent to iteratively identify and compile crucial information from long-form videos, emphasizing interactive reasoning over direct visual processing and demonstrating gains in both effectiveness and efficiency for long-form video understanding.
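The iterative "identify and compile" loop described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `iterative_video_qa`, the `AgentState` container, and the four callables (`describe_frame`, `is_sufficient`, `propose_frames`, `answer`) are hypothetical stand-ins for whatever vision and language model calls a real system would make. The sketch only shows the control flow of sampling a few frames, checking whether the gathered information is enough, and requesting more frames when it is not.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class AgentState:
    """Information compiled so far: frame index -> text description (hypothetical)."""
    notes: Dict[int, str] = field(default_factory=dict)


def iterative_video_qa(
    num_frames: int,
    describe_frame: Callable[[int], str],              # stand-in for a vision model call
    is_sufficient: Callable[[AgentState], bool],       # stand-in for an LLM self-check
    propose_frames: Callable[[AgentState], List[int]], # stand-in for LLM-guided retrieval
    answer: Callable[[AgentState], str],               # stand-in for the final LLM answer
    initial_samples: int = 3,
    max_rounds: int = 5,
) -> str:
    state = AgentState()

    # Start from a sparse, uniform sample instead of processing every frame.
    step = max(1, num_frames // initial_samples)
    pending = list(range(0, num_frames, step))[:initial_samples]

    for _ in range(max_rounds):
        # Compile: describe each newly requested frame once.
        for idx in pending:
            if idx not in state.notes:
                state.notes[idx] = describe_frame(idx)

        # Stop as soon as the agent judges the compiled notes sufficient.
        if is_sufficient(state):
            break

        # Identify: ask which unseen, in-range frames to inspect next.
        pending = [
            i for i in propose_frames(state)
            if 0 <= i < num_frames and i not in state.notes
        ]
        if not pending:
            break

    return answer(state)
```

Under these assumptions, the number of frames inspected grows only as the agent asks for more, which is one way to read the efficiency claim: frames that are never requested are never processed.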