Leveraging Large Language Models for Answering Queries in Long-form Egocentric Videos
LifelongMemory is a new framework that leverages pre-trained multimodal large language models (MLLMs) to reason over long-form egocentric video inputs and answer natural language queries about them.