Core Concepts
Using human movement and action descriptions to bridge the gap between egocentric and exocentric views in video generation.
Abstract
The paper introduces an Intention-Driven Ego-to-Exo (IDE) video generation framework that leverages human movement and action descriptions to guide generation. It addresses the challenge of maintaining consistency between egocentric and exocentric views, proposing a novel approach for generating exocentric videos from egocentric ones. The framework comprises modules for feature perception, trajectory transformation, and action-description mapping. Extensive experiments demonstrate that the method generates high-quality exocentric videos.
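The three-module pipeline described above can be sketched at a high level. Everything below is a hypothetical illustration: the function names, tensor shapes, and fusion step are assumptions for exposition, not the paper's actual API, and the real system would condition a diffusion model on the fused signals rather than return them directly.

```python
import numpy as np

def perceive_features(ego_frames):
    """Stand-in feature perception: mean-pool each egocentric frame
    to a single scalar feature (a real model would use a deep encoder)."""
    return ego_frames.reshape(ego_frames.shape[0], -1).mean(axis=1, keepdims=True)

def transform_trajectory(head_motion):
    """Stand-in trajectory transformation: accumulate per-frame ego
    head-motion deltas (T, 2) into positions usable from an exo viewpoint."""
    return np.cumsum(head_motion, axis=0)

def map_action_description(text, dim=8):
    """Stand-in action-description mapping: hash words into a fixed-size
    vector (a real system would use a learned text encoder)."""
    vec = np.zeros(dim)
    words = text.split()
    for word in words:
        vec[hash(word) % dim] += 1.0
    return vec / max(len(words), 1)

def build_conditioning(ego_frames, head_motion, action_text):
    """Fuse the three signals into one per-frame conditioning tensor;
    a diffusion generator would denoise exocentric frames given this."""
    feats = perceive_features(ego_frames)                    # (T, 1)
    traj = transform_trajectory(head_motion)                 # (T, 2)
    action = map_action_description(action_text)             # (8,)
    action_rep = np.tile(action, (ego_frames.shape[0], 1))   # (T, 8)
    return np.concatenate([feats, traj, action_rep], axis=1)  # (T, 11)

T = 4  # number of frames in this toy example
cond = build_conditioning(np.random.rand(T, 16, 16),
                          np.random.rand(T, 2),
                          "pick up the cup")
print(cond.shape)  # (4, 11)
```

The sketch only shows how the three conditioning streams are computed independently and concatenated per frame, which mirrors the modular structure the summary describes.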
Stats
Diffusion-model techniques have driven notable progress in video generation.
The proposed IDE framework outperforms state-of-the-art models in both subjective and objective assessments.