Core Concept
The paper proposes a new paradigm, Listening and Imagining, for generating diverse and coherent talking faces from audio alone.
Abstract
Introduces the concept of Listening and Imagining for face generation from audio.
Addresses two critical challenges: decoupling identity, content, and emotion from audio, and maintaining both diversity and consistency in the generated video.
Progressive Audio Disentanglement simplifies the decoupling process.
Controllable Coherent Frame generation ensures diverse and coherent face animation.
Extensive experiments demonstrate the effectiveness of the proposed method.
Statistics
"We propose a new paradigm, Listening and Imagining, for generating diverse and coherent talking faces based on a single audio."
"Extensive experiments demonstrate the flexibility and effectiveness of our method in handling this paradigm."