The authors present Listening and Imagining, a novel paradigm for generating diverse and coherent talking faces from a single audio input. By disentangling identity, content, and emotion in the audio and introducing a Controllable Coherent Frame generation method, they achieve high-quality face animation.
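As a rough illustration of the disentanglement idea, the sketch below splits an audio encoding into identity, content, and emotion factors. The module names, GRU backbone, and dimensions are assumptions made for illustration, not the authors' architecture.

```python
# A minimal sketch of disentangling audio into identity, content, and
# emotion embeddings. All names and sizes are illustrative assumptions.
import torch
import torch.nn as nn

class DisentangledAudioEncoder(nn.Module):
    def __init__(self, n_mels=80, dim=256):
        super().__init__()
        # Shared front-end over mel-spectrogram frames.
        self.backbone = nn.GRU(n_mels, dim, batch_first=True)
        # Separate heads so each factor lives in its own subspace.
        self.identity_head = nn.Linear(dim, dim)  # who is speaking
        self.content_head = nn.Linear(dim, dim)   # what is said (per frame)
        self.emotion_head = nn.Linear(dim, dim)   # how it is said

    def forward(self, mel):                 # mel: (B, T, n_mels)
        feats, _ = self.backbone(mel)       # (B, T, dim)
        pooled = feats.mean(dim=1)          # utterance-level summary
        identity = self.identity_head(pooled)  # (B, dim), time-invariant
        emotion = self.emotion_head(pooled)    # (B, dim), time-invariant
        content = self.content_head(feats)     # (B, T, dim), time-varying
        return identity, content, emotion

enc = DisentangledAudioEncoder()
identity, content, emotion = enc(torch.randn(2, 100, 80))
print(identity.shape, content.shape, emotion.shape)
```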
The authors present Style2Talker, a method for generating high-resolution talking head videos with both an emotion style and an art style: the emotion style is controlled by a text description and the art style by a reference picture, yielding realistic and expressive results.
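The text-plus-picture conditioning could look roughly like the following, where a token embedding stands in for the emotion-text encoder and a small CNN for the art-style picture encoder; all module names and sizes are illustrative assumptions, not the paper's architecture.

```python
# A minimal sketch of dual conditioning: a text branch for emotion style
# and an image branch for art style, fused into one style code.
import torch
import torch.nn as nn

class DualStyleConditioner(nn.Module):
    def __init__(self, vocab_size=10000, dim=256):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, dim)  # text-controlled emotion style
        self.img_enc = nn.Sequential(                 # picture-controlled art style
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, dim),
        )
        self.fuse = nn.Linear(2 * dim, dim)

    def forward(self, text_tokens, style_image):
        emo = self.tok_emb(text_tokens).mean(dim=1)   # (B, dim) emotion code
        art = self.img_enc(style_image)               # (B, dim) art-style code
        return self.fuse(torch.cat([emo, art], dim=-1))  # joint style code

cond = DualStyleConditioner()
code = cond(torch.randint(0, 10000, (2, 12)), torch.randn(2, 3, 128, 128))
print(code.shape)  # torch.Size([2, 256])
```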
G4G is a generic framework for high-fidelity talking face generation built on fine-grained intra-modal alignment, producing lip movements that remain synchronized with the given audio regardless of its tone or volume.
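One common way to realize fine-grained audio-lip alignment is a per-frame contrastive objective; the InfoNCE-style loss below is an illustrative stand-in, not necessarily G4G's exact formulation.

```python
# A minimal sketch of fine-grained audio-visual alignment: per-frame
# audio and lip features are pulled together when they co-occur and
# pushed apart otherwise. An illustrative stand-in, not the paper's loss.
import torch
import torch.nn.functional as F

def frame_alignment_loss(audio_feats, visual_feats, temperature=0.07):
    # audio_feats, visual_feats: (T, dim) features for the same clip.
    a = F.normalize(audio_feats, dim=-1)
    v = F.normalize(visual_feats, dim=-1)
    logits = a @ v.t() / temperature       # (T, T) similarity matrix
    targets = torch.arange(a.size(0))      # frame t matches frame t
    # Symmetric cross-entropy: align audio->video and video->audio.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

loss = frame_alignment_loss(torch.randn(25, 256), torch.randn(25, 256))
print(loss.item())
```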
SuperFace introduces a teacher-student framework for high-quality, robust, low-cost, and editable talking face generation.
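A minimal sketch of the teacher-student setup follows: a heavy generator supervises a lightweight student so inference stays low-cost. Both networks and the L1 distillation loss are placeholder assumptions, not SuperFace's actual models.

```python
# A minimal sketch of teacher-student distillation for cheap inference.
import torch
import torch.nn as nn
import torch.nn.functional as F

teacher = nn.Sequential(  # stands in for a heavy, high-quality generator
    nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 3, 3, padding=1),
).eval()

student = nn.Sequential(  # much smaller network for low-cost inference
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 3, 3, padding=1),
)

opt = torch.optim.Adam(student.parameters(), lr=1e-4)
frames = torch.randn(4, 3, 64, 64)         # driving frames (dummy data)
with torch.no_grad():
    target = teacher(frames)               # teacher output as soft target
loss = F.l1_loss(student(frames), target)  # distillation loss
loss.backward()
opt.step()
print(f"distillation L1 loss: {loss.item():.4f}")
```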
MimicTalk presents a novel approach to personalized talking face generation that adapts an efficient, generalizable pre-trained person-agnostic 3D model, achieving high-quality, expressive results with significantly faster adaptation than traditional person-dependent methods.
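Efficient personalization of a frozen person-agnostic model is often done with low-rank adapters; the LoRA-style sketch below illustrates that general idea under my own assumptions and is not MimicTalk's released code or exact adaptation scheme.

```python
# A minimal sketch of low-rank adaptation: freeze the person-agnostic base
# weights and train small residuals on the target person's footage.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank=8, scale=1.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False        # person-agnostic weights stay frozen
        self.down = nn.Linear(base.in_features, rank, bias=False)
        self.up = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.up.weight)     # adapter starts as a no-op
        self.scale = scale

    def forward(self, x):
        return self.base(x) + self.scale * self.up(self.down(x))

pretrained = nn.Linear(256, 256)           # stands in for one generator layer
adapted = LoRALinear(pretrained, rank=8)
trainable = sum(p.numel() for p in adapted.parameters() if p.requires_grad)
print(f"trainable params: {trainable} (vs {256 * 256 + 256} in the base layer)")
```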
MuseTalk is a novel real-time framework that generates high-quality, lip-synced talking face videos by leveraging latent space inpainting, multi-scale audio-visual feature fusion, and innovative information modulation strategies.
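Latent-space inpainting can be sketched as masking the mouth region of a face latent and predicting its completion conditioned on audio features; the stand-in network and shapes below are assumptions, not the MuseTalk architecture.

```python
# A minimal sketch of audio-conditioned latent inpainting: zero out the
# lower-face region of a latent, broadcast the audio code over space,
# and predict the completed latent. Shapes are illustrative assumptions.
import torch
import torch.nn as nn

class LatentInpainter(nn.Module):
    def __init__(self, latent_ch=4, audio_dim=256):
        super().__init__()
        self.audio_proj = nn.Linear(audio_dim, latent_ch)
        self.net = nn.Sequential(          # stand-in for a conditioned U-Net
            nn.Conv2d(2 * latent_ch, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, latent_ch, 3, padding=1),
        )

    def forward(self, face_latent, audio_feat):
        b, c, h, w = face_latent.shape
        masked = face_latent.clone()
        masked[:, :, h // 2:, :] = 0.0     # zero out the lower (mouth) half
        audio = self.audio_proj(audio_feat)                  # (B, C)
        audio = audio[:, :, None, None].expand(b, c, h, w)   # broadcast spatially
        return self.net(torch.cat([masked, audio], dim=1))   # completed latent

model = LatentInpainter()
out = model(torch.randn(2, 4, 32, 32), torch.randn(2, 256))
print(out.shape)  # torch.Size([2, 4, 32, 32])
```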