Core Concepts
MuseTalk is a real-time framework that generates high-quality, lip-synced talking face videos by combining latent space inpainting, multi-scale audio-visual feature fusion, and information modulation strategies.
Zhang, Y., Liu, M., Chen, Z., Wu, B., Zeng, Y., Zhan, C., He, Y., Huang, J., & Zhou, W. (2024). MuseTalk: Real-Time High Quality Lip Synchronization with Latent Space Inpainting. arXiv preprint arXiv:2410.10122v1.
The paper presents MuseTalk as a framework for real-time, high-quality talking face generation. It targets three challenges in few-shot face visual dubbing: lip-speech synchronization, high resolution, and identity consistency.