LipVoicer: Generating High-Quality and Intelligible Speech from Silent Videos Using Lip Reading and Diffusion Models
LipVoicer is a novel method that generates high-quality and intelligible speech from silent videos by leveraging a lip-reading model to guide a diffusion-based speech generation model.
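
To make the idea of "guiding" a diffusion model concrete, here is a minimal toy sketch of a classifier-guidance-style update, one common way to steer a diffusion sampler with an external model. It is not LipVoicer's implementation: the summary above states only that a lip-reading model guides the speech generator, so everything in the code is an illustrative assumption. That includes the function names `toy_text_log_prob_grad` and `guided_noise`, the Gaussian stand-in for a text-conditioned classifier, the 8-dimensional vector standing in for a mel-spectrogram frame, and the values of `sigma_t` and `scale`.

```python
import numpy as np


def toy_text_log_prob_grad(x, target):
    """Gradient of a toy log p(text | x) = -0.5 * ||x - target||^2 with respect to x.

    Hypothetical stand-in for the gradient a real text-conditioned
    classifier would supply.
    """
    return target - x


def guided_noise(eps_pred, grad_log_prob, sigma_t, scale):
    """Classifier-guidance-style correction of a predicted noise vector.

    Shifts the noise estimate against the gradient of the classifier's
    log-probability, so that removing the noise moves the sample toward
    regions the classifier considers consistent with the text.
    """
    return eps_pred - scale * sigma_t * grad_log_prob


rng = np.random.default_rng(0)
x_t = rng.standard_normal(8)   # assumed stand-in for a noisy spectrogram frame
target = np.ones(8)            # assumed stand-in for what the lip-read text implies
eps_pred = np.zeros(8)         # assumed stand-in for the diffusion model's noise estimate
sigma_t = 0.5                  # assumed noise level at this step

grad = toy_text_log_prob_grad(x_t, target)
eps_plain = guided_noise(eps_pred, grad, sigma_t, scale=0.0)   # guidance switched off
eps_guided = guided_noise(eps_pred, grad, sigma_t, scale=1.0)  # guidance switched on

# One-step denoised estimates, with and without guidance.
x0_plain = x_t - sigma_t * eps_plain
x0_guided = x_t - sigma_t * eps_guided

print("distance to target without guidance:", np.linalg.norm(x0_plain - target))
print("distance to target with guidance:   ", np.linalg.norm(x0_guided - target))
```

With `scale=0.0` the noise estimate is left unchanged. With a positive scale, the denoised estimate lands closer to the toy target. In a real system the hand-written gradient would be replaced by the gradient of a trained model's log-probability for the predicted text.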