Ctrl123 is a closed-loop transcription-based method for improving consistency in novel view synthesis. Existing diffusion-based methods struggle to align generated views with the target pose and appearance, which limits downstream tasks. Ctrl123 addresses this by enforcing alignment between each generated view and its ground truth. In doing so, it extends the existing open-loop framework to a closed-loop one, and it measures pose consistency with metrics such as AA and IoU (intersection over union). Extensive experiments show significant improvements over current state-of-the-art methods.
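As a rough illustration of the IoU metric mentioned above, the following sketch computes intersection over union for two binary masks, such as the silhouette of a generated view and that of its ground-truth view. This summary does not say how the paper extracts masks or computes IoU, so the function name `mask_iou`, the nested-list mask format, and the empty-mask convention are all assumptions made for this example.

```python
def mask_iou(pred, target):
    """Intersection over union of two equally sized binary masks.

    Masks are nested lists of 0/1 values (a hypothetical format chosen
    for this sketch, not taken from the paper).
    """
    if len(pred) != len(target) or any(
        len(p) != len(t) for p, t in zip(pred, target)
    ):
        raise ValueError("masks must have the same shape")

    intersection = 0
    union = 0
    for row_p, row_t in zip(pred, target):
        for p, t in zip(row_p, row_t):
            if p and t:
                intersection += 1  # pixel set in both masks
            if p or t:
                union += 1  # pixel set in at least one mask

    # Assumption: two empty masks count as a perfect match.
    return 1.0 if union == 0 else intersection / union


# Toy 2x3 masks: 2 pixels overlap, 4 pixels are covered in total.
generated = [[0, 1, 1], [0, 1, 0]]
reference = [[0, 1, 0], [0, 1, 1]]
print(mask_iou(generated, reference))  # prints 0.5
```

A value of 1.0 means the two masks coincide exactly, and 0.0 means they do not overlap at all.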
arxiv.org