The author proposes a novel four-module framework that generates high-quality, consistent videos by optimizing the background and foreground of each frame.