We propose an adaptive, high-quality talking-head video generation method that requires no additional pre-trained modules.