Core Concepts
The paper's core message is that scaling up both the generative model and the training data, combined with multimodal textual guidance, yields SUPIR, an image restoration method that produces photo-realistic results, particularly in complex real-world scenarios.
Abstract
The paper introduces SUPIR (Scaling-UP Image Restoration), an image restoration method that combines a large-scale generative prior with multimodal techniques. Key highlights:
Generative Prior: The authors use the pre-trained StableDiffusion-XL (SDXL) model, which contains 2.6 billion parameters, as the generative prior. To deploy SDXL effectively for image restoration, they design a novel adaptor architecture built from a trimmed ControlNet and a ZeroSFT connector.
Large-Scale Training Data: The authors collect a dataset of over 20 million high-quality, high-resolution images with detailed descriptive text annotations to support the scaling up of the model.
Multimodal Guidance: The authors incorporate a 13-billion-parameter multimodal language model to provide image content prompts, greatly improving the accuracy and intelligence of the restoration process. This allows for flexible control over the restoration through textual prompts.
Negative-Quality Samples and Prompts: To enhance the model's ability to understand and avoid negative-quality attributes, the authors counter-intuitively add low-quality images generated by SDXL to the training data.
Restoration-Guided Sampling: To address the issue of reduced fidelity due to the powerful generative prior, the authors introduce a restoration-guided sampling method that selectively guides the prediction results to be close to the low-quality input image.
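As a rough illustration of the restoration-guided sampling idea above, the following Python sketch pulls each denoised prediction toward the low-quality latent before taking an Euler-style step. The function name, the `denoise` callable, the noise levels, and in particular the weight `(sigma_t / sigma_max) ** tau` are assumptions made for illustration; this summary does not give the paper's exact update rule.

```python
def restoration_guided_step(z_t, z_lq, denoise, sigma_t, sigma_next,
                            sigma_max, tau=4.0):
    """One Euler-style sampling step with a pull toward the LQ latent.

    z_t, z_lq : lists of floats (current latent, low-quality latent)
    denoise   : hypothetical callable (z_t, sigma_t) -> predicted clean latent
    tau       : assumed exponent controlling how fast the pull fades
    """
    # Predicted clean latent from the (hypothetical) denoiser.
    z_pred = denoise(z_t, sigma_t)

    # Assumed guidance weight: close to 1 at high noise, close to 0 at low noise.
    k = (sigma_t / sigma_max) ** tau

    # Move the prediction a fraction k of the way toward the LQ latent.
    z_guided = [p + k * (q - p) for p, q in zip(z_pred, z_lq)]

    # Euler update from noise level sigma_t to sigma_next.
    d = [(z - g) / sigma_t for z, g in zip(z_t, z_guided)]
    return [z + di * (sigma_next - sigma_t) for z, di in zip(z_t, d)]


# Toy usage with a denoiser that always predicts the same latent.
toy_denoise = lambda z, sigma: [1.0, 1.0]
early = restoration_guided_step([2.0, -1.0], [0.5, 0.5], toy_denoise,
                                sigma_t=10.0, sigma_next=0.0, sigma_max=10.0)
late = restoration_guided_step([2.0, -1.0], [0.5, 0.5], toy_denoise,
                               sigma_t=1.0, sigma_next=0.0, sigma_max=10.0)
```

In this sketch the pull is strongest at high noise levels and fades as sampling proceeds, so the early step lands on the low-quality latent while the late step stays close to the denoiser's own prediction.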
The proposed SUPIR model demonstrates exceptional performance in a variety of image restoration tasks, achieving the best visual quality, especially in complex and challenging real-world scenarios.
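One plausible way to use negative-quality prompts at sampling time is classifier-free guidance, where the model's prediction under a negative prompt serves as the direction to move away from. The sketch below assumes that setup; the function name, the list-based predictions, and the guidance scale are illustrative and are not taken from the paper.

```python
def guided_prediction(pred_pos, pred_neg, scale):
    """Combine two model predictions with classifier-free guidance.

    pred_pos : prediction conditioned on the positive (content) prompt
    pred_neg : prediction conditioned on the negative-quality prompt
    scale    : guidance scale; 1.0 returns pred_pos, larger values push
               further away from pred_neg
    """
    return [n + scale * (p - n) for p, n in zip(pred_pos, pred_neg)]


# Toy usage: the second component differs most between the two prompts,
# so guidance amplifies it the most.
combined = guided_prediction([1.0, 2.0], [1.0, 0.0], scale=3.0)
```

For this to help, the model has to know what the negative prompt looks like, which is one reading of why low-quality samples are added to the training data.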
Stats
The authors collected a dataset of over 20 million high-quality, high-resolution (1024x1024) images.
They also included an additional 70K unaligned high-resolution facial images from the FFHQ-raw dataset.
The authors generated 100K low-quality images using SDXL to represent negative-quality samples for training.
Quotes
"SUPIR marks a significant advance in intelligent and realistic image restoration."
"Continuously improving the capabilities of the generative prior is key to achieving better IR results, with model scaling being a crucial and effective approach."
"We not only facilitate the scaling up of SUPIR but also push the frontiers of advanced IR."