Core Concepts
The paper proposes XPSR, a framework for restoring semantic detail in image super-resolution. It draws cross-modal priors from Multimodal Large Language Models (MLLMs) and introduces two components: Semantic-Fusion Attention and a Degradation-Free Constraint.
Summary
The XPSR framework enhances image super-resolution through three components: MLLMs that extract accurate semantic priors, Semantic-Fusion Attention that fuses those cross-modal priors, and a Degradation-Free Constraint that helps retain semantic information. The method is reported to generate high-fidelity, realistic images on both synthetic and real-world datasets.
Key Points:
- Introduction of XPSR framework for Image Super-Resolution.
- Utilization of MLLMs for extracting precise semantic priors.
- Implementation of Semantic-Fusion Attention for optimal fusion of priors.
- Application of Degradation-Free Constraint to preserve semantic information.
- Demonstrated capability in generating high-quality images across synthetic and real-world datasets.
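To make the idea of fusing cross-modal priors more concrete, here is a minimal sketch of attention-based fusion. It is not the paper's Semantic-Fusion Attention, whose exact design is not given in this summary. It assumes image tokens attend separately to a high-level and a low-level prompt embedding and that the two results are added back residually. The function names (`cross_attention`, `semantic_fusion`) are hypothetical, and learned projection matrices are omitted for brevity.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, context):
    """Scaled dot-product attention.

    Assumption: keys and values are both the raw `context` tokens,
    with no learned projections.
    """
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in context]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, context))
                    for j in range(d)])
    return out


def semantic_fusion(image_tokens, high_tokens, low_tokens):
    """Hypothetical fusion: image tokens plus two cross-attention outputs."""
    high = cross_attention(image_tokens, high_tokens)
    low = cross_attention(image_tokens, low_tokens)
    return [[x + h + l for x, h, l in zip(xr, hr, lr)]
            for xr, hr, lr in zip(image_tokens, high, low)]


if __name__ == "__main__":
    image_tokens = [[1.0, 0.0], [0.0, 1.0]]
    high_tokens = [[1.0, 1.0]]   # stand-in for a high-level prompt embedding
    low_tokens = [[2.0, 0.0]]    # stand-in for a low-level prompt embedding
    print(semantic_fusion(image_tokens, high_tokens, low_tokens))
```

With a single context token per prior, each attention output equals that token, so the fused result is simply the element-wise sum of the three inputs. A real model would use learned query, key, and value projections and many tokens per prompt.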
Statistics
Quantitative results show that XPSR achieves:
SSIM (higher is better): 0.6870
LPIPS (lower is better): 0.3517
FID (lower is better): 141.95
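For readers unfamiliar with SSIM, the sketch below computes a simplified, single-window version of the structural similarity index between two flattened grayscale images. It is only an illustration of the formula, not the evaluation protocol behind the numbers above: standard implementations average SSIM over local windows, and the name `global_ssim` is hypothetical. The constants follow the common convention C1 = (0.01 L)^2 and C2 = (0.03 L)^2, where L is the data range.

```python
def global_ssim(x, y, data_range=1.0):
    """Single-window SSIM over two equal-length lists of pixel intensities.

    Simplification: uses global means, population variances, and the
    population covariance instead of sliding local windows.
    """
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


if __name__ == "__main__":
    reference = [0.1, 0.5, 0.9, 0.3]
    restored = [0.9, 0.5, 0.1, 0.7]
    print(global_ssim(reference, reference))
    print(global_ssim(reference, restored))
```

An image compared with itself scores 1, and the score drops as luminance, contrast, or structure diverge.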
Quotes
"Noisy latent z_t^hr, high-level prompt c_h, and low-level prompt c_l are used to predict the added noise."
"XPSR demonstrates a strong capability in generating high-fidelity and high-realism images."
"The integration of dual-level priors facilitates balancing for image restoration."
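The first quote describes a noise-prediction setup typical of diffusion models. The sketch below shows the generic form of such an objective: a clean latent is mixed with Gaussian noise, and a predictor that receives the noisy latent and both prompts is scored by mean squared error against the true noise. This is a sketch under stated assumptions, not the paper's training code. The names `add_noise`, `noise_prediction_loss`, and `predict_noise` are hypothetical, the noise level is passed directly as `alpha_bar_t`, and any further conditioning the actual model uses is left out.

```python
import math
import random


def add_noise(z0, eps, alpha_bar_t):
    """Standard forward diffusion step: z_t = sqrt(a) * z0 + sqrt(1 - a) * eps."""
    a = math.sqrt(alpha_bar_t)
    b = math.sqrt(1.0 - alpha_bar_t)
    return [a * z + b * e for z, e in zip(z0, eps)]


def noise_prediction_loss(predict_noise, z0, alpha_bar_t, c_h, c_l, rng):
    """MSE between the sampled noise and the predictor's estimate.

    `predict_noise` is a hypothetical callable taking the noisy latent,
    the noise level, the high-level prompt c_h, and the low-level prompt c_l.
    """
    eps = [rng.gauss(0.0, 1.0) for _ in z0]
    z_t = add_noise(z0, eps, alpha_bar_t)
    eps_hat = predict_noise(z_t, alpha_bar_t, c_h, c_l)
    return sum((e - p) ** 2 for e, p in zip(eps, eps_hat)) / len(eps)


if __name__ == "__main__":
    z0 = [0.5, -1.0, 2.0]  # stand-in for a clean high-resolution latent

    def zero_predictor(z_t, alpha_bar_t, c_h, c_l):
        # Baseline that ignores its inputs and always predicts zero noise.
        return [0.0] * len(z_t)

    loss = noise_prediction_loss(zero_predictor, z0, 0.6,
                                 "high-level prompt", "low-level prompt",
                                 random.Random(0))
    print(loss)
```

A predictor that recovers the sampled noise exactly drives this loss to zero, which is what training pushes the network toward.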