Core Concept
The ASUKA framework improves image inpainting, achieving context stability and visual consistency by aligning a prior with a frozen Stable Diffusion (SD) model.
Abstract
The ASUKA framework proposes a balanced solution to the context-instability and visual-inconsistency problems in image inpainting. It uses a Masked Auto-Encoder (MAE) as a prior and aligns it with the Stable Diffusion (SD) model to improve context stability. In addition, an inpainting-specialized decoder improves visual consistency by mitigating color discrepancies between the masked and unmasked regions. ASUKA's effectiveness is validated on the Places 2 and MISATO benchmarks, where it outperforms state-of-the-art methods.
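The color-consistency goal of the decoder can be illustrated with a minimal masked-compositing sketch. This is a generic illustration, not ASUKA's actual decoder: it simply shows why copying unmasked pixels from the source image guarantees no color shift outside the hole, which is the baseline property an inpainting-specialized decoder must preserve.

```python
import numpy as np

def blend_inpainted(original, generated, mask):
    """Composite a generated (inpainted) image back onto the original.

    `mask` is 1.0 inside the hole and 0.0 elsewhere. Unmasked pixels are
    taken verbatim from the original, so no color inconsistency can leak
    into the known region; only the hole receives generated content.
    """
    mask = mask[..., None]  # broadcast the mask over color channels
    return mask * generated + (1.0 - mask) * original

# Toy example: a 4x4 RGB image with a 2x2 hole in the center.
original = np.zeros((4, 4, 3))            # known content (black)
generated = np.full((4, 4, 3), 0.5)       # stand-in for the SD output
mask = np.zeros((4, 4))
mask[1:3, 1:3] = 1.0                      # the masked (hole) region

result = blend_inpainted(original, generated, mask)
```

In practice, hard compositing like this can leave visible seams at the mask boundary, which is precisely the discrepancy an inpainting-specialized decoder is trained to smooth out.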
Key Statistics
Comparison at 1024&#178; resolution between ASUKA and other inpainting models.
The MISATO dataset contains images from Matterport3D, Flickr-Landscape, MegaDepth, and COCO 2014.
SD achieves impressive results but suffers from context instability and visual inconsistency.
Quotations
"ASUKA achieves context-stable and visual-consistent inpainting."
"Recent progress in inpainting relies on generative models but introduces context-instability."
"ASUKA significantly improves context stability compared to existing algorithms."