Core Idea
The core message of this paper is that image inpainting can be effectively achieved by jointly modeling structure-constrained texture synthesis and texture-guided structure reconstruction in a two-stream network architecture, which allows the two subtasks to better leverage each other for more plausible generation.
Summary
The paper proposes a novel two-stream network for image inpainting, which casts the task into two collaborative subtasks: structure-constrained texture synthesis and texture-guided structure reconstruction. The two parallel-coupled streams are individually modeled and combined to complement each other. A Bi-directional Gated Feature Fusion (Bi-GFF) module is introduced to integrate the rebuilt structure and texture feature maps to enhance their consistency, along with a Contextual Feature Aggregation (CFA) module to highlight the clues from distant spatial locations to render finer details.
The key highlights are:
- The two-stream architecture models structure and texture generation in a coupled manner, allowing them to better leverage each other for more plausible results.
- The Bi-GFF module exchanges and combines the structure and texture information to improve their consistency.
- The CFA module refines the generated contents by region affinity learning and multi-scale feature aggregation.
- Extensive experiments on multiple public benchmarks demonstrate the superiority of the proposed method over state-of-the-art approaches both qualitatively and quantitatively.
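To make the Bi-GFF idea concrete, here is a minimal NumPy sketch of bi-directional gated feature fusion. This is an illustrative simplification, not the paper's implementation: the learned convolution that produces the gate is reduced to a single 1x1-style linear projection (`w`, `b` are hypothetical parameters), and features are plain `(H, W, C)` arrays.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fuse(f_a, f_b, w, b):
    """Blend information from f_b into f_a through a soft gate.

    The gate is computed from the concatenation of both feature maps
    (a simplified stand-in for a learned 1x1 convolution), so the
    network can decide, per location and channel, how much of the
    other stream's information to absorb.
    """
    gate = sigmoid(np.concatenate([f_a, f_b], axis=-1) @ w + b)
    return f_a + gate * f_b

def bi_gff(f_texture, f_structure, w_t, b_t, w_s, b_s):
    """Exchange structure and texture features in both directions.

    Each stream keeps its own identity (residual term) while selectively
    importing the other stream's features, which is what enforces
    consistency between the reconstructed structure and texture.
    """
    fused_texture = gated_fuse(f_texture, f_structure, w_t, b_t)
    fused_structure = gated_fuse(f_structure, f_texture, w_s, b_s)
    return fused_texture, fused_structure

# Toy usage with random features and parameters.
rng = np.random.default_rng(0)
H, W, C = 4, 4, 8
f_tex = rng.standard_normal((H, W, C))
f_str = rng.standard_normal((H, W, C))
w = rng.standard_normal((2 * C, C)) * 0.1
b = np.zeros(C)
out_tex, out_str = bi_gff(f_tex, f_str, w, b, w, b)
```

In the actual model the gates are produced by trained convolutions and the fused maps feed the decoder; the sketch only shows the two-way gated exchange that couples the streams.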
Statistics
Deep generative approaches have recently made considerable progress in image inpainting by introducing structure priors.
However, current solutions struggle with large corruptions and generally produce distorted results.
Quotes
"Due to the lack of proper interaction with image texture during structure reconstruction, however, current solutions are incompetent in handling the cases with large corruptions, and they generally suffer from distorted results."
"To deal with this problem, a number of multi-stage methods are proposed to explicitly incorporate structure modeling, which hallucinate structures of missing regions in the first stage and use them to guide pixel generation in the second stage."