The paper introduces DeS3, a diffusion-based method for removing hard, soft, and self-shadows from a single image. Unlike existing methods that rely on binary shadow masks, DeS3 requires no such masks during either training or testing.
The key innovations of DeS3 are:
Adaptive Attention: DeS3 employs an adaptive attention mechanism that is progressively refined throughout the diffusion process. This allows the method to effectively handle self-shadows and soft shadows that lack clear boundaries.
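The idea of a soft, progressively refined attention map (rather than a fixed binary mask) can be illustrated with a toy sketch. Everything here is an assumption for illustration: the paper's attention is learned by the network, whereas `estimate_attention` below hand-crafts a soft map from the discrepancy between the input and the current shadow-free estimate, and `refine_attention` blends maps across steps.

```python
import numpy as np

def estimate_attention(shadow_img, current_estimate, sharpness=4.0):
    """Toy attention map (illustrative only, not the paper's learned attention):
    pixels where the current shadow-free estimate differs most from the input
    are treated as likely shadow regions, gated softly rather than thresholded."""
    diff = np.abs(current_estimate - shadow_img)                   # per-pixel discrepancy
    att = 1.0 / (1.0 + np.exp(-sharpness * (diff - diff.mean())))  # soft gate in (0, 1)
    return att                                                     # no hard binary mask

def refine_attention(prev_att, new_att, step, num_steps):
    """Blend the previous map toward the new one as denoising proceeds,
    so the attention is progressively refined instead of fixed up front."""
    w = (step + 1) / num_steps                 # trust newer evidence more over time
    return (1 - w) * prev_att + w * new_att
```

The soft gate is what lets the map cover self-shadows and soft shadows whose boundaries a binary mask cannot capture.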
ViT Similarity: To preserve the object and scene structures during shadow removal, DeS3 incorporates a ViT similarity loss. This loss utilizes features extracted from a pre-trained Vision Transformer (ViT) model, which are more robust to shadows compared to CNN-based features.
The reverse sampling process in DeS3 starts from a noise map and the input shadow image. The adaptive attention guides the sampling to focus on the shadow regions, while the ViT similarity loss ensures that the output preserves the underlying object structures, even when they are partially occluded by shadows.
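The sampling flow above can be sketched as a toy conditional reverse loop. All names and the linear schedule are illustrative assumptions, not the paper's actual sampler: `denoise_fn` stands in for the trained network's clean-image prediction, and the attention map gates where the prediction is allowed to overwrite the input.

```python
import numpy as np

def reverse_sample(shadow_img, denoise_fn, attention_fn, num_steps=10, seed=0):
    """Toy reverse sampling loop conditioned on the shadow image (illustrative
    names and schedule). denoise_fn(x_t, shadow_img, t) predicts a clean image;
    the attention map blends that prediction into shadow regions while keeping
    non-shadow pixels close to the input."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(shadow_img.shape)            # start from pure noise
    for t in range(num_steps - 1, -1, -1):
        x0_hat = denoise_fn(x, shadow_img, t)            # network's clean estimate
        att = attention_fn(shadow_img, x0_hat)           # soft shadow attention in (0, 1)
        guided = att * x0_hat + (1 - att) * shadow_img   # focus edits on shadow regions
        alpha = t / num_steps                            # crude linear noise schedule
        x = alpha * x + (1 - alpha) * guided             # step toward the guided estimate
    return x
```

Conditioning every step on the shadow image is what lets the sampler recover structure that the shadow partially occludes, instead of hallucinating it from noise alone.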
Comprehensive experiments on several benchmark datasets demonstrate that DeS3 outperforms state-of-the-art shadow removal methods, particularly in handling self-shadows and soft shadows.