The paper introduces DMP (Diffusion Models as Priors), a method that uses pre-trained text-to-image (T2I) diffusion models as a prior for generalizable dense prediction. It addresses two challenges: the misalignment between the stochastic nature of diffusion models and the deterministic nature of dense prediction tasks, and the need to balance learning the target task against retaining the inherent generalizability of the pre-trained T2I model.
To resolve the determinism-stochasticity issue, the authors reformulate the diffusion process as a chain of interpolations between input RGB images and their corresponding output signals, where the importance of input images gradually increases over the diffusion process. This allows the reverse diffusion process to become a series of deterministic transformations that progressively synthesize the desired output signals from input images.
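The interpolation chain can be illustrated with a small sketch. It assumes a linear weighting between the input image and the output signal, and a hypothetical `predict_target` callable standing in for the fine-tuned network; the summary does not state the exact schedule or parameterization used in the paper, so both are illustrative choices only.

```python
import numpy as np

def interpolate(image, target, t, num_steps):
    """Blend target and input image; the image weight grows with t.

    Assumption: linear weights. t=0 gives the pure output signal,
    t=num_steps gives the pure input image.
    """
    w = t / num_steps
    return (1.0 - w) * target + w * image

def reverse_process(image, predict_target, num_steps):
    """Deterministically walk from the input image back to the output signal.

    predict_target is a hypothetical stand-in for the fine-tuned model:
    it maps the current state and step index to an estimate of the target.
    """
    state = image.copy()  # the chain starts at t=num_steps, i.e. the input image
    for t in range(num_steps, 0, -1):
        target_hat = predict_target(state, t)
        # Re-interpolate one step closer to the target; no noise is sampled.
        state = interpolate(image, target_hat, t - 1, num_steps)
    return state

# Toy check with an oracle predictor that always returns the true target.
rng = np.random.default_rng(0)
image = rng.random((4, 4, 3))
target = rng.random((4, 4, 3))
oracle = lambda state, t: target
recovered = reverse_process(image, oracle, num_steps=10)
```

Because no random noise enters at any step, running the reverse process twice on the same image yields the same output, which is the property a deterministic prediction task needs.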
To retain the generalizability of the pre-trained T2I model while learning target tasks, the authors use low-rank adaptation to fine-tune the pre-trained model with the aforementioned deterministic diffusion process for each dense prediction task.
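As a rough illustration of low-rank adaptation, the sketch below wraps a single frozen weight matrix with a trainable low-rank update. The class name `LoRALinear`, the rank, the scaling, and the layer sizes are all assumptions for illustration; the summary does not say which layers of the T2I model are adapted or with what rank.

```python
import numpy as np

class LoRALinear:
    """Linear layer with a frozen weight plus a trainable low-rank update.

    Effective weight: W + (alpha / rank) * B @ A, where only A and B
    would be trained. Illustrative sketch, not the paper's code.
    """

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = np.random.default_rng(seed)
        out_dim, in_dim = weight.shape
        self.weight = weight                                  # frozen pre-trained weight
        self.A = rng.normal(0.0, 0.01, size=(rank, in_dim))   # trainable, small random init
        self.B = np.zeros((out_dim, rank))                    # trainable, zero init
        self.scale = alpha / rank

    def __call__(self, x):
        frozen = x @ self.weight.T
        update = self.scale * (x @ self.A.T) @ self.B.T
        return frozen + update

    def num_trainable(self):
        return self.A.size + self.B.size

rng = np.random.default_rng(1)
weight = rng.normal(size=(64, 32))   # hypothetical pre-trained layer
layer = LoRALinear(weight, rank=4)
x = rng.normal(size=(2, 32))
out = layer(x)
```

With `B` initialized to zero, the adapted layer starts out identical to the pre-trained one, and the number of trainable parameters is much smaller than the full weight matrix. This is one reason the approach can learn a new task while keeping most of the pre-trained behavior intact.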
DMP is evaluated on five dense prediction tasks: 3D property estimation (depth, normal), semantic segmentation, and intrinsic image decomposition (albedo, shading). The results show that with only a small, limited-domain training set (10K bedroom images), DMP provides faithful estimations on both in-domain and unseen images, outperforming existing state-of-the-art algorithms, especially on images where off-the-shelf methods struggle.
Source: arxiv.org