Core Concepts
A novel inference-time algorithm, SVDD, that optimizes downstream reward functions in pre-trained diffusion models without the need for fine-tuning or constructing differentiable proxy models.
Summary
The paper proposes a new method, SVDD (Soft Value-based Decoding in Diffusion models), to optimize downstream reward functions in pre-trained diffusion models. The key ideas are:
- Introducing soft value functions that predict, from an intermediate noisy state, the reward expected at the end of the diffusion denoising process.
- Presenting a new inference-time technique, SVDD, which obtains multiple noisy states from the policy (i.e., denoising map) of pre-trained diffusion models and selects the sample with the highest value function at each time step.
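The selection step described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: `policy_sample` (a draw from the pre-trained denoising map) and `value_fn` (the soft value estimate) are hypothetical names standing in for the paper's components.

```python
def svdd_step(x_t, policy_sample, value_fn, num_candidates=8):
    """One SVDD denoising step (sketch): draw several candidate next
    states from the pre-trained diffusion policy and keep the one with
    the highest soft value estimate. No gradients of the reward are
    needed, so value_fn may wrap non-differentiable feedback."""
    candidates = [policy_sample(x_t) for _ in range(num_candidates)]
    return max(candidates, key=value_fn)
```

Because only forward evaluations of the value function are required, the same loop applies unchanged to continuous and discrete diffusion samplers.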
The SVDD approach has several advantages:
- It does not require fine-tuning the generative model, unlike classifier-free guidance or RL-based fine-tuning methods.
- It does not need to construct differentiable proxy models, unlike classifier guidance methods, allowing the use of non-differentiable reward feedback common in scientific domains.
- It can be directly applied to recent discrete diffusion models without any modification.
The authors demonstrate the effectiveness of their methods across various domains, including image generation, molecule generation, and DNA/RNA generation.
Statistics
Diffusion models excel at capturing the natural design spaces of images, molecules, DNA, RNA, and protein sequences.
Existing methods for optimizing downstream reward functions often require "differentiable" proxy models or involve computationally expensive fine-tuning of diffusion models.
The proposed SVDD-MC and SVDD-PM algorithms outperform baseline methods like Best-of-N and DPS across multiple domains.
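The two variants differ in how the soft value is estimated; the posterior-mean flavor (SVDD-PM) scores the model's denoised prediction of the clean sample with the reward directly. A minimal sketch, with `predict_x0` and `reward_fn` as illustrative placeholders for the pre-trained model's denoiser output and the downstream reward:

```python
def value_pm(x_t, predict_x0, reward_fn):
    """Posterior-mean-style soft value estimate (sketch): evaluate the
    reward on the model's denoised estimate of the clean sample x0.
    reward_fn can be a black box (e.g. a docking or activity score)."""
    x0_hat = predict_x0(x_t)  # pre-trained model's prediction of x0 from the noisy state
    return reward_fn(x0_hat)
```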
Quotes
"Diffusion models excel at capturing the natural design spaces of images, molecules, DNA, RNA, and protein sequences. However, rather than merely -generating designs that are natural, we often aim to optimize downstream reward functions while preserving the naturalness of these design spaces."
"Notably, our approach avoids fine-tuning generative models and eliminates the need to construct differentiable models. This enables us to (1) directly utilize non-differentiable features/reward feedback, commonly used in many scientific domains, and (2) apply our method to recent discrete diffusion models in a principled way."