Core Concepts
Diffusion models are a powerful and versatile generative AI technology that has achieved remarkable success across various domains, including computer vision, audio, reinforcement learning, and computational biology. This paper provides a comprehensive overview of the theoretical foundations and practical applications of diffusion models, with a focus on understanding their sample generation capabilities under different control and guidance settings.
Abstract
The paper starts by introducing the fundamentals of diffusion models, describing the forward and backward processes that underlie their operation. It then reviews the emerging applications of diffusion models, highlighting their use in vision and audio generation, control and reinforcement learning, and life-science applications, with a particular emphasis on the role of conditional diffusion models in enabling guided and controlled sample generation.
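To make the forward and backward processes concrete, here is a minimal sketch (an illustration under simplifying assumptions, not the paper's implementation): the Ornstein-Uhlenbeck forward SDE dX_t = −(1/2)X_t dt + dW_t progressively noises the data, and an Euler-Maruyama discretization of the reverse-time SDE runs it back using the score; a standard Gaussian toy data distribution is used so that the exact score −x can stand in for a learned network.

import numpy as np

def forward_sample(x0, t, rng):
    """Sample X_t given X_0 = x0 under the OU forward process:
    mean e^{-t/2} x0, covariance (1 - e^{-t}) I."""
    return np.exp(-t / 2) * x0 + np.sqrt(1 - np.exp(-t)) * rng.standard_normal(x0.shape)

def score(x, t):
    """Exact score of p_t when the data is N(0, I); in practice a trained
    network replaces this closed form."""
    return -x

def backward_sample(dim, T=5.0, n_steps=500, rng=None):
    """Euler-Maruyama discretization of the reverse-time SDE
    dX = [X/2 + grad log p_t(X)] dt + dW, run from t = T down to t = 0."""
    if rng is None:
        rng = np.random.default_rng(0)
    dt = T / n_steps
    x = rng.standard_normal(dim)  # initialize from the stationary N(0, I)
    t = T
    for _ in range(n_steps):
        x = x + (0.5 * x + score(x, t)) * dt + np.sqrt(dt) * rng.standard_normal(dim)
        t -= dt
    return x  # approximately a sample from the data distribution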
The paper then delves into the theoretical progress on unconditional diffusion models, discussing methods for learning the score function, which is the key to implementing diffusion models. It examines score approximation and estimation guarantees, as well as the sample complexity of score estimation, especially in the context of high-dimensional and structured data. The paper also covers theoretical insights on sampling and distribution estimation using diffusion models.
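For reference, learning the score typically means minimizing a denoising score matching loss; the sketch below is a generic version under assumed conventions (score_net, its (x_t, t) signature, and the tensor shapes are illustrative, not the paper's notation). It uses the identity that the conditional score of x_t = mu_t x0 + sigma_t eps given x0 equals -eps / sigma_t.

import torch

def dsm_loss(score_net, x0, t):
    """Denoising score matching loss for the OU forward process.
    x0: data batch of shape (batch, dim); t: noise levels of shape (batch, 1)."""
    mu_t = torch.exp(-t / 2)                 # mean coefficient e^{-t/2}
    sigma_t = torch.sqrt(1 - torch.exp(-t))  # noise std sqrt(1 - e^{-t})
    eps = torch.randn_like(x0)
    x_t = mu_t * x0 + sigma_t * eps          # forward-noised sample
    target = -eps / sigma_t                  # conditional score grad log p_t(x_t | x0)
    return ((score_net(x_t, t) - target) ** 2).sum(dim=-1).mean()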
Next, the paper focuses on conditional diffusion models, exploring the learning of conditional score functions and their connection to the unconditional score. It also provides theoretical insights on the impact of guidance in conditional diffusion models.
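A standard way to connect the conditional and unconditional scores at sampling time is classifier-free guidance, where the two estimates are combined with a guidance weight; the sketch below shows the usual combination (the score_net signature, the None null condition, and the weight w are illustrative assumptions).

def guided_score(score_net, x_t, t, y, w=2.0):
    """Classifier-free guidance: extrapolate past the conditional score.
    Larger w strengthens the conditioning signal, typically at the cost of diversity."""
    s_cond = score_net(x_t, t, y)       # score conditioned on label/prompt y
    s_uncond = score_net(x_t, t, None)  # unconditional score (null condition)
    return (1 + w) * s_cond - w * s_uncond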
The paper then reviews the use of diffusion models for data-driven black-box optimization, where the goal is to generate high-quality solutions to an optimization problem by reformulating it as a conditional sampling problem.
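Schematically, the reformulation replaces direct search over x with sampling from a conditional model trained on pairs (x, y = f(x)); everything named below is hypothetical (sample_conditional stands in for a full reverse-diffusion sampler).

def solve_via_conditional_sampling(sample_conditional, y_star, n_candidates=64):
    """Generate candidate solutions by conditioning on a high target value y_star,
    so samples concentrate near {x : f(x) is close to y_star}."""
    return [sample_conditional(y=y_star) for _ in range(n_candidates)]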
Finally, the paper discusses future directions and connections of diffusion models to broader research areas, such as stochastic control, adversarial robustness, and discrete diffusion models.
Stats
"The ground truth score ∇log pt(x) assumes the following orthogonal decomposition:
∇log pt(x) = A∇log pld
t (A⊤x) + 1
1−e−t (I −AA⊤)x"
"As t approaches 0, the magnitude of the term (I −AA⊤)x grows to infinity as long as x ̸= 0."
Quotes
"The reason behind this is that (I −AA⊤)x enforces the orthogonal component to vanish so that the low-dimensional subspace structure is reproduced in generated samples."
"Such a blowup issue appears in all geometric data [133]. As a consequence, an early stopping time t0 > 0 is introduced and the practical score estimation loss is written as..."