Aligning Diffusion Models with Human Preferences: Techniques, Challenges, and Future Directions
Diffusion models have emerged as a leading paradigm in generative modeling, yet their outputs often fail to reflect human intentions and preferences. Recent studies have sought to align diffusion models with human preferences through techniques such as reinforcement learning from human feedback (RLHF) and direct preference optimization (DPO).
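To make the DPO idea concrete in the diffusion setting, the following is a minimal sketch of a Diffusion-DPO-style preference loss on a (preferred, dispreferred) image pair. It is an illustration under simplifying assumptions, not a reference implementation: the toy noise-prediction networks, the beta value, and the omission of the timestep-dependent weighting used in published recipes are choices made here for brevity.

```python
# Sketch of a DPO-style objective for a diffusion model: the trainable model is
# rewarded for denoising the preferred sample better than a frozen reference
# model, relative to how much better it denoises the dispreferred sample.
import torch
import torch.nn.functional as F


def diffusion_dpo_loss(model, ref_model, x_t_win, x_t_lose,
                       noise_win, noise_lose, t, beta=0.1):
    """DPO-style loss on one preference pair of noised samples at timestep t."""
    # Per-sample denoising errors of the trainable model.
    err_w = F.mse_loss(model(x_t_win, t), noise_win, reduction="none").mean(dim=(1, 2, 3))
    err_l = F.mse_loss(model(x_t_lose, t), noise_lose, reduction="none").mean(dim=(1, 2, 3))
    # Frozen reference model errors (no gradients).
    with torch.no_grad():
        ref_err_w = F.mse_loss(ref_model(x_t_win, t), noise_win, reduction="none").mean(dim=(1, 2, 3))
        ref_err_l = F.mse_loss(ref_model(x_t_lose, t), noise_lose, reduction="none").mean(dim=(1, 2, 3))
    # Implicit reward margin: improvement over the reference on the preferred
    # sample minus improvement on the dispreferred sample.
    logits = -beta * ((err_w - ref_err_w) - (err_l - ref_err_l))
    return -F.logsigmoid(logits).mean()


if __name__ == "__main__":
    # Toy stand-ins for denoising networks; they ignore the timestep argument.
    net, ref_net = torch.nn.Conv2d(3, 3, 3, padding=1), torch.nn.Conv2d(3, 3, 3, padding=1)
    wrap = lambda m: (lambda x, t: m(x))
    x_w, x_l = torch.randn(4, 3, 32, 32), torch.randn(4, 3, 32, 32)
    n_w, n_l = torch.randn_like(x_w), torch.randn_like(x_l)
    t = torch.randint(0, 1000, (4,))
    print(float(diffusion_dpo_loss(wrap(net), wrap(ref_net), x_w, x_l, n_w, n_l, t)))
```

The key design point illustrated here is that preference optimization for diffusion models operates on denoising errors rather than on explicit log-likelihoods, with a frozen reference model playing the role of the KL anchor used in RLHF-style fine-tuning.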