Conditional Diffusion Model for Aligning Multi-Task Preferences in Sequential Decision-Making
This work proposes a method for learning versatile preference representations that distinguish trajectories both across tasks and by return. These representations then serve as conditions that guide a diffusion model's trajectory generation, so that the generated trajectories align with human preferences.
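The summary does not say how the preference representation steers generation. As an illustration only, the sketch below assumes classifier-free guidance, a common way to condition diffusion models, and is not necessarily the mechanism used in this work. The names `toy_denoiser`, `guided_noise`, `z`, and the guidance weight `w` are hypothetical, and the "denoiser" is a hand-written stand-in rather than a trained network.

```python
import numpy as np

def guided_noise(eps_cond, eps_uncond, w):
    """Classifier-free guidance (assumed mechanism): move from the
    unconditional noise prediction toward the conditioned one by weight w."""
    return eps_uncond + w * (eps_cond - eps_uncond)

def toy_denoiser(x, pref_embedding=None):
    # Hypothetical stand-in for a trained noise-prediction network.
    # When a preference embedding is given, the prediction is shifted by it.
    if pref_embedding is None:
        return x
    return x - pref_embedding

rng = np.random.default_rng(0)
horizon, dim = 4, 2
x_t = rng.normal(size=(horizon, dim))  # noisy trajectory at some diffusion step
z = np.ones((horizon, dim))            # hypothetical preference representation

eps_c = toy_denoiser(x_t, z)           # preference-conditioned prediction
eps_u = toy_denoiser(x_t)              # unconditional prediction
eps = guided_noise(eps_c, eps_u, w=1.5)
```

With `w = 0` the result equals the unconditional prediction and with `w = 1` it equals the conditioned one; values above 1 push the sample further toward the preference condition.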