Core Concepts
This work proposes a method for learning versatile preference representations that distinguish trajectories both across tasks and by return within a task. These representations are then used to guide the conditional generation of a diffusion model, aligning the generated trajectories with human preferences.
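As a rough illustration of the conditioning idea, the sketch below shows a denoiser whose input is a noisy trajectory concatenated with a preference representation, so the generated output depends on the preference vector. This is a hypothetical toy, not the paper's architecture: the linear "network" weights, dimensions, and function names are illustrative placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

def denoise_step(x_noisy, z_pref, weights):
    """One hypothetical denoising step conditioned on a preference
    representation z_pref: the condition is concatenated with the noisy
    trajectory before the (placeholder) network is applied."""
    inp = np.concatenate([x_noisy, z_pref])
    return weights @ inp  # predicted denoised trajectory segment

traj_dim, pref_dim = 4, 2
# Random placeholder weights standing in for a trained denoising network.
weights = rng.normal(size=(traj_dim, traj_dim + pref_dim))

x = rng.normal(size=traj_dim)          # noisy trajectory segment
z = np.array([1.0, 0.0])               # preference representation for one task

x_denoised = denoise_step(x, z, weights)
```

Because the preference vector is part of the input, changing it changes the generated trajectory, which is the basic mechanism that lets generation be steered toward preferred behavior.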
Summary
The paper presents a method called Conditional Alignment via Multi-task Preference representations (CAMP) for multi-task preference learning and trajectory generation.
Key highlights:
- CAMP defines multi-task preferences that consider both the return of trajectories within the same task and the task-relevance of trajectories across different tasks.
- CAMP learns a trajectory encoder that extracts preference-relevant representations, aligning them with the multi-task preferences using a triplet loss and a KL divergence loss. It also learns an 'optimal' representation for each task.
- CAMP augments a conditional diffusion model with a mutual information regularization term to ensure the alignment between the generated trajectories and the learned preference representations.
- Experiments on Meta-World and D4RL benchmarks demonstrate CAMP's superior performance in both multi-task and single-task scenarios, as well as its ability to generalize to unseen tasks.
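The triplet term mentioned above can be sketched as follows. This is a minimal, generic triplet hinge loss on trajectory embeddings, not the paper's implementation; the embeddings, margin value, and Euclidean distance are illustrative assumptions.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge loss pushing the anchor embedding closer to the preferred
    (positive) trajectory than to the less-preferred (negative) one,
    by at least `margin`."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

# Toy embeddings: the positive shares the anchor's task and has a
# similar return; the negative comes from a different task.
anchor   = np.array([1.0, 0.0])
positive = np.array([0.9, 0.1])
negative = np.array([-1.0, 0.5])

well_separated = triplet_loss(anchor, positive, negative)  # already satisfied
violated       = triplet_loss(anchor, negative, positive)  # preference flipped
```

When the preferred trajectory is already much closer than the other one, the hinge is inactive and the loss is zero; flipping the preference yields a positive loss that would push the embeddings apart during training.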
Stats
No explicit numerical statistics are extracted here; the key results are reported as performance comparisons on the Meta-World and D4RL benchmarks.
Citations
No direct quotes from the paper stand out as particularly striking or as supporting the key arguments.