This paper explores the use of preference learning to enhance text-to-motion generation models. The authors find that current text-to-motion generation methods rely on limited datasets that require expert labelers and motion capture systems, leading to poor alignment between the generated motions and the input text prompts.
To address this, the authors propose leveraging preference learning, where non-expert labelers simply compare two generated motions and provide feedback on their preferences. This approach is more cost-effective and scalable than gathering expert-labeled motion data.
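As a rough illustration, each annotated example might be stored as a record like the one below; the field names and values are assumptions for exposition, not the paper's released schema.

```python
# Hypothetical layout of one preference pair collected from non-expert labelers.
preference_pair = {
    "prompt": "a person walks forward and waves with the right hand",
    "motion_a": "generation_a.npy",   # first MotionGPT sample for the prompt
    "motion_b": "generation_b.npy",   # second MotionGPT sample for the prompt
    "choice": "a",                    # which motion the labeler preferred
    "strength": "clearly better",     # optional degree of preference
}
```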
The authors annotate a dataset of 3,528 preference pairs of motions generated by the MotionGPT model and investigate several algorithms for learning from this preference data, including Reinforcement Learning from Human Feedback (RLHF) and Direct Preference Optimization (DPO).
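For context, the DPO objective can be written in a few lines. The sketch below follows the standard DPO formulation rather than the paper's exact implementation; the tensor names and the beta value are illustrative.

```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Direct Preference Optimization loss for a batch of preference pairs.

    Each argument is a 1-D tensor of summed token log-probabilities of the
    preferred ("chosen") or non-preferred ("rejected") motion sequence under
    the trainable policy or the frozen reference model (e.g. the original
    MotionGPT checkpoint).
    """
    chosen_margin = policy_chosen_logps - ref_chosen_logps
    rejected_margin = policy_rejected_logps - ref_rejected_logps
    # Push the implicit reward of the preferred motion above the rejected one;
    # beta controls how far the policy may drift from the reference model.
    return -F.logsigmoid(beta * (chosen_margin - rejected_margin)).mean()
```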
The results show that models trained with preference data, particularly using the DPO approach, significantly outperform the original MotionGPT baseline in terms of alignment metrics, while maintaining comparable quality. Human evaluation also confirms that the outputs from the preference-trained models are preferred over the original MotionGPT generations.
The authors further analyze how the quantity and quality of the preference data affect performance, finding that pairs annotated with a stronger degree of preference yield the largest gains. They also highlight the importance of proper regularization, such as fine-tuning with LoRA adapters, for the success of the DPO approach.
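A minimal sketch of this kind of LoRA regularization is shown below, using the Hugging Face peft library; the checkpoint identifier, target module names, and hyperparameters are placeholders rather than the paper's actual configuration.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Placeholder checkpoint name for a MotionGPT-style language backbone.
base_model = AutoModelForCausalLM.from_pretrained("motiongpt-base")

# Train only low-rank adapters on the attention projections; keeping the
# base weights frozen limits how far DPO can move the model from MotionGPT.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # illustrative module names
)
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()
```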
Overall, this work demonstrates the potential of preference learning to enhance text-to-motion generation models and paves the way for further research in this direction, particularly in the context of limited data resources.
Key insights distilled from the source content by Jenny Sheng et al., arxiv.org, 04-16-2024: https://arxiv.org/pdf/2404.09445.pdf