Core Concepts
A two-stage framework that leverages position-based imitation data and decaying action priors to accelerate the training of torque-based legged locomotion policies, enabling consistent convergence to high-quality gaits.
Summary
The paper proposes a two-stage framework for training torque-based legged locomotion policies:
Position Policy Training:
The authors first train a position-based policy to acquire imitation data, eliminating the need for expert knowledge to design optimal controllers.
The tracked joint angles from the position policy's simulation are used as the imitation data.
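As a rough illustration of this data-collection step, the sketch below rolls out a stand-in position policy in a toy simulator and logs the tracked joint angles as a reference trajectory. The names `position_policy`, `toy_sim_step`, and `collect_reference`, the first-order joint model, and the 12-joint layout are all assumptions made for illustration; none of them come from the paper.

```python
import math

NUM_JOINTS = 12  # assumption: a quadruped with 3 joints per leg


def position_policy(t):
    """Hypothetical stand-in for a trained position policy: returns joint-angle targets."""
    return [0.3 * math.sin(0.1 * t + j) for j in range(NUM_JOINTS)]


def toy_sim_step(angles, targets, tracking_gain=0.5):
    """Toy first-order joint model: each joint moves part of the way to its target."""
    return [q + tracking_gain * (q_des - q) for q, q_des in zip(angles, targets)]


def collect_reference(num_steps):
    """Roll out the position policy and record the tracked (not commanded) joint angles."""
    angles = [0.0] * NUM_JOINTS
    reference = []
    for t in range(num_steps):
        targets = position_policy(t)
        angles = toy_sim_step(angles, targets)
        reference.append(list(angles))  # store a copy of the tracked angles
    return reference


reference_angles = collect_reference(num_steps=200)
```

The point of logging the tracked angles instead of the commanded targets is that they reflect what the simulated robot actually achieved, which is what the summary describes as the imitation data.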
Decaying Action Priors (DecAP):
In the second stage, the authors incorporate decaying action priors to enhance the exploration of torque-based policies.
A PID controller is used to translate the reference imitation angles into torque values, which are then integrated into the torque-based policy's actions with a gradual time-decay factor.
This provides an initial bias to guide the policy's exploration in the torque space, helping it converge to natural gaits.
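The mechanism described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the gains, the exponential decay schedule `gamma ** (iteration / decay_every)`, the zero reference velocity, and all function names are assumptions chosen for the example.

```python
def pid_torque(q_ref, q, dq, integral_err, kp=20.0, ki=0.0, kd=0.5):
    """PID torque toward the reference angles (reference velocity assumed zero)."""
    return [
        kp * (r - a) + ki * ie - kd * v
        for r, a, v, ie in zip(q_ref, q, dq, integral_err)
    ]


def decay_weight(iteration, gamma=0.99, decay_every=10):
    """Assumed schedule: weight is 1.0 at iteration 0 and decays toward 0."""
    return gamma ** (iteration / decay_every)


def apply_action_prior(policy_torque, q_ref, q, dq, integral_err, iteration):
    """Add the decayed PID prior torque to the policy's own torque action."""
    w = decay_weight(iteration)
    prior = pid_torque(q_ref, q, dq, integral_err)
    return [tau + w * p for tau, p in zip(policy_torque, prior)]


# Two-joint example with a zero policy output, to isolate the prior's contribution.
state = dict(q_ref=[0.5, -0.5], q=[0.0, 0.0], dq=[0.0, 0.0], integral_err=[0.0, 0.0])
tau_early = apply_action_prior([0.0, 0.0], iteration=0, **state)
tau_late = apply_action_prior([0.0, 0.0], iteration=5000, **state)
```

Early in training the prior dominates the applied torque and steers the robot toward the reference gait; late in training its weight is close to zero, so the applied torque is essentially the policy's own output.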
The results show that the DecAP approach significantly accelerates learning in the torque space, enabling different types of robots to complete the velocity tracking task within 25 minutes of wall-clock time. DecAP consistently outperforms imitation-based approaches and demonstrates the advantages of torque as an action space by comparing a position-based policy to a position-assisted torque-based policy on a quadruped robot.
Statistics
Apart from the 25-minute wall-clock training time reported for the velocity tracking task, no specific numerical data or statistics are extracted here. The key results are presented as figures and qualitative comparisons.
Quotes
The paper does not contain any notable quotes that support its key arguments.