The paper presents a framework for learning an end-to-end vision-based whole-body-control parkour policy for humanoid robots. The key highlights are:
The policy is trained on fractal noise terrain, which encourages foot raising without explicit reward-engineering terms such as "feet air time." This simplifies the reward function and allows the policy to learn diverse locomotion skills; a sketch of such a terrain generator follows.
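As an illustration, a fractal heightfield can be built by summing several octaves of smoothed random noise. This is a minimal sketch, not the paper's terrain implementation; the function name, resolution, and amplitude values are illustrative assumptions.

```python
import numpy as np

def fractal_noise_heightfield(size=256, octaves=5, lacunarity=2.0, gain=0.5,
                              base_res=4, amplitude=0.05, seed=0):
    """Sum octaves of bilinearly upsampled random grids into a heightfield (meters).

    Each octave increases spatial frequency (lacunarity) and shrinks amplitude
    (gain), producing the rough ground that forces the policy to lift its feet.
    Parameter values are illustrative, not the paper's settings.
    """
    rng = np.random.default_rng(seed)
    height = np.zeros((size, size))
    freq, amp = base_res, amplitude
    for _ in range(octaves):
        n = int(freq)
        coarse = rng.uniform(-1.0, 1.0, (n + 1, n + 1))  # coarse random grid
        xs = np.linspace(0.0, n, size)
        x0 = np.clip(xs.astype(int), 0, n - 1)
        tx = xs - x0
        # Bilinear interpolation: first along rows, then along columns.
        rows = coarse[x0] * (1 - tx)[:, None] + coarse[x0 + 1] * tx[:, None]
        grid = rows[:, x0] * (1 - tx)[None, :] + rows[:, x0 + 1] * tx[None, :]
        height += amp * grid
        freq *= lacunarity
        amp *= gain
    return height
```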
The policy is trained on a variety of parkour obstacles, including jumping onto platforms, leaping over gaps, traversing stairs, and clearing hurdles. This enables the policy to autonomously select the appropriate parkour skill when it encounters different obstacles.
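Such obstacle variety is often expressed as a set of terrain types whose key dimension scales with a curriculum difficulty. The sketch below is hypothetical: the obstacle names, parameter keys, and ranges are illustrative (the 0.42 m and 0.8 m upper bounds echo the reported experiments) rather than the paper's actual configuration.

```python
import random

# Hypothetical obstacle catalogue: each entry maps a key dimension to a
# (min, max) range in meters that the curriculum interpolates over.
PARKOUR_OBSTACLES = {
    "jump_up":  {"platform_height_m": (0.10, 0.42)},
    "leap_gap": {"gap_width_m":       (0.20, 0.80)},
    "stairs":   {"step_height_m":     (0.05, 0.20)},
    "hurdle":   {"hurdle_height_m":   (0.10, 0.40)},
}

def sample_obstacle(difficulty: float):
    """Pick an obstacle type at random and scale its key dimension by a
    difficulty in [0, 1], so harder terrain appears as training progresses."""
    name = random.choice(list(PARKOUR_OBSTACLES))
    (param, (lo, hi)), = PARKOUR_OBSTACLES[name].items()
    return name, {param: lo + difficulty * (hi - lo)}
```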
The deployable policy is distilled from an oracle policy using multi-GPU acceleration. This allows the student policy to retain high performance while running on the real humanoid robot with only onboard computation, sensing, and power.
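One common form of such distillation is a DAgger-style regression of student actions onto oracle actions. The sketch below assumes PyTorch policies; `student`, `oracle`, and their input interfaces are hypothetical placeholders, and the multi-GPU machinery (e.g., wrapping the student in DistributedDataParallel) is omitted.

```python
import torch

def distillation_step(student, oracle, depth_images, proprio, privileged_obs, optimizer):
    """One DAgger-style update: the student, which sees only onboard depth
    images and proprioception, regresses the actions of the oracle, which
    sees privileged simulation state. Interfaces here are assumptions."""
    with torch.no_grad():
        target_actions = oracle(privileged_obs)       # oracle acts on privileged state
    pred_actions = student(depth_images, proprio)     # student acts on onboard sensing
    loss = torch.nn.functional.mse_loss(pred_actions, target_actions)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```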
Experiments show the policy can perform challenging parkour tasks, such as jumping onto 0.42 m platforms, leaping over 0.8 m gaps, and running at 1.8 m/s in the wild. The policy is also shown to be robust to arm-action override, enabling its use in mobile manipulation tasks.