Key Idea
This research demonstrates that by incorporating simple distance-based reward terms into a reinforcement learning framework, humanoid robots can be trained to navigate obstacle-ridden environments while performing bipedal locomotion.
Statistics
The robot used in the simulations, JVRC-1, is 172 cm tall, weighs 62 kg, and has 34 degrees of freedom.
The policy was trained for approximately 10 hours on 96 million samples.
The learning episode length was set to 400 iterations.
The weight for the distance to the destination was set to 0.95.
The weight for the distance to each obstacle was set to -0.2.
The weight for the distance to the initial position was set to -0.5.
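The three weights above could combine into a single reward term along the following lines. This is only a minimal sketch: the summary does not give the functional form of each term, so the exp(-distance) kernel, the 2D positions, and the names `closeness` and `distance_reward` are all assumptions made for illustration, not the authors' implementation. Only the three weight values come from the source.

```python
import math

# Weights as reported in the summary.
W_GOAL = 0.95       # distance to the destination
W_OBSTACLE = -0.2   # distance to each obstacle
W_START = -0.5      # distance to the initial position


def closeness(a, b):
    """Map Euclidean distance to (0, 1]. ASSUMPTION: the kernel is not given in the source."""
    return math.exp(-math.dist(a, b))


def distance_reward(robot, goal, start, obstacles):
    """Hypothetical weighted sum of the three distance-based terms."""
    reward = W_GOAL * closeness(robot, goal)      # higher when near the goal
    reward += W_START * closeness(robot, start)   # lower when lingering at the start
    for obstacle in obstacles:                    # lower when near any obstacle
        reward += W_OBSTACLE * closeness(robot, obstacle)
    return reward
```

Under this assumed kernel the signs read naturally: the positive weight rewards approaching the destination, while the two negative weights penalize staying near the starting point and coming close to obstacles.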