Core Concepts
Implementation of reinforcement learning in robotic manipulator trajectory planning.
Abstract
This study focuses on implementing a reinforcement learning algorithm for trajectory planning in robotic manipulators. The article discusses the challenges posed by dynamic environments with moving obstacles and unknown dynamics. It explores the use of the deep deterministic policy gradient (DDPG) algorithm and compares model efficiency under dense versus sparse rewards. The content is structured into sections covering methodology, implementation of the DRL-based algorithm, simulation results, future work, and conclusion.
Abstract:
- Study on reinforcement learning algorithm for trajectory planning in manipulators.
- Utilization of 7-DOF robotic arm to pick & place objects in unknown environments.
Introduction:
- Robots used in various applications face uncertainties and safety concerns.
- Control strategies such as impedance control and admittance control introduce additional complexity.
Methodology:
- Simulation of robotic manipulator for pick-and-place tasks considering unforeseen obstacles.
- Formulation of control strategies using sparse and dense rewards for obstacle avoidance.
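The sparse-versus-dense distinction above can be sketched as two reward functions. This is a minimal illustration, not the paper's exact formulation; the 0.05 m success tolerance is an assumed placeholder value.

```python
import numpy as np

def dense_reward(ee_pos, goal_pos):
    # Dense: negative Euclidean distance to the goal, so every
    # step yields a shaped learning signal.
    return -float(np.linalg.norm(np.asarray(ee_pos) - np.asarray(goal_pos)))

def sparse_reward(ee_pos, goal_pos, tol=0.05):
    # Sparse: 0 on success (within tol metres of the goal), -1 otherwise;
    # the agent only learns from reaching the goal. tol is hypothetical.
    dist = np.linalg.norm(np.asarray(ee_pos) - np.asarray(goal_pos))
    return 0.0 if dist <= tol else -1.0
```

The sparse signal is binary and uninformative far from the goal, which is why it is often paired with goal-relabelling tricks, while the dense signal can bias the policy toward locally shorter paths.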
Implementation of DRL-Based Algorithm:
- RL enables optimization through trial and error guided by feedback from actions.
- Application of DDPG technique for trajectory planning and obstacle avoidance.
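The DDPG machinery named above (deterministic actor, Q-critic, target networks with soft updates) can be sketched on a toy 1-D task. This is an illustration of the algorithm's structure under simplifying assumptions, not the paper's 7-DOF implementation: linear actor, quadratic critic features, and the task r = -(s + a)^2 (optimal policy a = -s) are all invented for the example, and a replay buffer is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

wa = 0.0                      # linear actor: a = wa * s
w = np.zeros(4)               # critic: Q(s, a) = w . [1, s^2, s*a, a^2]
wa_t, w_t = wa, w.copy()      # target actor / target critic
gamma, lr, tau, noise = 0.5, 0.02, 0.05, 0.3

def q(weights, s, a):
    return weights @ np.array([1.0, s * s, s * a, a * a])

s = rng.uniform(-1, 1)
for _ in range(3000):
    a = wa * s + rng.normal(0, noise)        # exploration noise
    r = -(s + a) ** 2                        # toy reward, peak at a = -s
    s2 = rng.uniform(-1, 1)
    # Critic: TD target bootstraps through the *target* actor and critic.
    y = r + gamma * q(w_t, s2, wa_t * s2)
    td = y - q(w, s, a)
    w += lr * td * np.array([1.0, s * s, s * a, a * a])
    # Actor: ascend dQ/da * da/dwa = (w[2]*s + 2*w[3]*a) * s at a = wa*s.
    wa += lr * (w[2] + 2.0 * w[3] * wa) * s * s
    # Soft (Polyak) target-network updates stabilise the bootstrap target.
    wa_t = (1 - tau) * wa_t + tau * wa
    w_t = (1 - tau) * w_t + tau * w
    s = s2
```

After training, the actor weight drifts toward the optimum wa = -1, so the learned policy outperforms the initial one; the same update pattern scales to neural-network actors and critics.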
Simulation:
- Experimental setup on a 7-DOF Fetch robotic arm, with success rates analyzed for different scenarios.
- Results show better convergence with sparse rewards than with dense rewards.
Future Study and Work:
- Suggestions to apply Graph Neural Networks (GNN) and Model Predictive Control (MPC) for dynamic obstacle avoidance problems.
Conclusion:
- Successful implementation of RL-based trajectory-finding algorithm in complex environments with emphasis on sparse rewards for efficient learning.
Stats
The time-step budget to accomplish the task per episode is set to T = 100.
In case 1, the success rate is measured as the fraction of episodes in which the robot accomplishes the task.
The height of the target is sampled from a uniform distribution within a range of [0, 0.45] m.
Experiments were conducted on a Linux machine equipped with an Intel(R) 12th Gen i9 @ 2.40 GHz, 32 GB RAM, and an NVIDIA RTX 3080 Ti GPU.
Each model was trained using the Adam optimizer.
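The episode setup described in these stats can be sketched as follows. The T = 100 step budget and the goal height sampled from U(0, 0.45) m come from the stats above; the x/y goal ranges, the `policy`/`env_step` interfaces, and the success criterion's shape are hypothetical placeholders.

```python
import numpy as np

T = 100                        # per-episode time-step budget (from the paper)
rng = np.random.default_rng(42)

def sample_goal():
    # Goal height z is sampled uniformly from [0, 0.45] m, as reported;
    # the x/y ranges here are invented for illustration.
    x = rng.uniform(1.0, 1.5)
    y = rng.uniform(0.4, 1.1)
    z = rng.uniform(0.0, 0.45)
    return np.array([x, y, z])

def run_episode(policy, env_step, s0):
    # An episode counts as a success only if the task finishes within T steps.
    s, total = s0, 0.0
    for _ in range(T):
        s, r, done = env_step(s, policy(s))
        total += r
        if done:
            return True, total
    return False, total
```

The success rate reported per case would then be the mean of the boolean success flag over many such episodes.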
Quotes
"Robots are commonly used in many applications such as picking and placing objects, welding, surgical, agricultural sectors." - Introduction
"Deep Deterministic Policy Gradient (DDPG) algorithm shows promising results for complex systems." - Implementation
"The model prioritized obstacle avoidance over picking the block in finite steps." - Conclusion