Basic Concepts
Using digital twin technology combined with reinforcement learning can enhance a robot's adaptability to uncertain environments.
Summary
The paper explores integrating digital twin technology with reinforcement learning to improve a robot's adaptability in uncertain environments. It introduces a self-improving online training framework that enables robots to generate collision-free trajectories in real time. Using a digital twin as a virtual counterpart of the physical system, the robot continuously updates its policy based on real-world scenarios, and the bidirectional communication between the digital and physical systems enables hardware-in-the-loop RL training. The framework is demonstrated on the Ufactory Xarm5 collaborative robot, showing its capability for online policy training while leaving room for improvement.
I. INTRODUCTION
- Collaborative robots are increasingly important in Industry 5.0.
- Automation introduces complexity and unpredictability.
- Need for adaptive, flexible, and cost-effective robots is growing.
II. RELATED WORK
- Reinforcement learning applied to robotic manipulation.
- Focus on enhancing adaptability in dynamic environments.
- Importance of balancing human avoidance and task efficiency.
III. DIGITAL TWIN ONLINE TRAINING FRAMEWORK
A. RL based Obstacle Avoidance
- Markov Decision Process used for reinforcement learning.
- Definition of state space, action space, and reward function.
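The MDP outline above can be sketched as a minimal Gym-style environment. The state, action, and reward definitions below are hypothetical stand-ins for illustration (the paper's exact formulation is not reproduced), and the forward kinematics is a toy placeholder, not the real xArm5 model:

```python
import numpy as np

class ObstacleAvoidanceEnv:
    """Minimal MDP sketch for RL-based obstacle avoidance.

    Assumed definitions (not the paper's):
    State : 5 joint angles + 3-D obstacle position (8-D vector).
    Action: bounded joint-angle increments (5-D vector).
    Reward: negative distance to goal, with a penalty near the obstacle.
    """

    def __init__(self, goal, obstacle, safe_radius=0.15):
        self.goal = np.asarray(goal, dtype=float)
        self.obstacle = np.asarray(obstacle, dtype=float)
        self.safe_radius = safe_radius
        self.joints = np.zeros(5)

    def _end_effector(self):
        # Toy forward kinematics: stand-in for the real xArm5 FK.
        return np.array([np.sum(np.cos(self.joints)),
                         np.sum(np.sin(self.joints)),
                         0.5])

    def reset(self):
        self.joints = np.zeros(5)
        return np.concatenate([self.joints, self.obstacle])

    def step(self, action):
        self.joints += np.clip(action, -0.05, 0.05)   # bounded joint increments
        ee = self._end_effector()
        dist_goal = np.linalg.norm(ee - self.goal)
        dist_obs = np.linalg.norm(ee - self.obstacle)
        reward = -dist_goal
        if dist_obs < self.safe_radius:               # collision penalty term
            reward -= 10.0
        done = bool(dist_goal < 0.05)
        return np.concatenate([self.joints, self.obstacle]), reward, done
```

The reset/step interface is what an off-the-shelf RL algorithm would train against.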
B. Object Detection and Localization
- Utilization of YOLOv8 for object detection and classification.
- Mapping objects from pixel coordinates to PyBullet world coordinates.
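The pixel-to-world mapping step can be sketched with the standard pinhole camera model. The intrinsic matrix and camera pose below are assumed calibration inputs for illustration, not values from the paper:

```python
import numpy as np

def pixel_to_world(u, v, depth, K, cam_pose):
    """Back-project pixel (u, v) with a known depth into world coordinates.

    K        : 3x3 camera intrinsic matrix.
    cam_pose : 4x4 camera-to-world homogeneous transform.
    """
    # Pixel -> camera-frame ray, scaled by the measured depth.
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])
    p_cam = ray * depth
    # Camera frame -> world frame (e.g. the PyBullet world).
    p_world = cam_pose @ np.append(p_cam, 1.0)
    return p_world[:3]
```

A detected object's bounding-box center from YOLOv8 would be fed through this function to place the obstacle in the simulation.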
C. Integrated Digital Twin
- Framework built upon PyBullet using OpenAI Gym.
- Bidirectional data transmission ensures synchronization between systems.
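The bidirectional synchronization can be sketched as one control cycle: mirror the physical joint state into the twin, plan in simulation, and send the command back to the hardware. Both classes below are mock stand-ins for the real driver and the PyBullet model:

```python
import numpy as np

class PhysicalRobot:
    """Stand-in for the real arm's driver: reports joint states, accepts commands."""
    def __init__(self):
        self.joints = np.zeros(5)
    def read_joints(self):
        return self.joints.copy()
    def send_command(self, target):
        self.joints = np.asarray(target, dtype=float)  # ideal tracking, for the sketch

class DigitalTwin:
    """Stand-in for the PyBullet model; mirrors the physical joint state."""
    def __init__(self):
        self.joints = np.zeros(5)
    def sync_from_physical(self, joints):
        self.joints = joints                           # physical -> virtual update
    def plan_next(self):
        return self.joints + 0.01                      # placeholder for the RL policy output

def control_step(robot, twin):
    """One cycle of the bidirectional loop: mirror state, plan in sim, act on hardware."""
    twin.sync_from_physical(robot.read_joints())
    command = twin.plan_next()
    robot.send_command(command)                        # virtual -> physical command
    return command
```

Running this loop repeatedly keeps the two systems synchronized, which is what makes hardware-in-the-loop RL training possible.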
IV. EXPERIMENTS AND RESULTS
A. Task Description
- Obstacle avoidance case study conducted with Ufactory Xarm5 robot.
B. Agent Retraining
- Pre-trained model updated with larger obstacle scenario.
- Retraining process demonstrates efficiency in adapting to new environment.
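The retraining idea, continuing from pre-trained values rather than starting from scratch when the scene changes, can be shown with a tabular Q-learning update. This is a deliberate simplification: the paper trains a deep-RL policy, so the Q-table and hyperparameters here are assumptions for illustration only:

```python
import numpy as np

def retrain(q_table, transitions, alpha=0.1, gamma=0.9):
    """Continue training a pre-trained Q-table on transitions from a changed scene.

    transitions: iterable of (state, action, reward, next_state) index tuples.
    Returns an updated copy; the pre-trained table is left untouched.
    """
    q = q_table.copy()                        # start from the pre-trained values
    for s, a, r, s2 in transitions:
        td_target = r + gamma * np.max(q[s2]) # bootstrap from the next state
        q[s, a] += alpha * (td_target - q[s, a])  # standard TD update
    return q
```

Warm-starting from the old table is what makes adaptation to the larger-obstacle scenario faster than training a fresh policy.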
Stats
"The experiment suggest that proposed framework is capable of performing policy online training."
"The reward sharply drops when the re-trained model starts."
"After around 1.2×104 steps, the reward begins to rise."