Basic Concepts
A leader-follower approach, with a PID-controlled leader drone and a deep reinforcement learning-based follower drone, can effectively handle a time-varying center of gravity in cooperative multi-drone transport systems.
Summary
The paper presents a novel approach for controlling a multi-drone cooperative transport (CT) system in the presence of a time-varying center of gravity (CG). The system uses a leader-follower architecture, where the leader drone employs a traditional PID controller, while the follower drone utilizes a deep reinforcement learning (RL) controller based on the Soft Actor-Critic (SAC) algorithm.
Key highlights:
- The leader drone uses a PID controller to navigate the object to the desired goal position, while the follower drone uses a deep RL-based SAC controller to handle the time-varying CG.
- The follower drone's controller relies on local measurements (position, velocity) and minimal leader/object information, avoiding the need for inter-drone communication.
- The proposed deep RL-based approach is compared with an adaptive controller in simulation, showing better performance in terms of reduced oscillations and faster settling time for the object's position.
- The deep RL controller is also tested for different CG speeds and object mass variations, demonstrating its ability to handle such uncertainties.
- Preliminary experimental results on a two-drone CT system with a ball balance setup validate the effectiveness of the proposed approach.
The key contribution of this work is the development of a decentralized deep RL-based controller for the follower drone that can effectively handle the challenge of time-varying CG in multi-drone CT systems, without requiring detailed system dynamics or inter-drone communication.
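As a rough illustration of the decentralized interface described above, the sketch below assembles a follower observation from locally available signals only. The function name, inputs, and dimensions are assumptions for illustration, not the paper's exact state definition.

```python
import numpy as np

def follower_observation(follower_pos, follower_vel, attach_point_pos, goal_pos):
    """Hypothetical follower observation built from locally measurable signals.

    follower_pos, follower_vel : 3D position/velocity of the follower drone
    attach_point_pos           : position of the follower's attachment point on the object
    goal_pos                   : goal position of the object (known a priori, so no
                                 inter-drone communication is required)
    """
    # Relative quantities keep the policy invariant to absolute world coordinates.
    rel_goal = goal_pos - follower_pos
    rel_attach = attach_point_pos - follower_pos
    return np.concatenate([follower_vel, rel_goal, rel_attach]).astype(np.float32)
```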
Stats
The paper presents the following key data:
The masses of the leader drone (ml = 1 kg), follower drone (mf = 1 kg), and object (mo = 0.2 kg).
The length of the object (len = 0.34 m) and the distance between the drone's geometric center and the rotor center (L = 0.12 m).
The PID gains for the leader drone: kp = [0.5, 0.5, 0.5]^T, kd = [1, 1, 1]^T, and kI = [0, 0, 0]^T.
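A minimal sketch of how the leader's position controller could be wired up with these gains, assuming a simple per-axis PID on the position error (the function and variable names are illustrative, not taken from the paper):

```python
import numpy as np

# PID gains reported for the leader drone (one gain per x/y/z axis).
KP = np.array([0.5, 0.5, 0.5])
KD = np.array([1.0, 1.0, 1.0])
KI = np.array([0.0, 0.0, 0.0])  # integral action effectively disabled

def leader_pid_command(pos_error, vel_error, integral_error, dt):
    """Per-axis PID command for the leader drone (illustrative only)."""
    integral_error = integral_error + pos_error * dt
    command = KP * pos_error + KD * vel_error + KI * integral_error
    return command, integral_error
```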
The SAC RL agent's hyperparameters: learning rate (0.0003), discount factor (γ = 0.99), entropy coefficient (α = 0.3), and replay buffer size (1,000,000).
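For reference, these hyperparameters map directly onto an off-the-shelf SAC implementation. The sketch below uses stable-baselines3 as an example stand-in (the paper does not state that this library was used), with a placeholder Gymnasium environment in place of the follower-drone simulator:

```python
from stable_baselines3 import SAC
import gymnasium as gym

# Placeholder environment; the actual follower-drone environment is not public.
env = gym.make("Pendulum-v1")

model = SAC(
    "MlpPolicy",
    env,
    learning_rate=3e-4,     # 0.0003, as reported
    gamma=0.99,             # discount factor
    ent_coef=0.3,           # fixed entropy coefficient (alpha)
    buffer_size=1_000_000,  # replay buffer size
)
model.learn(total_timesteps=10_000)
```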
Quotes
"The proposed deep RL-PID system works better than an adaptive system with lesser oscillations, peaks, and better settling time for the proposed method."
"The proposed RL controller and the overall system can stably tolerate CG speed (slow to fast) and object mass variations (light and heavy)."