Core Concepts
Personalized expert demonstrations, used as guidance, can improve multi-agent cooperation and learning efficiency.
Abstract
Multi-Agent Reinforcement Learning (MARL) faces a fundamental obstacle: efficient exploration becomes difficult because the joint state-action space grows exponentially with the number of agents. The article introduces personalized expert demonstrations tailored to individual agents, or to agent types within a heterogeneous team. These demonstrations cover only single-agent behaviors and personal goals, yet from them the agents learn to cooperate effectively. The proposed approach, Personalized Expert-Guided MARL (PegMARL), employs two discriminators to reshape rewards: one measures how well a policy's behavior aligns with the demonstrations, the other how well that behavior advances the desired objective. PegMARL outperforms existing MARL algorithms in both discrete and continuous environments, learns near-optimal policies even from suboptimal demonstrations, and converges effectively when given joint demonstrations collected from various policies.
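The two-discriminator reward shaping described above can be sketched in the style of discriminator-based imitation (as in GAIL-like methods). This is a minimal illustration, not PegMARL's actual implementation: the `Discriminator` class, its weights, and the shaping coefficients `lam1`/`lam2` are all hypothetical placeholders.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class Discriminator:
    """A tiny logistic discriminator D(s, a) in (0, 1).

    Illustrative only: in practice this would be a trained neural network
    distinguishing demonstration transitions from policy transitions.
    """
    def __init__(self, weights, bias=0.0):
        self.weights = np.asarray(weights, dtype=float)
        self.bias = bias

    def score(self, state_action):
        # Higher score = transition looks more like the demonstrations/objective.
        return sigmoid(self.weights @ np.asarray(state_action, dtype=float) + self.bias)

def shaped_reward(env_reward, d_behavior, d_objective, state_action,
                  lam1=0.1, lam2=0.1):
    """Reshape the environment reward with log-scores from both discriminators:
    one for alignment with personalized demonstrations, one for the objective."""
    eps = 1e-8  # avoid log(0)
    bonus = (lam1 * np.log(d_behavior.score(state_action) + eps)
             + lam2 * np.log(d_objective.score(state_action) + eps))
    return env_reward + bonus
```

A transition that both discriminators score highly receives a smaller penalty (the log-scores are non-positive), so the shaped reward gently steers the policy toward demonstration-aligned, objective-advancing behavior without replacing the environment reward.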
Stats
"The average episodic rewards of suboptimal demonstrations are around 4.5."
"The win rates of the joint demonstrations in StarCraft scenarios are approximately 30%."
Quotes
"We introduce a novel concept of personalized expert demonstrations tailored for each individual agent or type of agent within a heterogeneous team."
"Our algorithm, Personalized Expert-Guided MARL (PegMARL), carries out reward-shaping as a form of guidance."