Core Concepts
The article proposes Logical Specifications-guided Dynamic Task Sampling (LSTS), an approach that learns a set of reinforcement learning policies to guide an agent from an initial state to a goal state according to a high-level task specification, while minimizing the number of environment interactions.
Summary
The article presents a framework for dynamic task sampling for reinforcement learning (RL) agents using high-level SPECTRL objectives and a Teacher-Student learning strategy. The key insights are:
The authors define a set of sub-tasks based on the edges of the directed acyclic graph (DAG) representation of the SPECTRL objective. Each sub-task corresponds to a reach-avoid objective for the agent.
They employ a Teacher-Student learning approach, in which the Teacher agent uses its high-level policy to actively sample a sub-task for the Student agent to explore, with the goal of satisfying the high-level objective in the fewest environment interactions.
The Student agent explores the environment for a few interactions, updating its low-level RL policy for the sampled sub-task. The Teacher observes the Student's performance and updates its high-level policy accordingly.
The authors introduce a method to discard unpromising sub-tasks, saving costly interactions and converging to a successful policy faster.
They also propose an extension called LSTSct, which further improves sample efficiency by continuing exploration on a new sub-task once the sub-task goal state is reached.
The authors evaluate LSTS and LSTSct on a gridworld domain, two simulated robotic tasks, and a complex search-and-rescue scenario. The results show that LSTS outperforms state-of-the-art automaton-guided RL baselines in terms of sample efficiency and time-to-threshold performance.
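The Teacher-Student loop described above can be illustrated with a small sketch. This is not the authors' implementation: the function names (`active_edges`, `run_teacher`, `make_toy_student`), the epsilon-greedy Teacher with optimistic initial values, the 0.9 success threshold, the discard rule, and the toy Student that simply improves by a fixed amount per call are all assumptions made for illustration. It only shows how DAG edges can serve as sub-tasks, how a Teacher can pick among the currently reachable ones, and how a redundant sub-task might be dropped. The LSTSct extension is not modelled.

```python
import random


def active_edges(edges, reached, learned, discarded):
    """Sub-tasks the Teacher may sample: the source node is already
    reachable and the edge is neither learned nor discarded."""
    return [e for e in edges
            if e[0] in reached and e not in learned and e not in discarded]


def run_teacher(edges, start, goal, student_step, eps=0.1, alpha=0.5,
                threshold=0.9, max_rounds=1000, seed=0):
    """Hypothetical Teacher loop over DAG edges (one edge = one sub-task)."""
    rng = random.Random(seed)
    # Optimistic initial values so every sub-task is tried at least once
    # (an assumption of this sketch, not taken from the article).
    q = {e: 1.0 for e in edges}
    reached, learned, discarded = {start}, set(), set()
    for _ in range(max_rounds):
        if goal in reached:
            break
        candidates = active_edges(edges, reached, learned, discarded)
        if not candidates:
            break
        # Assumed Teacher policy: epsilon-greedy over value estimates.
        if rng.random() < eps:
            edge = rng.choice(candidates)
        else:
            edge = max(candidates, key=lambda e: q[e])
        # The Student trains briefly on the sub-task and reports a success rate.
        success_rate = student_step(edge)
        q[edge] += alpha * (success_rate - q[edge])
        if success_rate >= threshold:
            learned.add(edge)
            reached.add(edge[1])
            # Assumed discard rule: other unlearned edges into the same
            # node are no longer needed.
            for other in edges:
                if other[1] == edge[1] and other not in learned:
                    discarded.add(other)
    return learned, discarded, goal in reached


def make_toy_student(learning_speed):
    """Stand-in for an RL Student: each call raises the success rate of
    the sampled sub-task by a fixed, edge-specific amount."""
    progress = {e: 0.0 for e in learning_speed}

    def student_step(edge):
        progress[edge] = min(1.0, progress[edge] + learning_speed[edge])
        return progress[edge]

    return student_step


# Hypothetical four-node DAG: two routes from q0 to the goal node q3.
speeds = {("q0", "q1"): 0.5, ("q0", "q2"): 0.05,
          ("q1", "q3"): 0.5, ("q2", "q3"): 0.5}
edges = list(speeds)
learned, discarded, success = run_teacher(
    edges, "q0", "q3", make_toy_student(speeds), eps=0.0)
print(sorted(learned), sorted(discarded), success)
```

With `eps=0.0` the run is deterministic: the Teacher tries both edges out of `q0` once, commits to the faster-improving route through `q1`, and never spends interactions on the edge from `q2` to `q3`.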
Statistics
The article does not contain any explicit numerical data or statistics. The key figures are the learning curves and performance metrics reported in the experiments section.
Quotes
There are no direct quotes from the article that are particularly striking or support the key arguments.