Core Concepts
Agents can solve complex tasks using skill machines that leverage skill primitives and reward machines.
Abstract
The paper introduces the concept of skill machines for solving tasks through temporal logic composition. It addresses challenges in reinforcement learning, such as sample efficiency and task generalization. By combining learned skills logically and temporally, agents can achieve near-optimal behaviors zero-shot. The proposed framework involves learning a set of skill primitives to compose high-level goals in various environments. The study demonstrates the effectiveness of skill machines in tabular settings, video games, and continuous control environments. Additionally, it shows how off-policy reinforcement learning algorithms can enhance the performance of skill machines when optimal behaviors are desired.
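To make "combining learned skills logically" concrete, here is a minimal sketch. The summary does not state which operators are used, so this assumes elementwise minimum for conjunction and maximum for disjunction, a common choice in value-function composition work. The Q-tables, their values, and the function names are hypothetical and stand in for learned skills.

```python
import numpy as np

# Hypothetical Q-tables for two learned skills: rows are states, columns are actions.
q_a = np.array([[1.0, 0.0], [0.2, 0.8], [0.5, 0.5]])
q_b = np.array([[0.3, 0.6], [0.9, 0.3], [0.4, 0.7]])


def compose_and(q1, q2):
    """Assumed conjunction operator: elementwise minimum of the two value functions."""
    return np.minimum(q1, q2)


def compose_or(q1, q2):
    """Assumed disjunction operator: elementwise maximum of the two value functions."""
    return np.maximum(q1, q2)


# Acting greedily on the composed values needs no further learning (zero-shot).
greedy_and = np.argmax(compose_and(q_a, q_b), axis=1)
greedy_or = np.argmax(compose_or(q_a, q_b), axis=1)
print(greedy_and.tolist(), greedy_or.tolist())
```

The point of the sketch is only that composition happens on already-learned values, so a new logical goal yields a policy without extra training. Temporal composition then chains such composed skills in sequence.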
Stats
Published at ICLR 2024
Regular fragments of linear temporal logic used
Demonstrated experimentally in a tabular setting, a high-dimensional video game, and a continuous control environment
Skill machines can be further improved with standard off-policy reinforcement learning algorithms when optimal behavior is required
Skill machines are finite state machines that encode the solution to any task specified in a given regular language
Zero-shot spatial and temporal composition achieved
Capable of mapping from complex temporal logic task specifications to near-optimal behaviors zero-shot
Empirical results surpass state-of-the-art baselines
The product MDP of the environment and the reward machine guarantees Markov rewards for each task
Skill primitives address the spatial curse of dimensionality by learning a world value function (WVF) for each proposition
Constraints are introduced and used to augment the state space, addressing the temporal curse of dimensionality
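The product MDP point above can be illustrated with a small sketch. Everything here is a hypothetical toy, not the paper's setup: a five-cell corridor, the propositions "key" and "door", and a hand-written reward machine for the task "reach the key, then the door". Pairing the environment position with the reward machine state makes the reward a function of the current product state alone.

```python
KEY_POS, DOOR_POS, LENGTH = 0, 4, 5  # hypothetical corridor layout


def label(pos):
    """Labelling function: which propositions hold at this position."""
    props = set()
    if pos == KEY_POS:
        props.add("key")
    if pos == DOOR_POS:
        props.add("door")
    return frozenset(props)


# Reward machine transitions: (rm_state, true propositions) -> (next_rm_state, reward).
DELTA = {
    ("u0", frozenset({"key"})): ("u1", 0.0),
    ("u1", frozenset({"door"})): ("u_done", 1.0),
}


def rm_step(u, props):
    """Advance the reward machine; unlisted pairs self-loop with zero reward."""
    return DELTA.get((u, props), (u, 0.0))


def product_step(state, action):
    """One step in the product MDP; state is (position, rm_state), action is -1 or +1."""
    pos, u = state
    next_pos = min(max(pos + action, 0), LENGTH - 1)
    next_u, reward = rm_step(u, label(next_pos))
    return (next_pos, next_u), reward


# Same position and action, different reward: position alone is not Markov,
# but the (position, rm_state) pair is.
print(product_step((3, "u0"), +1))
print(product_step((3, "u1"), +1))
```

Stepping onto the door pays off only if the key was visited first, and the reward machine state is what records that history.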
Quotes
"An SM is defined by translating the regular language task specification into an FSM."
"We demonstrate this experimentally in a tabular setting, as well as in a high-dimensional video game and continuous control environment."
"We propose skill machines (SM), which are finite state machines (FSM) that encode the solution to any task specified using any given regular language."
"We particularly focus on temporal logic composition, such as linear temporal logic (LTL), allowing agents to sequentially chain and order their skills while ensuring certain conditions are always or never met."
"Our results indicate that our method is capable of producing near-optimal to optimal behavior for a variety of long-horizon tasks without further learning."