Core Concepts
The proposed SIDM framework leverages structural information principles to achieve hierarchical state and action abstraction, enabling efficient decision-making in complex environments through skill-based learning and role-based collaboration.
Summary
The paper presents a novel decision-making framework called SIDM that is grounded in structural information principles. The key highlights are:
Graph Construction:
- Constructs homogeneous, weighted, undirected graphs over states and over actions, with edge weights given by pairwise feature similarity.
- Applies edge filtration to remove edges with trivial weights, simplifying the graphs.
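A minimal sketch of the graph-construction step, assuming cosine similarity as the edge weight and a fixed threshold as the edge-filtration rule. Both are illustrative choices: the summary does not specify the similarity measure or the filtration criterion, and `build_similarity_graph`, `cosine`, and `threshold` are hypothetical names.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def build_similarity_graph(features: Sequence[Sequence[float]],
                           threshold: float = 0.5) -> List[List[float]]:
    """Symmetric weighted adjacency matrix over feature vectors (no self-loops).

    Edge filtration (assumed rule): similarities below `threshold` are treated
    as trivial and the corresponding edge is dropped.
    """
    n = len(features)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            w = cosine(features[i], features[j])
            if w >= threshold:
                adj[i][j] = adj[j][i] = w  # undirected: store both directions
    return adj


states = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
adj = build_similarity_graph(states, threshold=0.5)
# Only states 0 and 1 stay connected; both edges touching state 2 are filtered out.
```

The same function would apply unchanged to action features, giving the separate action graph.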
Hierarchical Abstraction:
- Initializes an encoding tree for each graph and minimizes its structural entropy to obtain a community partition.
- Designs an aggregation function that uses each vertex's assigned entropy as its weight, yielding hierarchical abstractions of states and actions.
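The quantity minimized in the hierarchical-abstraction step can be illustrated with the standard two-dimensional structural entropy of a weighted graph under a partition (the two-level case of an encoding tree). The sketch below evaluates it for a given partition and records each vertex's assigned entropy, -(d_v / vol(G)) log2(d_v / vol(community)). It does not search for the optimal partition, and `aggregate` (an entropy-weighted mean of member features) is an assumed form of the paper's aggregation function, not its actual definition; all function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def two_level_structural_entropy(
    adj: Sequence[Sequence[float]], partition: Sequence[Sequence[int]]
) -> Tuple[float, Dict[int, float]]:
    """Two-dimensional structural entropy of a weighted undirected graph.

    adj: symmetric weight matrix with no self-loops and at least one edge.
    partition: communities as lists of vertex indices covering every vertex.
    Returns (total entropy in bits, {vertex: assigned entropy of its leaf}).
    """
    n = len(adj)
    degree = [sum(row) for row in adj]
    vol_g = sum(degree)
    total = 0.0
    assigned: Dict[int, float] = {}
    for community in partition:
        members = set(community)
        vol_c = sum(degree[v] for v in community)
        # Cut weight g: total weight of edges with exactly one endpoint in the community.
        cut = sum(adj[u][w] for u in community for w in range(n) if w not in members)
        if vol_c > 0:
            # Community-node term: -(g / vol(G)) * log2(vol(community) / vol(G)).
            total -= (cut / vol_g) * math.log2(vol_c / vol_g)
        for v in community:
            # Leaf term: -(d_v / vol(G)) * log2(d_v / vol(community)).
            h_v = -(degree[v] / vol_g) * math.log2(degree[v] / vol_c) if degree[v] > 0 else 0.0
            assigned[v] = h_v
            total += h_v
    return total, assigned


def aggregate(features: Sequence[Sequence[float]], community: Sequence[int],
              assigned: Dict[int, float]) -> List[float]:
    """Assumed aggregation: mean of member features weighted by assigned entropy.

    Falls back to an unweighted mean when all weights are zero (e.g. singletons).
    """
    weights = [assigned[v] for v in community]
    if sum(weights) == 0:
        weights = [1.0] * len(community)
    total_w = sum(weights)
    dim = len(features[community[0]])
    return [sum(w * features[v][k] for w, v in zip(weights, community)) / total_w
            for k in range(dim)]


# Two triangles joined by one weak edge (2-3).
adj = [[0.0] * 6 for _ in range(6)]
for u, v, w in [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
                (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0), (2, 3, 0.1)]:
    adj[u][v] = adj[v][u] = w

h_two, assigned = two_level_structural_entropy(adj, [[0, 1, 2], [3, 4, 5]])
h_one, _ = two_level_structural_entropy(adj, [list(range(6))])
abstract_state = aggregate([[float(v), 1.0] for v in range(6)], [0, 1, 2], assigned)
```

On this graph the two-triangle partition has lower entropy than both the single-community and the all-singletons partitions, which is the property that entropy minimization exploits to recover communities.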
Skill Identification:
- Extracts abstract elements from historical trajectories to construct a directed, weighted, homogeneous transition graph.
- Defines high-dimensional structural entropy for the directed graph and minimizes it to obtain an optimal encoding tree.
- Calculates the common path entropy to quantify the occurrence probability of each abstract transition, enabling an adaptive skill-based learning mechanism.
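A sketch of building the directed, weighted transition graph from historical trajectories, assuming each trajectory has already been mapped to a sequence of abstract-state ids and that steps which stay inside one abstract state are skipped. The occurrence probability computed here is a plain frequency used as a stand-in; it is not the paper's common path entropy, whose exact definition the summary does not give. Function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence, Tuple

Edge = Tuple[Hashable, Hashable]


def build_transition_graph(trajectories: Sequence[Sequence[Hashable]]) -> Dict[Edge, int]:
    """Directed, weighted transition graph over abstract states.

    Each trajectory is a sequence of abstract-state ids (e.g. community labels).
    Edge weight = number of observed transitions u -> v.
    Assumption: steps that stay inside the same abstract state are skipped.
    """
    counts: Counter = Counter()
    for traj in trajectories:
        for u, v in zip(traj, traj[1:]):
            if u != v:
                counts[(u, v)] += 1
    return dict(counts)


def transition_frequencies(graph: Dict[Edge, int]) -> Dict[Edge, float]:
    """Empirical occurrence probability of each abstract transition.

    A frequency-based stand-in, NOT the paper's common path entropy.
    """
    total = sum(graph.values())
    return {edge: w / total for edge, w in graph.items()}


trajs = [["A", "A", "B", "C"], ["A", "B", "B", "C"], ["A", "C"]]
graph = build_transition_graph(trajs)
# graph == {("A", "B"): 2, ("B", "C"): 2, ("A", "C"): 1}
freqs = transition_frequencies(graph)
```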
Abstract MDP and Learning:
- Formulates an abstract MDP by mapping original states to abstract states using the hierarchical abstraction.
- Introduces a two-layer skill-based learning mechanism that operates independently of expert knowledge.
- Extends the framework to multi-agent scenarios, utilizing the hierarchical action abstraction to enable automatic role-based learning.
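A minimal sketch of learning on an abstract MDP: tabular Q-learning whose table is indexed by abstract states phi(s) instead of raw states. The toy chain environment, the hand-written abstraction `phi` (which drops an irrelevant distractor bit), and all hyperparameters are assumptions for illustration. In SIDM the mapping would come from the learned hierarchical abstraction, and the two-layer skill-based mechanism is not reproduced here.

```python
import random
from typing import Callable, Dict, Hashable, List, Tuple

State = Tuple[int, int]  # (position, distractor bit); hypothetical toy state

GOAL = 3  # rightmost position of the toy chain; reaching it ends the episode


def step(state: State, action: int, rng: random.Random) -> Tuple[State, float, bool]:
    """Toy chain dynamics: action 0 moves left, 1 moves right.

    The distractor bit is resampled at random and never affects reward.
    """
    pos, _ = state
    pos = max(0, pos - 1) if action == 0 else min(GOAL, pos + 1)
    done = pos == GOAL
    return (pos, rng.randint(0, 1)), (1.0 if done else 0.0), done


def phi(state: State) -> Hashable:
    """Hand-written abstraction: keep the position, drop the distractor bit."""
    return state[0]


def q_learning_abstract(abstraction: Callable[[State], Hashable], episodes: int = 500,
                        alpha: float = 0.5, gamma: float = 0.9, max_steps: int = 100,
                        seed: int = 0) -> Dict[Hashable, List[float]]:
    """Tabular Q-learning with the Q-table indexed by abstract states."""
    rng = random.Random(seed)
    q: Dict[Hashable, List[float]] = {}
    for _episode in range(episodes):
        state: State = (0, rng.randint(0, 1))
        for _t in range(max_steps):
            z = abstraction(state)
            q.setdefault(z, [0.0, 0.0])
            # Uniform random behavior policy; Q-learning learns the greedy policy off-policy.
            action = rng.randint(0, 1)
            nxt, reward, done = step(state, action, rng)
            z_next = abstraction(nxt)
            q.setdefault(z_next, [0.0, 0.0])
            target = reward if done else reward + gamma * max(q[z_next])
            q[z][action] += alpha * (target - q[z][action])
            state = nxt
            if done:
                break
    return q


q = q_learning_abstract(phi)
# Greedy action per abstract state (0 = left, 1 = right).
greedy = {z: max(range(2), key=lambda a: q[z][a]) for z in sorted(q)}
```

Because the distractor bit never influences transitions or rewards, this abstract MDP is exactly Markov, so the greedy policy over abstract states moves right from every non-goal position while the Q-table holds four entries instead of eight.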
The proposed SIDM framework is evaluated on a wide range of benchmarks, including visual gridworld navigation, continuous control tasks, robotic control, and StarCraft II micromanagement. The results demonstrate that SIDM significantly improves the quality, stability, and sample efficiency of policies compared to state-of-the-art RL algorithms.