Core Concepts
This work generalizes inverse kinematics techniques to learn agent-centric state representations in high-dimensional, finite-memory Partially Observable Markov Decision Processes (FM-POMDPs).
Abstract
The paper addresses the challenge of learning informative, agent-centric state representations in high-dimensional, non-Markovian environments, where the current observation alone is insufficient to decode the relevant state.
The key contributions are:
Establishing, both theoretically and empirically, that naive generalizations of inverse kinematics to the FM-POMDP setting can fail to recover the agent-centric state.
Proposing a generalized inverse model, Masked Inverse Kinematics with Actions (MIK+A), that can provably recover the agent-centric state representation under assumptions of past and future decodability.
Providing a thorough empirical evaluation of the proposed objectives on partially observable navigation tasks, demonstrating the superiority of MIK+A in recovering the agent-centric state.
Showing the effectiveness of the proposed objectives in learning useful representations for visual offline reinforcement learning tasks with partial observability.
The paper establishes that MIK+A, which leverages both past and future observation sequences along with action information, is the most effective approach for discovering the complete agent-centric state representation in FM-POMDPs.
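To make the objective concrete, here is a minimal sketch of the MIK+A idea: classify the first action a_t from an encoding of a past observation window, an encoding of a future observation window, and the intervening actions (the "+A" part). The encoder, shapes, and `mik_a_logits` helper below are illustrative assumptions for exposition, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(window):
    """Toy sequence encoder: mean-pool the observation window.
    Stands in for a learned encoder over a length-m window (hypothetical)."""
    return window.mean(axis=0)

def mik_a_logits(past, future, actions_between, W):
    """Score each candidate first action from the past window encoding,
    the future window encoding, and the actions taken in between."""
    feat = np.concatenate([encode(past), encode(future), actions_between])
    return W @ feat  # shape: (num_actions,)

def cross_entropy(logits, target):
    """Standard softmax cross-entropy against the true first action."""
    z = logits - logits.max()
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[target]

# Illustrative dimensions: observation size, action count, window m, gap k.
obs_dim, num_actions, m, k = 8, 4, 3, 5
past = rng.normal(size=(m, obs_dim))      # observations o_{t-m+1 : t}
future = rng.normal(size=(m, obs_dim))    # observations o_{t+k : t+k+m-1}
acts = rng.normal(size=(k - 1,))          # embedded intervening actions
W = rng.normal(size=(num_actions, 2 * obs_dim + k - 1))

loss = cross_entropy(mik_a_logits(past, future, acts, W), target=1)
```

Minimizing this classification loss over trajectories is what drives the encoder to retain exactly the agent-centric (controllable) information, under the paper's past- and future-decodability assumptions.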
Assumptions
"The agent-centric dynamics are deterministic and the diameter of the control-endogenous part of the state space is bounded."
"We assume our action space is finite and our agent-centric state space is also finite."
Quotes
"Discovering an informative, or agent-centric, state representation that encodes only the relevant information while discarding the irrelevant is a key challenge towards scaling reinforcement learning algorithms and efficiently applying them to downstream tasks."
"We establish that generalized inverse models can be adapted for learning agent-centric state representation for this task."
"Our results show that our variant of the multi-step inverse model (Lamb et al., 2022) can indeed succeed in the FM-POMDP setting."