The paper studies an inverse reinforcement learning (IRL) problem in which multiple experts plan under a shared reward function but with different, unknown planning horizons. Without knowledge of the discount factors, the set of feasible reward functions is larger, making it harder for existing IRL approaches to identify the true reward.
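To see why unknown horizons matter, note that the same reward function can induce different optimal behavior under different discount factors. The following is an illustrative sketch (not the paper's algorithm): in a hypothetical 3-state chain MDP, a myopic expert keeps collecting a small immediate reward while a far-sighted expert moves toward a large delayed payoff, so observed behavior alone cannot pin down the reward without also knowing the horizon.

```python
import numpy as np

n_states, n_actions = 3, 2
# P[a, s, s']: action 0 stays put, action 1 moves right (absorbing at the end).
P = np.zeros((n_actions, n_states, n_states))
for s in range(n_states):
    P[0, s, s] = 1.0
    P[1, s, min(s + 1, n_states - 1)] = 1.0

# Immediate reward R[s, a]: small payoff for staying anywhere,
# large payoff only for the "move" action in the last state.
R = np.array([[0.2, 0.0],
              [0.2, 0.0],
              [0.2, 1.0]])

def optimal_policy(gamma, iters=500):
    """Plain value iteration; returns the greedy action in each state."""
    V = np.zeros(n_states)
    for _ in range(iters):
        Q = R + gamma * np.einsum('asn,n->sa', P, V)
        V = Q.max(axis=1)
    return Q.argmax(axis=1)

# Same reward, different discount factors, different optimal policies:
print(optimal_policy(0.1))   # → [0 0 1]  (myopic: stay for the small reward)
print(optimal_policy(0.95))  # → [1 1 1]  (far-sighted: head for the big payoff)
```

Both experts here are optimal for the same reward; only their discount factors differ, which is the ambiguity the paper's setting must resolve.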
To address this challenge, the authors develop two algorithms:
Multi-Planning Horizon LP-IRL (MPLP-IRL), which extends linear programming IRL (LP-IRL) to experts with heterogeneous planning horizons; and
Multi-Planning Horizon MCE-IRL (MPMCE-IRL), which extends maximum causal entropy IRL (MCE-IRL) to the same setting.
The authors provide theoretical analyses of the identifiability of the reward function and the discount factors. They show that with a sufficiently large number of experts, both the reward function and the discount factors become identifiable.
Experiments on three domains demonstrate that the learned reward functions generalize well to similar tasks, and that the algorithms converge faster than exhaustive grid search over the discount factors.