A Bayesian Approach to Robust Inverse Reinforcement Learning for High-Dimensional Continuous Control
The core message of this paper is that a Bayesian approach to model-based inverse reinforcement learning (BM-IRL) can produce robust policies by simultaneously estimating the expert's reward function and the expert's internal model of the environment dynamics. Robustness is achieved by placing a prior that encodes how accurate the expert's dynamics model is; this prior encourages the learner to plan against worst-case dynamics for states and actions outside the offline data distribution.
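The joint estimation described above can be written as a posterior over the reward and dynamics. The following is a minimal sketch under assumed notation (the symbols \(R\), \(\hat{T}\), \(\mathcal{D}\), and the Boltzmann-rational likelihood are illustrative choices, not taken from the paper):

```latex
% Joint Bayesian posterior over the expert's reward R and
% internal dynamics model \hat{T}, given demonstrations \mathcal{D}:
p(R, \hat{T} \mid \mathcal{D})
  \;\propto\;
  p(\mathcal{D} \mid R, \hat{T})\, p(\hat{T})\, p(R),

% where the likelihood assumes the expert acts (approximately)
% optimally under its own internal model \hat{T}, e.g. a
% Boltzmann-rational policy:
p(a \mid s, R, \hat{T}) \;\propto\; \exp\!\big(\beta\, Q_{R,\hat{T}}(s, a)\big).
```

Here the prior \(p(\hat{T})\) plays the robustness role: by expressing uncertainty about the expert's dynamics model away from the offline data, it pushes the learned policy toward plans that remain sensible under pessimistic (worst-case) dynamics in those regions.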