Offline Inverse Reinforcement Learning: Maximizing Likelihood for Expert Behavior Recovery
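The author proposes a bi-level optimization approach for offline IRL, where a reward function must be recovered from a fixed set of expert demonstrations without any further interaction with the environment. The upper level adjusts the reward to maximize the likelihood of the expert's behavior, and the lower level computes the policy that is optimal for the current reward. The resulting algorithm outperforms existing baseline methods on benchmark tasks and recovers high-quality reward functions.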
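A minimal tabular sketch can make the bi-level structure concrete. It assumes a finite MDP, a count-based world model estimated from offline `(s, a, s')` tuples, and equal-length expert trajectories. All function names (`estimate_model`, `soft_policies`, `occupancy`, `ml_irl`, and so on) are hypothetical. This is a simplified maximum-likelihood IRL loop, not the author's algorithm; in particular, it applies no treatment of world-model uncertainty. The lower level runs finite-horizon soft (maximum-entropy) value iteration under the estimated model. The upper level takes gradient steps on the reward, using the difference between the expert's discounted state-action visitation and that of the current policy.

```python
import numpy as np


def estimate_model(transitions, n_states, n_actions):
    # Count-based world model P_hat[s, a, s'] from offline (s, a, s') tuples.
    # Unvisited (s, a) pairs get a uniform next-state distribution (assumption;
    # no conservatism or uncertainty penalty is applied).
    counts = np.zeros((n_states, n_actions, n_states))
    for s, a, s_next in transitions:
        counts[s, a, s_next] += 1.0
    totals = counts.sum(axis=2, keepdims=True)
    uniform = np.full_like(counts, 1.0 / n_states)
    return np.where(totals > 0, counts / np.maximum(totals, 1.0), uniform)


def soft_policies(reward, P, horizon, gamma):
    # Lower level: finite-horizon soft value iteration under the model.
    # Returns pi[t, s, a], the maximum-entropy policy for each time step.
    n_states, n_actions = reward.shape
    V_next = np.zeros(n_states)
    pi = np.zeros((horizon, n_states, n_actions))
    for t in reversed(range(horizon)):
        Q = reward + gamma * (P @ V_next)  # shape (S, A)
        m = Q.max(axis=1, keepdims=True)  # numerically stable log-sum-exp
        V = (m + np.log(np.exp(Q - m).sum(axis=1, keepdims=True))).ravel()
        pi[t] = np.exp(Q - V[:, None])
        V_next = V
    return pi


def occupancy(pi, P, rho0, gamma):
    # Discounted visitation sum_t gamma^t Pr(s_t = s, a_t = a) under the model.
    d = rho0.copy()
    mu = np.zeros_like(pi[0])
    for t in range(pi.shape[0]):
        sa = d[:, None] * pi[t]
        mu += gamma**t * sa
        d = np.einsum("sa,sap->p", sa, P)  # next-state distribution
    return mu


def expert_stats(expert_trajs, n_states, n_actions, gamma):
    # Empirical discounted expert visitation and initial-state distribution.
    mu = np.zeros((n_states, n_actions))
    rho0 = np.zeros(n_states)
    for traj in expert_trajs:
        rho0[traj[0][0]] += 1.0
        for t, (s, a) in enumerate(traj):
            mu[s, a] += gamma**t
    return mu / len(expert_trajs), rho0 / len(expert_trajs)


def log_likelihood(reward, P, expert_trajs, horizon, gamma):
    # Upper-level objective: discounted log-likelihood of the expert's actions.
    pi = soft_policies(reward, P, horizon, gamma)
    total = sum(gamma**t * np.log(pi[t, s, a])
                for traj in expert_trajs for t, (s, a) in enumerate(traj))
    return total / len(expert_trajs)


def ml_irl(transitions, expert_trajs, n_states, n_actions,
           gamma=0.9, lr=0.01, steps=500):
    # Upper level: gradient ascent on a tabular reward r(s, a).
    P = estimate_model(transitions, n_states, n_actions)
    horizon = len(expert_trajs[0])  # assumes equal-length expert trajectories
    mu_E, rho0 = expert_stats(expert_trajs, n_states, n_actions, gamma)
    reward = np.zeros((n_states, n_actions))
    for _ in range(steps):
        pi = soft_policies(reward, P, horizon, gamma)
        reward += lr * (mu_E - occupancy(pi, P, rho0, gamma))
    return reward, P, horizon


if __name__ == "__main__":
    # Toy MDP: action a moves deterministically to state a; expert always picks 1.
    transitions = [(s, a, a) for s in range(2) for a in range(2)]
    expert = [[(0, 1)] + [(1, 1)] * 9]
    reward, P, H = ml_irl(transitions, expert, n_states=2, n_actions=2)
    print(log_likelihood(np.zeros((2, 2)), P, expert, H, 0.9),
          log_likelihood(reward, P, expert, H, 0.9))
```

On this toy problem the estimated dynamics are deterministic. In that case, the visitation difference is the exact gradient of the discounted log-likelihood computed by `log_likelihood`, which is concave in the reward. With stochastic or sparsely covered models, the update should be treated as an approximation.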