Learning Reward Functions and Discount Factors from Experts with Multiple Planning Horizons
We develop algorithms to jointly learn a global reward function and agent-specific discount factors from expert demonstrations with different planning horizons.
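The abstract does not describe the algorithm itself. The following is only a minimal sketch of the joint-learning setup, built on assumptions of our own:

- a small tabular chain MDP with deterministic, clamped moves;
- experts modeled as Boltzmann-rational, with an assumed inverse temperature `BETA`;
- one shared state-based reward;
- one discount factor per expert, chosen from a finite grid.

The sketch alternates two steps. In the agent-specific step, each expert's discount is set to the grid value that maximizes the likelihood of that expert's demonstrations. In the shared step, the reward takes a finite-difference gradient-ascent step. The environment and every function name (`soft_q`, `fit_discounts`, `reward_step`, `fit`, `rollout`) are hypothetical illustrations, not the authors' method.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical toy setting (an assumption, not from the source): a 1-D chain
# of states with actions LEFT, STAY, RIGHT and deterministic, clamped moves.
ACTIONS = (-1, 0, 1)
BETA = 10.0  # assumed inverse temperature of the Boltzmann expert model
Demo = List[Tuple[int, int]]  # (state, action index) pairs


def step(s: int, a: int, n: int) -> int:
    return min(max(s + ACTIONS[a], 0), n - 1)


def soft_max(row: Sequence[float]) -> float:
    # Numerically stable (1/BETA) * log(sum(exp(BETA * x))).
    m = max(row)
    return m + math.log(sum(math.exp(BETA * (x - m)) for x in row)) / BETA


def soft_q(reward: Sequence[float], gamma: float, iters: int = 200) -> List[List[float]]:
    # Soft value iteration: Q(s, a) = r(s) + gamma * V(s'), V(s) = soft_max(Q(s, .)).
    n = len(reward)
    v = [0.0] * n
    q: List[List[float]] = []
    for _ in range(iters):
        q = [[reward[s] + gamma * v[step(s, a, n)] for a in range(len(ACTIONS))]
             for s in range(n)]
        v = [soft_max(row) for row in q]
    return q


def log_likelihood(demo: Demo, reward: Sequence[float], gamma: float) -> float:
    # Log-probability of the demo under pi(a|s) proportional to exp(BETA * Q(s, a)).
    q = soft_q(reward, gamma)
    return sum(BETA * (q[s][a] - soft_max(q[s])) for s, a in demo)


def objective(demos, reward, gammas) -> float:
    # One shared reward, one discount factor per expert.
    return sum(log_likelihood(d, reward, g) for d, g in zip(demos, gammas))


def fit_discounts(demos, reward, grid) -> List[float]:
    # Agent-specific step: each expert's gamma is the grid value maximizing its likelihood.
    return [max(grid, key=lambda g: log_likelihood(d, reward, g)) for d in demos]


def reward_step(demos, reward, gammas, lr=0.05, eps=1e-4) -> List[float]:
    # Shared-reward step: finite-difference gradient ascent with step halving;
    # a step is accepted only if it strictly improves the joint objective.
    base = objective(demos, reward, gammas)
    grad = []
    for i in range(len(reward)):
        bumped = list(reward)
        bumped[i] += eps
        grad.append((objective(demos, bumped, gammas) - base) / eps)
    while lr > 1e-6:
        cand = [r + lr * g for r, g in zip(reward, grad)]
        if objective(demos, cand, gammas) > base:
            return cand
        lr /= 2
    return list(reward)


def fit(demos, n_states, grid, outer=10):
    # Alternate the two steps; neither step can lower the joint objective.
    reward = [0.0] * n_states
    gammas = [grid[0]] * len(demos)
    for _ in range(outer):
        gammas = fit_discounts(demos, reward, grid)
        reward = reward_step(demos, reward, gammas)
    return reward, gammas


def rollout(reward, gamma, start, horizon) -> Demo:
    # Greedy expert with discount gamma (ties broken toward the lower action index).
    q = soft_q(reward, gamma)
    s, demo = start, []
    for _ in range(horizon):
        a = max(range(len(ACTIONS)), key=lambda i: q[s][i])
        demo.append((s, a))
        s = step(s, a, len(reward))
    return demo
```

Consider the reward `[0, 1, 0, 0, 0, 0, 3]` with start state 2. An expert with discount 0.5 moves left toward the nearby reward of 1, while one with discount 0.9 moves right toward the larger but more distant reward of 3. Given the true reward, `fit_discounts` then assigns the first expert a smaller discount than the second. Alternating the two steps keeps the joint objective from decreasing, but it only finds a local optimum, and the grid search over discounts is a simplification chosen for brevity.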