Inferring User Preferences from Demonstrations in Multi-Objective Reinforcement Learning
A dynamic weight-based preference inference (DWPI) algorithm that can accurately infer user preferences from demonstrations, including sub-optimal ones, in multi-objective reinforcement learning settings.