Handling Reward Misspecification by Aligning Agent Behavior with User Expectations
This paper introduces a formal framework called Expectation Alignment (EAL) to understand and address reward misspecification, the problem in which the behavior that is optimal under the specified reward does not align with the user's underlying expectations.