Provably Efficient Interactive Learning from Hindsight Instruction Feedback
This work initiates the theoretical analysis of interactive learning with hindsight instruction feedback. In this setting, an agent generates a response and, instead of expert supervision or rewards, receives the instruction that is most suitable for that response.
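The feedback protocol described above can be sketched as a toy loop. This is only an illustration of the interaction, not the algorithm analyzed in this work: the finite instruction and response sets, the `BEST_INSTRUCTION` table, the `teacher_hindsight` function, and the count-based `CountingAgent` are all hypothetical names and assumptions introduced for the example.

```python
import random
from collections import Counter, defaultdict

# Hypothetical toy setting: every response has exactly one best-fitting instruction.
BEST_INSTRUCTION = {"go_left": "turn left", "go_right": "turn right", "stop": "halt"}
RESPONSES = list(BEST_INSTRUCTION)
INSTRUCTIONS = list(BEST_INSTRUCTION.values())


def teacher_hindsight(response):
    """Return the instruction most suitable for the agent's response (assumed oracle)."""
    return BEST_INSTRUCTION[response]


class CountingAgent:
    """Stores (hindsight instruction, response) pairs and replays the most frequent one."""

    def __init__(self, rng):
        self.rng = rng
        self.counts = defaultdict(Counter)

    def act(self, instruction):
        seen = self.counts[instruction]
        if seen:
            # Exploit: replay the response most often labeled with this instruction.
            return seen.most_common(1)[0][0]
        # Explore: no data for this instruction yet, so respond at random.
        return self.rng.choice(RESPONSES)

    def update(self, hindsight_instruction, response):
        # The agent never observes a reward or an expert response, only the relabel.
        self.counts[hindsight_instruction][response] += 1


def run(rounds=200, seed=0):
    rng = random.Random(seed)
    agent = CountingAgent(rng)
    for _ in range(rounds):
        instruction = rng.choice(INSTRUCTIONS)   # instruction given to the agent
        response = agent.act(instruction)        # agent generates a response
        label = teacher_hindsight(response)      # teacher relabels the response
        agent.update(label, response)            # learn from the hindsight pair
    return agent
```

Even when the agent's response does not match the instruction it was given, the relabeled pair is still a valid training example for the instruction the teacher returns, which is what lets this toy agent improve without rewards or expert demonstrations.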