Training AI Agents to Be Shutdownable Using Stochastic Choice and Discounted Rewards
Training AI agents with a novel "Discounted REward for Same-Length Trajectories (DREST)" reward function can incentivize them to be both useful (effectively pursue goals) and neutral about their own shutdown, potentially solving the AI shutdown problem.