Preventing Reward Hacking with Occupancy Measure Regularization: Theory and Practice
Occupancy measure regularization is proposed as a more effective way to prevent reward hacking than action distribution regularization, a claim supported by both theoretical analysis and empirical evidence.
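
As a rough illustration of how the two kinds of regularization can disagree, the sketch below compares two policies in a toy two-state MDP. Everything in it is an assumption made for illustration and not taken from the source: the MDP (`NEXT_STATE`), the `safe` and `learned` policies, the discount `GAMMA`, the truncation at `steps`, and the use of total variation as the divergence. State 1 stands in for an absorbing "hacked" state that the safe policy never enters.

```python
from itertools import product

GAMMA = 0.99          # assumed discount factor
STATES = (0, 1)       # state 1 is a hypothetical absorbing "hacked" state
ACTIONS = (0, 1)

# Hypothetical deterministic transitions: NEXT_STATE[(s, a)] -> s'
NEXT_STATE = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 1}


def occupancy_measure(policy, start_state=0, steps=5000):
    """Normalized discounted state-action occupancy, truncated at `steps`."""
    state_dist = {s: 1.0 if s == start_state else 0.0 for s in STATES}
    mu = {sa: 0.0 for sa in product(STATES, ACTIONS)}
    weight = 1.0 - GAMMA  # normalizer so that mu sums to (almost) 1
    for _ in range(steps):
        next_dist = {s: 0.0 for s in STATES}
        for s, a in product(STATES, ACTIONS):
            p = state_dist[s] * policy[s][a]  # P(s_t = s, a_t = a)
            mu[(s, a)] += weight * p
            next_dist[NEXT_STATE[(s, a)]] += p
        state_dist = next_dist
        weight *= GAMMA
    return mu


def total_variation(p, q):
    """Total variation distance between two distributions with equal keys."""
    return 0.5 * sum(abs(p[k] - q[k]) for k in p)


# policy[s][a] = probability of action a in state s
safe = {0: {0: 1.0, 1: 0.0}, 1: {0: 1.0, 1: 0.0}}
learned = {0: {0: 0.9, 1: 0.1}, 1: {0: 1.0, 1: 0.0}}

# Largest per-state gap between the two action distributions
action_gap = max(total_variation(safe[s], learned[s]) for s in STATES)
# Gap between the two state-action occupancy measures
occupancy_gap = total_variation(occupancy_measure(safe), occupancy_measure(learned))

print(f"max per-state action TV: {action_gap:.3f}")
print(f"occupancy measure TV:    {occupancy_gap:.3f}")
```

In this toy setting the per-state action distributions differ by only about 0.1, while the occupancy measures differ by more than 0.9, because a small per-step chance of entering the absorbing state compounds over a long horizon. This is only meant to convey the intuition behind preferring an occupancy-based penalty; it does not reproduce the analysis or experiments the summary refers to.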