The paper introduces STARC metrics to quantify differences between reward functions, addressing the challenge of evaluating reward learning algorithms. It establishes theoretical guarantees of soundness and completeness: a small STARC distance is both sufficient and necessary for low worst-case regret. Experimental results show that STARC metrics outperform existing alternatives across a range of environments.
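A STARC metric is built by composing a canonicalisation function (which collapses rewards that induce the same policy ordering onto a shared representative), a normalisation step, and a metric. The sketch below is a toy instance of that recipe, not the paper's exact construction: it assumes a tabular reward stored as an array indexed by (s, a, s'), uses an EPIC-style canonicalisation under uniform coverage distributions purely for illustration, and the names `canonicalise` and `starc_style_distance` are hypothetical.

```python
import numpy as np

def canonicalise(R, gamma):
    """Illustrative EPIC-style canonicalisation under uniform distributions:
    C(R)(s,a,s') = R(s,a,s') + gamma*E[R(s',A,X)] - E[R(s,A,X)] - gamma*E[R(S,A,X)],
    which removes potential-based shaping terms from R.
    R has shape (nS, nA, nS): reward for each transition (s, a, s')."""
    m = R.mean(axis=(1, 2))        # m[s] = E_{A,X}[R(s, A, X)]
    const = m.mean()               # E_{S,A,X}[R(S, A, X)]
    return R + gamma * m[None, None, :] - m[:, None, None] - gamma * const

def starc_style_distance(R1, R2, gamma=0.9):
    """Canonicalise, normalise to unit L2 norm, then take L2 distance.
    Assumes both canonicalised rewards are nonzero (non-trivial rewards)."""
    c1, c2 = canonicalise(R1, gamma), canonicalise(R2, gamma)
    n1 = c1 / np.linalg.norm(c1)
    n2 = c2 / np.linalg.norm(c2)
    return np.linalg.norm(n1 - n2)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    R = rng.normal(size=(4, 3, 4))
    Phi = rng.normal(size=4)       # arbitrary potential function
    # Potential-based shaping leaves the distance at ~0:
    R_shaped = R + 0.9 * Phi[None, None, :] - Phi[:, None, None]
    print(starc_style_distance(R, R_shaped, gamma=0.9))
```

Because of the normalisation step, the distance is also invariant to positive rescaling of the reward, so rewards that induce identical policy orderings are treated as equivalent.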
The paper critiques EPIC for failing to induce meaningful regret bounds and highlights shortcomings in DARD's canonicalisation function. It emphasises that a reward metric should respect the ordering a reward function induces over policies when measuring regret. The theoretical analysis shows that STARC metrics provide robust guarantees, and they outperform existing pseudometrics empirically.
Further research is suggested to determine which STARC metrics perform best in practice, to generalise the results to continuous environments, to explore weaker criteria for constructing reward metrics, and to extend the analysis to multi-agent settings.
Key insights drawn from arxiv.org, by Joar Skalse, ..., 03-12-2024.
https://arxiv.org/pdf/2309.15257.pdf