Core Concepts
Exploiting hierarchical structures in feedback signals can enhance reward design and improve agent performance across various tasks.
Abstract
This paper introduces HERON, a hierarchical reward modeling framework for reinforcement learning. It addresses the difficulty of reward design by building a hierarchical decision tree from the importance ranking of feedback signals. HERON outperforms traditional reward engineering baselines on traffic light control, code generation, classic control, and robotic control tasks. The framework is also flexible and robust to environment changes.
Abstract:
Reward design challenges in RL.
Introduction of HERON framework.
Benefits of utilizing hierarchical structures in feedback signals.
Introduction:
Significance of deep reinforcement learning advancements.
Importance of the reward function in benchmark environments.
Challenges in designing rewards for real-world environments.
Method:
Preference elicitation through trajectory comparisons.
Decision tree construction based on feedback signal hierarchy.
Reward learning using labeled dataset D of trajectories.
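The three steps above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It assumes each trajectory is summarized by a tuple of feedback signals ordered from most to least important and oriented so that larger is better. The names hierarchical_preference, label_pairs, margins, and bradley_terry_loss are hypothetical, and the Bradley-Terry style loss is a common choice for learning from pairwise preferences rather than something this summary states.

```python
import math
from typing import List, Optional, Sequence, Tuple


def hierarchical_preference(
    sig_a: Sequence[float],
    sig_b: Sequence[float],
    margins: Sequence[float],
) -> Optional[int]:
    """Walk the signal hierarchy from most to least important.

    Returns 0 if trajectory A is preferred, 1 if B is preferred,
    and None if no level separates them by more than its margin.
    Assumption: signals are oriented so that larger is better.
    """
    for a, b, margin in zip(sig_a, sig_b, margins):
        if abs(a - b) > margin:
            return 0 if a > b else 1
    return None


def label_pairs(
    trajectory_signals: Sequence[Sequence[float]],
    margins: Sequence[float],
) -> List[Tuple[int, int, int]]:
    """Build a labeled dataset of (index_a, index_b, preferred) triples."""
    labeled = []
    for i in range(len(trajectory_signals)):
        for j in range(i + 1, len(trajectory_signals)):
            pref = hierarchical_preference(
                trajectory_signals[i], trajectory_signals[j], margins
            )
            if pref is not None:  # skip pairs the hierarchy cannot separate
                labeled.append((i, j, pref))
    return labeled


def bradley_terry_loss(r_preferred: float, r_other: float) -> float:
    """Negative log-likelihood that the preferred trajectory wins.

    Assumed loss: -log(sigmoid(r_preferred - r_other)).
    """
    return math.log1p(math.exp(-(r_preferred - r_other)))


# Hypothetical signals: (negative queue length, negative waiting time).
signals = [(-10.0, -30.0), (-10.5, -20.0), (-25.0, -5.0)]
margins = (2.0, 1.0)
dataset = label_pairs(signals, margins)
```

In this sketch, trajectories 0 and 1 are within the margin on the top-ranked signal, so the comparison falls through to the second signal. A reward model would then be trained by minimizing the loss over the labeled pairs.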
Experiment - Traffic Light Control:
Comparison with reward engineering baseline and ensemble approaches.
Evaluation of different reward hierarchies' impact on agent behavior.
Experiment - Code Generation:
Performance evaluation on APPS dataset using Pass@K metric.
Generalization testing on MBPP dataset.
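For reference, Pass@K is often computed with the unbiased estimator 1 - C(n - c, k) / C(n, k), where n programs are sampled per problem and c of them pass all tests. The sketch below assumes that estimator. This summary does not say which variant the paper uses, and the function name pass_at_k is illustrative.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the probability that at least one of k samples is correct.

    n: number of programs sampled for the problem
    c: number of those programs that pass all tests
    k: budget of samples being evaluated (k <= n)
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k incorrect samples: any draw of k contains a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```

The benchmark score is then the mean of this value over all problems in the dataset.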
More Experiments:
Classic Control experiments with the Mountain Car and Pendulum environments.
Robotic Control experiments with Ant, Half-Cheetah, and Hopper tasks.
Stats
"In traffic light control environment [Zhang et al., 2019], where 6 feedback signals have hierarchy: queue length > the average vehicle waiting time > other feedback signals."
"In code generation task [Le et al., 2022] is a sparse reward scenario."
Quotes
"HERON can not only train high performing agents on difficult tasks but also provide additional benefits such as improved sample efficiency and robustness."