ALARM is a framework that hierarchically models rewards in reinforcement learning from human feedback to better align large language models with human preferences.