(N, K)-Puzzle: Benchmarking Reinforcement Learning Algorithms in Language Models
The author introduces the (N, K)-Puzzle as a cost-effective testbed for evaluating RL algorithms in generative language models, aiming to address the lack of standardized evaluation methods.