Value-based RL methods benefit program synthesis, improving both stability and performance.