The authors introduce CogBench, a benchmark that evaluates large language models using behavioral metrics drawn from cognitive psychology experiments. Their study finds that model size and reinforcement learning from human feedback (RLHF) are key factors in improving performance on the benchmark.