Evaluating Theory of Mind in Language Models Using Natural Dialogs
This work introduces COMMON-TOM, a new dataset built from natural spoken dialogs, to assess the Theory of Mind capabilities of language models (LMs). The study shows that explicitly incorporating beliefs improves LM performance.