The content discusses the opinion leadership of large language models (LLMs) in the context of the Werewolf game. It highlights that while LLMs have demonstrated strategic behaviors in social deduction games, their potential for opinion leadership has been overlooked.
The key points are:
The Werewolf game is used as a simulation platform to assess the opinion leadership of LLMs. The game features a "Sheriff" role, which is tasked with summarizing arguments and recommending decision options, making it a credible proxy for an opinion leader.
Two novel metrics are introduced to evaluate the opinion leadership of LLMs, measuring the reliability of the Sheriff's recommendations and their influence on other players' decisions.
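The summary does not give the metrics' formal definitions, but an influence-style measure could plausibly be computed from voting records. The sketch below is a hypothetical illustration, not the paper's actual formula: it scores a round as the fraction of players whose vote matches the Sheriff's recommendation.

```python
def influence_score(recommendation, votes):
    """Hypothetical influence metric: the fraction of players whose
    vote matches the Sheriff's recommended elimination target.
    Returns 0.0 for an empty round."""
    if not votes:
        return 0.0
    return sum(v == recommendation for v in votes) / len(votes)

# Example round: the Sheriff recommends eliminating player 3;
# four of the five other players follow the recommendation.
score = influence_score(3, [3, 3, 3, 3, 5])  # 0.8
```

Averaging such per-round scores across many games would yield a single influence figure per model, which is one natural way the paper's cross-scale comparisons could be operationalized.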
Extensive experiments are conducted to evaluate LLMs of different scales, and a Werewolf question-answering dataset (WWQA) is collected to assess and enhance LLMs' understanding of the game rules.
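A WWQA-style evaluation of rule understanding presumably reduces to question-answering accuracy. The sketch below is an assumed harness, not the paper's released code: `answer_fn` stands in for a call to the LLM under evaluation, and the toy questions are illustrative rather than drawn from the actual WWQA dataset.

```python
def rule_qa_accuracy(dataset, answer_fn):
    """Score a model's grasp of the game rules on a WWQA-style
    question-answering set: the fraction of questions answered
    correctly. `answer_fn` maps a question string to an answer."""
    if not dataset:
        return 0.0
    correct = sum(answer_fn(item["question"]) == item["answer"]
                  for item in dataset)
    return correct / len(dataset)

# Toy rule questions (hypothetical, not from the real WWQA set).
toy_set = [
    {"question": "Do werewolves know each other's identities?", "answer": "yes"},
    {"question": "Can a dead player keep voting?", "answer": "no"},
]
# A stand-in "model" that happens to answer both questions correctly.
acc = rule_qa_accuracy(toy_set, lambda q: "yes" if "werewolves" in q else "no")
```

The same harness also supports the enhancement side of the study: fine-tuning on WWQA and re-running `rule_qa_accuracy` would quantify any gain in rule understanding.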
The results suggest that the Werewolf game is a suitable test bed for evaluating the opinion leadership of LLMs, but that few LLMs possess this capacity. Larger-scale LLMs generally perform better, yet improving the opinion leadership of LLMs remains a non-trivial task.
Human evaluation experiments are also conducted, which show that LLMs can gain the trust of human players but struggle to influence their decisions.