Core Concepts
This study explores the application of Large Language Models (LLMs) in Jubensha games, introducing a dataset tailored to this narrative environment. The authors aim to advance AI agent development and to evaluate agents' performance in complex interactive settings.
Abstract
In this study, the authors examine Jubensha, a Chinese scripted murder-mystery role-playing game, as a testbed for the capabilities of Large Language Models (LLMs). They introduce a specialized Jubensha dataset, design an LLM-based multi-agent interaction framework, and develop novel methods to assess agents' gaming performance. By incorporating in-context learning techniques, they aim to improve the information-gathering, murderer-identification, and logical-reasoning skills of LLM-based agents.
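To make the in-context learning idea concrete, here is a minimal sketch of how a Jubensha agent's prompt might be assembled by prepending retrieved question-answer exemplars of effective play. The function name, its parameters, and the prompt wording are illustrative assumptions, not the authors' actual implementation.

```python
# Hypothetical sketch of in-context learning for a Jubensha agent:
# retrieved exemplars of good questioning are prepended to the prompt
# so the LLM can imitate them. All names here are assumptions.

def build_prompt(role_script, chat_history, exemplars, question):
    """Compose a prompt that places exemplar Q&A pairs before the
    current game context, following the in-context learning recipe."""
    parts = [
        "You are playing a Jubensha (scripted murder-mystery) game.",
        f"Your character script:\n{role_script}",
        "Examples of effective play:",
    ]
    for q, a in exemplars:  # few-shot demonstrations
        parts.append(f"Q: {q}\nA: {a}")
    parts.append("Conversation so far:")
    parts.extend(chat_history)
    parts.append(f"Now answer: {question}")
    return "\n\n".join(parts)

prompt = build_prompt(
    role_script="You are Chen, a doctor with an alibi gap at 9pm.",
    chat_history=["Li: Where were you at 9pm?"],
    exemplars=[("Who saw the victim last?",
                "Ask each player for a timeline of their evening.")],
    question="Who do you suspect, and why?",
)
```

The resulting string would then be sent to an LLM chat endpoint; the key design point is that the exemplars steer the agent's questioning style without any fine-tuning.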
The research situates this work within the broader rise of AI in gaming and focuses on text-based games such as Jubensha. It identifies the challenges that AI agents face in such games and proposes methods to address them. Through empirical experiments and evaluations, the study demonstrates that the proposed methods advance LLM-based agents' capabilities in narrative-driven game environments.
Key metrics such as the civilian win rate and murderer-identification accuracy are used to compare different LLM-based agent architectures. The study also addresses ethical considerations around portrayals of violence in fictional scenarios, as well as limitations arising from the dataset's language specificity and from ongoing model updates.
Stats
1. Currently, this dataset is in Chinese, but we are open to expanding it to other languages in the future.
2. We will release this dataset post-acceptance for academic purposes only.
Civilian Win Rate: 0.183 - 0.624 across different architectures.
Murderer Identification Accuracy: 0.194 - 0.654 across different architectures.
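The two reported metrics can be computed straightforwardly from per-game records. The sketch below uses an assumed record schema (`civilians_won`, `votes_correct`, `num_civilians`) for illustration; the paper's actual logging format is not specified here.

```python
# Illustrative computation of the two evaluation metrics from a list of
# game records. The field names and sample data are assumptions.

games = [
    {"civilians_won": True,  "votes_correct": 3, "num_civilians": 5},
    {"civilians_won": False, "votes_correct": 1, "num_civilians": 5},
]

# Fraction of games in which the civilian side won.
civilian_win_rate = sum(g["civilians_won"] for g in games) / len(games)

# Fraction of civilian votes that correctly named the murderer,
# pooled across all games.
murderer_id_accuracy = (
    sum(g["votes_correct"] for g in games)
    / sum(g["num_civilians"] for g in games)
)

print(civilian_win_rate)     # 0.5
print(murderer_id_accuracy)  # 0.4
```

Pooling votes across games (rather than averaging per-game accuracies) weights each civilian vote equally; either convention is plausible for the reported numbers.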