The paper presents a new dataset called "Social" to examine the ability of large language models (LLMs) to understand human social norms. The dataset consists of 12,383 high-quality multiple-choice questions spanning 402 skills and covering a wide range of social norms, including rules, laws, culture, history, and communication.
The authors evaluate state-of-the-art LLMs, including GPT3.5-Turbo and LLaMA2-Chat, on the Social dataset. The results show that recent advances in LLMs, particularly reinforcement learning from human feedback (RLHF), have significantly improved the models' ability to understand social norms. However, the best-performing LLMs still score slightly below average elementary students.
To further enhance LLMs' understanding of social norms, the authors propose a multi-agent framework called "SocialAgent". SocialAgent integrates three LLM agents: a retrieval agent to collect relevant web knowledge, a programming agent to perform symbolic reasoning, and a reasoning agent to trigger step-by-step logical thinking. The ensemble of these agents helps LLMs reach parity with human performance on the Social dataset.
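The three-agent ensemble described above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the summary does not say how the agents' answers are combined, so majority voting is an assumption here, and the three agent functions are hypothetical stubs that return fixed answers instead of calling an LLM, a search tool, or a code interpreter.

```python
from collections import Counter
from typing import Callable, Dict, List

# An agent maps (question, options) to the letter of the option it picks.
Agent = Callable[[str, List[str]], str]


def ensemble_answer(agents: Dict[str, Agent], question: str, options: List[str]) -> str:
    """Combine agent answers by majority vote.

    Majority voting is an assumed combination rule, not one confirmed
    by the source. Ties go to the earliest agent holding a top-count answer.
    """
    votes = [agent(question, options) for agent in agents.values()]
    if not votes:
        raise ValueError("at least one agent is required")
    counts = Counter(votes)
    top = max(counts.values())
    for vote in votes:
        if counts[vote] == top:
            return vote
    raise AssertionError("unreachable: some vote always has the top count")


# Hypothetical stand-ins for the three agents; real ones would query
# web search, run generated code, or prompt step-by-step reasoning.
def retrieval_agent(question: str, options: List[str]) -> str:
    return "B"


def programming_agent(question: str, options: List[str]) -> str:
    return "A"


def reasoning_agent(question: str, options: List[str]) -> str:
    return "B"


agents = {
    "retrieval": retrieval_agent,
    "programming": programming_agent,
    "reasoning": reasoning_agent,
}
answer = ensemble_answer(
    agents,
    "Is it polite to interrupt someone who is speaking?",
    ["A. Yes", "B. No"],
)
print(answer)
```

Keeping each agent behind the same call signature means the combination rule can be swapped (for example, for a weighted vote) without touching the agents themselves.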
The paper also provides a detailed analysis of the dataset, including the skill distribution, grade-level performance, and case studies. The findings suggest that while LLMs have made progress in understanding fundamental social norms, there is still significant room for improvement, especially in more advanced social norm skills.
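A grade-level or per-skill breakdown of this kind amounts to grouping predictions and computing accuracy within each group. The sketch below shows one way to do that; the record layout and the group labels are hypothetical, and the toy records are invented for illustration only and are not results from the paper.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Each record is (group label, predicted option, gold option).
# The group label could be a grade level or a skill name.
Record = Tuple[str, str, str]


def accuracy_by_group(records: Iterable[Record]) -> Dict[str, float]:
    """Return the fraction of correct predictions within each group."""
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for group, predicted, gold in records:
        total[group] += 1
        correct[group] += int(predicted == gold)
    return {group: correct[group] / total[group] for group in total}


# Toy records, made up for this example (not data from the paper).
toy_records = [
    ("grade-1", "A", "A"),
    ("grade-1", "B", "B"),
    ("grade-6", "C", "D"),
    ("grade-6", "A", "A"),
]
scores = accuracy_by_group(toy_records)
print(scores)
```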
Source: arxiv.org