Core Concepts
RoleInteract introduces a benchmark for evaluating the social interaction of role-playing conversational agents, highlighting the importance of assessment at both the individual and group levels.
Summary
RoleInteract evaluates the social intelligence of role-playing agents, emphasizing both individual and group dynamics. The benchmark draws on diverse sources, covering 500 characters with over 6,000 questions and 30,800 utterances. Evaluation metrics focus on self-awareness, emotional perception, conversation memory, and social preference, and the performance of mainstream LLMs is assessed against these dimensions.
Introduction:
Large language models have enhanced AI conversational agents.
Role-specific knowledge evaluation is crucial.
RoleInteract aims to assess social intelligence in role-playing agents.
Individual Level Assessment:
Self-awareness of the role description is essential for consistency.
Emotional perception aids in understanding others' emotions.
Long-term conversation memory enhances reliability.
Group Level Assessment:
An agent's social preference within group dynamics influences its behavior.
Complex interactions within groups impact agent behavior.
Data Construction:
Dialogue construction methods ensure fluency and fidelity.
Questions are tailored to each dimension, such as self-awareness and emotional perception.
Experiment Settings:
Metrics such as accuracy and keyword coverage rate are used for evaluation.
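The two metrics can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, signatures, and the simple substring matching are assumptions.

```python
def accuracy(predictions, references):
    """Fraction of questions where the model's choice matches the reference answer."""
    assert len(predictions) == len(references)
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(predictions)


def keyword_coverage_rate(response, keywords):
    """Fraction of reference keywords that appear in the generated response
    (case-insensitive substring match, as a simplifying assumption)."""
    if not keywords:
        return 0.0
    hits = sum(1 for kw in keywords if kw.lower() in response.lower())
    return hits / len(keywords)
```

Accuracy suits the multiple-choice question prompts, while keyword coverage rate gives partial credit to open-ended responses that mention the expected content.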
Results and Analysis:
Closed-source models outperform open-source ones.
Performance declines with longer conversations and complex group dynamics.
Conclusion:
RoleInteract provides a comprehensive evaluation framework for assessing sociality in role-playing agents at individual and group levels.
Statistics
Large language models have advanced the development of various AI conversational agents.
The benchmark covers 500 characters with over 6,000 question prompts and 30,800 multi-turn role-playing utterances.
Agents excelling at the individual level may not perform well at the group level.