For evaluating multilingual foundation models, SeaEval comprehensively examines capabilities in language understanding, cultural understanding, and logical reasoning.
Large language models can be effective user simulators for conversational recommendation, but they may exhibit deviations from human behavior that can be reduced with model selection and prompting strategies.