Jia, J., Yuan, Z., Pan, J., McNamara, P. E., & Chen, D. (2024). Decision-Making Behavior Evaluation Framework for LLMs under Uncertain Context. 38th Conference on Neural Information Processing Systems (NeurIPS 2024). arXiv:2406.05972v2 [cs.AI].
This paper develops a framework for evaluating the decision-making behavior of large language models (LLMs) under uncertainty, particularly when prompts include socio-demographic information. The study investigates whether LLMs exhibit human-like decision-making patterns and whether demographic features introduce biases into their choices.
The researchers designed a series of multiple-choice-list experiments grounded in behavioral economics, specifically the Tanaka, Camerer, and Nguyen (TCN) model. The experiments assess three key parameters of decision-making under uncertainty: risk preference (σ), probability weighting (α), and loss aversion (λ). Three commercial LLMs were tested: ChatGPT-4.0-Turbo, Claude-3-Opus, and Gemini-1.0-pro. Each model was presented with lottery games in two settings: context-free and embedded with socio-demographic features.
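To make these parameters concrete, here is a minimal sketch of the prospect-theory specification underlying the TCN model, assuming the standard Tanaka, Camerer, and Nguyen (2010) parameterization (the paper's exact functional forms and notation may differ slightly):

\[
v(x) =
\begin{cases}
x^{\sigma} & \text{if } x \ge 0,\\
-\lambda\,(-x)^{\sigma} & \text{if } x < 0,
\end{cases}
\qquad
w(p) = \exp\!\bigl(-(-\ln p)^{\alpha}\bigr),
\]

so that a mixed lottery \((x, p;\, y, q)\) is valued as \(U = w(p)\,v(x) + w(q)\,v(y)\). Here σ governs the curvature of the value function (risk preference), α controls how stated probabilities are distorted, and λ scales how heavily losses weigh relative to gains; a model's choices across the lottery lists can then be mapped back to an estimated (σ, α, λ) triple.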
The study concludes that LLMs demonstrate human-like decision-making patterns but also exhibit variations and potential biases when socio-demographic features are introduced. This highlights the need for careful consideration of the ethical implications of using LLMs in decision-making scenarios, especially those involving diverse user groups.
This research provides a novel framework for evaluating LLM decision-making behavior using established behavioral economics principles. The findings contribute to a deeper understanding of the capabilities and limitations of LLMs in complex decision-making contexts, emphasizing the importance of addressing potential biases to ensure fairness and ethical deployment.
The study acknowledges limitations in directly comparing LLM behavior to human behavior due to the sensitive nature of certain demographic features. Future research could explore LLM decision-making in diverse domains beyond financial scenarios and investigate methods for mitigating biases while preserving the utility of LLMs in real-world applications.