The paper presents a case study on the stability of value expression in 21 Large Language Models (LLMs) from 6 different families. The authors evaluate two types of value stability: Rank-Order stability (on the population/interpersonal level) and Ipsative stability (on the individual/intrapersonal level).
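To make the two stability notions concrete, here is a minimal sketch of how they are commonly computed between two evaluation contexts: Rank-Order stability correlates the ranking of personas on a single value across contexts, while Ipsative stability correlates a single persona's whole value profile across contexts. The scores and helper names below are hypothetical illustrations, not the paper's actual data or code.

```python
# Illustrative sketch (hypothetical data): Rank-Order vs. Ipsative stability
# between two evaluation contexts, using Spearman and Pearson correlation.

def pearson(x, y):
    # Plain Pearson correlation coefficient.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def ranks(x):
    # Rank positions of each element (no tie handling, for simplicity).
    order = sorted(range(len(x)), key=lambda i: x[i])
    r = [0.0] * len(x)
    for rank, i in enumerate(order):
        r[i] = float(rank)
    return r

def rank_order_stability(ctx_a, ctx_b):
    # Spearman correlation: do personas keep their relative ordering
    # on one value (e.g. Benevolence) across the two contexts?
    return pearson(ranks(ctx_a), ranks(ctx_b))

def ipsative_stability(profile_a, profile_b):
    # Within-persona correlation of the full value profile across contexts.
    return pearson(profile_a, profile_b)

# Hypothetical scores on one value for four simulated personas:
a = [4.0, 2.5, 3.0, 1.5]
b = [3.8, 2.0, 3.2, 1.0]
print(round(rank_order_stability(a, b), 2))  # identical ordering → 1.0

# Hypothetical 10-value profile for one persona in two contexts:
p1 = [4, 3, 2, 5, 1, 3, 2, 4, 5, 2]
p2 = [4, 2, 2, 5, 1, 4, 2, 4, 5, 3]
print(round(ipsative_stability(p1, p2), 2))  # → 0.92
```

A value near 1.0 indicates stable expression; the paper's finding is that both quantities drop in persona-simulation settings, especially over longer conversations.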
The key findings are:
The Mixtral, Mistral, Qwen, and GPT-3.5 model families exhibit higher value stability than the LLaMa-2 and Phi families, in terms of both Rank-Order and Ipsative stability.
When instructed to simulate specific personas, LLMs exhibit low Rank-Order stability, which further diminishes with longer conversations. This highlights the need for future research on LLMs that can coherently simulate different personas over extended interactions.
The trends observed in value stability on the Portrait Values Questionnaire (PVQ) transfer to downstream behavioral tasks, with the most stable models on PVQ also exhibiting the highest stability on tasks like Donation, Religion, and Stealing.
For the more stable models, the expression of values such as Universalism, Benevolence, Power, and Achievement correlates with simulated behavior on the Donation task. However, the correlations remain low overall, indicating room for improvement in aligning value expression with behavior.
The paper provides a foundational step towards understanding the context-dependence and stability of value expression in LLMs, which is crucial for their safe and ethical deployment.