This study explores how three prominent LLMs (GPT-4, ChatGPT, and Llama2-70B-Chat) perform ethical reasoning in different languages, and whether their moral judgments depend on the language of the prompt. The authors extend the work of Rao et al. (2023) on ethical reasoning in LLMs to a multilingual setup spanning six languages: English, Spanish, Russian, Chinese, Hindi, and Swahili.
The key findings are:
GPT-4 is the most consistent and unbiased ethical reasoner across languages, while ChatGPT and Llama2-70B-Chat show significant moral value bias when prompted in languages other than English.
The nature of this bias varies significantly across languages for all LLMs, including GPT-4.
Across all languages, GPT-4 has the highest ethical reasoning ability, while Llama2-70B-Chat has the poorest.
Across all models, the reasoning is poorest for Hindi and Swahili, and best for English and Russian.
English and Spanish, as well as Hindi and Chinese, exhibit similar bias patterns across the models.
The study highlights the complex interplay between language, culture, and values in the ethical reasoning of LLMs, and the need to carefully consider these factors when deploying such models in real-world applications.
Source: by Utkarsh Agar... on arxiv.org, 04-30-2024
https://arxiv.org/pdf/2404.18460.pdf