Basic Concepts
Safety evaluations of generative multilingual models lack cross-cultural considerations, motivating a global-scale stereotype dataset such as SeeGULL Multilingual.
Summary
SeeGULL Multilingual is a dataset created by Google Research to address the lack of cross-cultural considerations in safety evaluations of generative multilingual models. The dataset contains over 25,000 stereotypes across 20 languages and 23 regions, providing insights into geo-cultural factors influencing stereotypes. By leveraging LLM generations and human annotations, the dataset aims to improve model evaluations and safeguard against harmful stereotypes. The resource is publicly available to foster research in this domain and enhance multilingual model safety.
The content highlights the importance of evaluating model safety from a multicultural perspective to prevent the harms caused by stereotypes. It emphasizes the need for stereotype resources beyond English, since salient stereotypes differ across the languages spoken worldwide. Through culturally situated validations and offensiveness annotations, SeeGULL Multilingual offers a comprehensive approach to understanding and mitigating biases in generative models.
The dataset creation methodology involves identifying salient identity terms, generating associations using PaLM-2, and obtaining culturally situated human annotations for validation. Annotations are collected for both stereotypes and offensiveness ratings across various languages and regions. The content also discusses the overlap with the English version of SeeGULL, highlighting differences in offensive stereotypes across different countries.
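The culturally situated validation step described above boils down to aggregating labels from multiple in-region annotators for each (identity, attribute) candidate pair. The sketch below illustrates one plausible aggregation scheme, majority vote with ties treated as no consensus; the column names, label values, and three-annotator setup are illustrative assumptions, not the paper's exact protocol.

```python
from collections import Counter

def majority_label(annotations):
    """Return the majority label among annotators, or None on a tie."""
    counts = Counter(annotations).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None  # no consensus among annotators
    return counts[0][0]

# Hypothetical annotation rows: (identity term, attribute, labels from 3 annotators).
# Real SeeGULL Multilingual rows also carry language and region fields.
rows = [
    ("identity_A", "attribute_X", ["stereotype", "stereotype", "not_stereotype"]),
    ("identity_B", "attribute_Y", ["not_stereotype", "stereotype", "not_stereotype"]),
]

# Keep only the aggregated label per candidate pair.
validated = {(ident, attr): majority_label(labels) for ident, attr, labels in rows}
```

The same aggregation can be applied separately to the offensiveness ratings, which is what makes per-region comparisons of offensive stereotypes possible.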
Furthermore, the evaluation of foundation models using SeeGULL Multilingual reveals varying rates of endorsing stereotypes across different languages. The results underscore the importance of multilingual evaluations for model safety and highlight disparities in stereotype endorsements based on language and region.
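A per-language comparison like the one above reduces to computing, for each language, the fraction of probed stereotype statements a model endorses. A minimal sketch, assuming a simple "endorse"/"reject" labeling of model responses (the label names and sample data are hypothetical):

```python
def endorsement_rate(responses):
    """Fraction of model responses that endorse the probed stereotype."""
    if not responses:
        return 0.0
    return sum(1 for r in responses if r == "endorse") / len(responses)

# Hypothetical model responses, grouped by language of the probe.
responses_by_language = {
    "hi": ["endorse", "reject", "endorse"],
    "ja": ["reject", "reject", "endorse"],
}

# Per-language endorsement rates; disparities across languages are the signal.
rates = {lang: endorsement_rate(r) for lang, r in responses_by_language.items()}
```

Comparing these rates across languages and regions is what surfaces the disparities the evaluation reports.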
Overall, SeeGULL Multilingual serves as a valuable resource for researchers and developers to enhance model safeguards against harmful stereotypes through a global-scale perspective.
Statistics
Number of stereotypes: 25,861
Number of languages: 20
Number of regions: 23
Quotes
"Languages contain socio-cultural information which can differ at places of use."
"Model safeguards breaking down when encountered by simple multilingual adversarial attacks."
"The perception of an attribute or stereotype as offensive or not can vary by language."