The PRISM alignment dataset is a new resource for understanding human preferences and their role in aligning large language models (LLMs). It consists of two main components:
Survey: Participants complete a survey in which they report their demographics, familiarity with LLMs, and stated preferences for model behaviors, and describe their values and beliefs in their own words. This maps the characteristics and preferences of 1,500 diverse participants born in 75 countries.
Conversations: Participants then engage in live, multi-turn conversations with 21 different LLMs, rating the responses on a fine-grained scale and providing open-ended feedback. This links the participant profiles to their contextual preferences and interactions with the models.
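As a minimal sketch of how the two components fit together, the Python below joins per-response ratings to survey profiles through a shared participant identifier and then averages scores by a profile attribute. The field names (`participant_id`, `birth_country`, `llm_familiarity`, `model`, `score`), the model names, and all values are hypothetical and chosen only for illustration; they are not taken from the released dataset's schema.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical survey records keyed by participant ID (illustrative fields only).
survey = {
    "p1": {"birth_country": "Kenya", "llm_familiarity": "high"},
    "p2": {"birth_country": "Brazil", "llm_familiarity": "low"},
}

# Hypothetical conversation ratings: one row per rated model response.
ratings = [
    {"participant_id": "p1", "model": "model_a", "score": 80},
    {"participant_id": "p1", "model": "model_b", "score": 40},
    {"participant_id": "p2", "model": "model_a", "score": 30},
    {"participant_id": "p2", "model": "model_b", "score": 90},
]


def link_ratings(survey, ratings):
    """Attach each rating to the survey profile of the participant who gave it."""
    return [{**row, **survey[row["participant_id"]]} for row in ratings]


def mean_score_by(linked, attribute):
    """Average score for each (attribute value, model) pair."""
    groups = defaultdict(list)
    for row in linked:
        groups[(row[attribute], row["model"])].append(row["score"])
    return {key: mean(scores) for key, scores in groups.items()}


linked = link_ratings(survey, ratings)
by_country = mean_score_by(linked, "birth_country")
```

Because every rating carries its rater's profile after the join, the same grouping function can be reused with any other survey attribute, for example `mean_score_by(linked, "llm_familiarity")`.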
The key features of PRISM are:
Participatory: It seeks to diversify the voices contributing to alignment norms by recruiting a global sample of participants with informed consent and fair pay.
Representative: It includes census-representative samples for the UK and US to understand collective welfare, as well as a diverse set of 21 LLMs from various commercial providers and open-access channels.
Individualized: It links each preference rating to a unique participant profile, allowing the exploration of personalization and the attribution of sample artifacts.
Subjective: It focuses on collecting conversations around value-laden and controversial topics, where interpersonal and cross-cultural disagreement is expected.
Multicultural: It places extra emphasis on sourcing global participation, with English speakers born in 75 different countries.
The authors demonstrate the usefulness of PRISM through three case studies: (1) Dialogue Diversity, examining how different people initiate different discussions with LLMs; (2) Preference Diversity, exploring how model preferences vary across idiosyncratic factors, context, and group affiliation; and (3) Welfare Outcomes, showing that larger and more representative juries lead to better societal welfare distributions, especially for minority groups.
PRISM provides a valuable resource for engineers, social scientists, and policymakers to navigate the complexities of human-AI interactions and the adjudication of alignment norms.
Source: arxiv.org