Key Concept
Large language models exhibit biases in answering context-dependent health questions, favoring specific demographic groups.
Summary
Abstract:
- Chat-based large language models provide personalized health information.
- Underspecified questions may lead to biased answers.
- Study focuses on biases in contextual health questions.
Introduction:
- Large language models used for question-answering.
- Contextual questions in the healthcare domain may lead to biased answers.
- Example of biased answer provided.
Demographic Conditioning:
- Model's answer biased towards female demographic.
- Comparison of answers with and without context.
- Importance of considering demographic context in answers.
Biases in Context-Dependent Health Questions:
- Study hypothesis on biases in answering context-dependent questions.
- Analysis methodology explained.
- Evaluation of two chat-based LLMs.
- Results show biases towards specific demographic groups.
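The analysis above compares a model's context-free answer against answers conditioned on explicit demographic context. A minimal sketch of that comparison is below; the bag-of-words cosine metric and the function names are illustrative assumptions, not the paper's actual similarity measure.

```python
import math
from collections import Counter

def cosine_similarity(text_a: str, text_b: str) -> float:
    """Bag-of-words cosine similarity between two answer strings."""
    a, b = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def demographic_similarities(context_free_answer: str,
                             conditioned_answers: dict) -> dict:
    """Score each demographic-conditioned answer against the
    context-free answer. A large gap between groups suggests the
    context-free answer is implicitly tailored to one demographic."""
    return {group: cosine_similarity(context_free_answer, ans)
            for group, ans in conditioned_answers.items()}
```

For example, if the answer conditioned on "female" scores much higher against the context-free answer than the one conditioned on "male", the unconditioned answer is likely skewed toward the female demographic.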
Data:
- Focus on sexual and reproductive health questions.
- Data sourced from Planned Parenthood and Go Ask Alice.
- Filtering of context-dependent questions based on age, location, and sex.
Results:
- Model answers show biases towards specific demographic groups.
- Statistical significance of differences in similarity scores.
- Human evaluation confirms biases in age, sex, and location attributes.
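Testing whether similarity-score gaps between demographic groups are statistically significant could be sketched with a permutation test; this stdlib-only implementation is an assumed stand-in for whatever test the paper actually used.

```python
import random

def permutation_test(scores_a, scores_b, n_resamples=10_000, seed=0):
    """Two-sided permutation test for a difference in mean similarity
    scores between two demographic groups. Returns an approximate
    p-value: the fraction of label shufflings whose mean gap is at
    least as extreme as the observed one."""
    rng = random.Random(seed)
    observed = abs(sum(scores_a) / len(scores_a)
                   - sum(scores_b) / len(scores_b))
    pooled = list(scores_a) + list(scores_b)
    n_a = len(scores_a)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        gap = abs(sum(pooled[:n_a]) / n_a
                  - sum(pooled[n_a:]) / (len(pooled) - n_a))
        if gap >= observed:
            hits += 1
    return hits / n_resamples
```

A small p-value indicates the model's answers resemble one group's conditioned answers significantly more than another's, i.e. a demographic bias.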
Conclusion:
- Disparities exist in model answers for different demographic groups.
- Importance of ensuring equality in model answers in critical healthcare domains.
- Future research should aim for comprehensive answers not tailored to specific demographics.
Limitations:
- Study limitations discussed, including Western-centric focus and binary sex categories.
- Future work can expand to other languages and countries.
Ethical Considerations:
- Dataset creation and annotation process explained.
- Risk of adversaries using biased LLMs highlighted.
Statistics
"Our final dataset contains 116 questions from Planned Parenthood and 71 questions from Go Ask Alice."
"Of the 187 questions, 64 are dependent on sex, 106 on age, and 55 on location."
"We evaluate 66 age, 54 location, and 61 sex-based questions for gemini-pro."
"We evaluate 87 age, 30 location, and 58 sex-based questions for chat-bison-001."
Quotes
"We must characterize these types of biases to avoid detrimental effects on users’ health."
"Our results confirm that disparities do exist among model answers for different groups across age, location, and sex attributes."
"Future question-answering research can work toward providing comprehensive answers that are not tailored to certain demographics."