This comprehensive survey examines the current landscape of biases in Large Language Models (LLMs). It systematically categorizes different types of biases, such as demographic biases (e.g., gender, race, age), contextual biases (e.g., domain-specific, cultural), and algorithmic biases. The survey analyzes the sources of these biases, which can stem from training data, model architecture, human annotation, user interactions, and broader societal influences.
The survey also evaluates the impacts of bias in LLMs, spanning social implications (e.g., perpetuating inequalities, ethical dilemmas) and operational implications (e.g., performance degradation, erosion of user trust), and underscores the need for robust bias detection and measurement techniques. Both qualitative and quantitative methods for bias evaluation are discussed, highlighting the importance of comprehensive, intersectional metrics and of transparency in model development.
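To make the quantitative side concrete, the sketch below (not taken from the survey; the template, group terms, and `score_fn` are hypothetical placeholders) shows one common probing pattern: fill a prompt template with terms from different demographic groups and compare the average model-assigned score per group.

```python
# Minimal sketch of a group-disparity probe, assuming some model-based scorer
# (sentiment, toxicity, log-likelihood, ...) is available as `score_fn`.
from statistics import mean
from typing import Callable, Dict, List

def group_disparity(template: str,
                    groups: Dict[str, List[str]],
                    score_fn: Callable[[str], float]) -> Dict[str, float]:
    """Average score per demographic group for prompts built from `template`."""
    per_group = {}
    for group, terms in groups.items():
        prompts = [template.format(term=t) for t in terms]
        per_group[group] = mean(score_fn(p) for p in prompts)
    return per_group

if __name__ == "__main__":
    groups = {
        "female": ["she", "the woman"],
        "male": ["he", "the man"],
    }
    # Dummy scorer for illustration only; in practice this would query an LLM.
    toy_score = lambda prompt: float(len(prompt))
    scores = group_disparity("{term} is a brilliant engineer.", groups, toy_score)
    gap = max(scores.values()) - min(scores.values())
    print(scores, "gap:", round(gap, 3))
```

The between-group gap is the simplest possible disparity statistic; intersectional metrics would extend the same idea to combinations of attributes rather than single groups.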
The survey then reviews recent advances in bias evaluation and mitigation strategies, including techniques such as prompt engineering, fine-tuning, and social-contact-based debiasing. It also identifies current limitations and proposes future research directions, such as comprehensive lifecycle bias evaluation, intersectional and contextual bias mitigation, bias-aware training, and real-world impact assessment. Addressing these gaps would contribute to fairer and more equitable AI systems.
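As one example of what a data-centric mitigation step can look like, here is a minimal sketch of counterfactual data augmentation, a technique often paired with fine-tuning. The word list and helper names are illustrative assumptions, not the survey's method: each training sentence is duplicated with demographic terms swapped so the fine-tuning data presents balanced contexts.

```python
# Sketch of counterfactual data augmentation (assumed example, not the survey's method):
# duplicate each sentence with demographic terms swapped to balance the training data.
import re

SWAP = {"he": "she", "she": "he", "his": "her", "her": "his",
        "man": "woman", "woman": "man"}

def counterfactual(sentence: str) -> str:
    """Return the sentence with each mapped demographic term swapped."""
    def repl(match: re.Match) -> str:
        word = match.group(0)
        swapped = SWAP[word.lower()]
        return swapped.capitalize() if word[0].isupper() else swapped
    pattern = r"\b(" + "|".join(SWAP) + r")\b"
    return re.sub(pattern, repl, sentence, flags=re.IGNORECASE)

corpus = ["He is a doctor.", "The woman stayed home with her children."]
augmented = corpus + [counterfactual(s) for s in corpus]
print(augmented)
```

A real pipeline would use a curated term lexicon and handle grammatical agreement, but the core idea stays the same: present the model with both versions of each context during fine-tuning.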
Key insights extracted from the paper by Rajesh Ranja... at arxiv.org, 09-26-2024: https://arxiv.org/pdf/2409.16430.pdf