This survey presents a comprehensive overview of the current state of research on fairness in large language models (LLMs). It begins by introducing the fundamentals of LLMs and the factors that contribute to bias in these models, such as training data bias, embedding bias, and label bias.
The survey then delves into the definitions of fairness in machine learning, including group fairness and individual fairness, and discusses how these notions must be adapted to the linguistic challenges that arise when defining bias for LLMs.
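As a concrete illustration of group fairness, the sketch below computes the demographic parity gap, a standard group-fairness criterion (a model satisfies it when its positive-prediction rate is equal across demographic groups). The function name and data here are hypothetical, and this is a minimal sketch rather than a metric defined in the survey itself:

```python
import numpy as np

def demographic_parity_gap(y_pred, group):
    """Group fairness: absolute difference in positive-prediction
    rates between two demographic groups (0 means exact parity)."""
    y_pred, group = np.asarray(y_pred), np.asarray(group)
    return abs(y_pred[group == 0].mean() - y_pred[group == 1].mean())

# Hypothetical binary predictions for eight examples from two groups.
preds  = [1, 0, 1, 1, 0, 0, 1, 0]
groups = [0, 0, 0, 0, 1, 1, 1, 1]
print(demographic_parity_gap(preds, groups))  # |0.75 - 0.25| = 0.5
```

Individual fairness, by contrast, asks that similar individuals receive similar predictions, which requires a task-specific similarity measure rather than group-level statistics.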
Next, the survey categorizes and discusses various metrics for quantifying bias in LLMs, including embedding-based, probability-based, and generation-based metrics. Together, these metric families provide a systematic approach to measuring bias in LLMs.
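To make the embedding-based family concrete, here is a minimal, WEAT-style sketch: it scores how much more strongly one word embedding associates with one attribute word set than with another. The embeddings below are random stand-ins and the function names are hypothetical; real evaluations use trained model embeddings and add significance testing:

```python
import numpy as np

def cosine(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def association(w, attrs_a, attrs_b):
    """WEAT-style association: mean cosine similarity of embedding w
    to attribute set A minus its mean similarity to attribute set B."""
    return (np.mean([cosine(w, a) for a in attrs_a])
            - np.mean([cosine(w, b) for b in attrs_b]))

# Hypothetical 3-d embeddings standing in for real model vectors.
rng = np.random.default_rng(0)
career = [rng.normal(size=3) for _ in range(4)]  # attribute set A
family = [rng.normal(size=3) for _ in range(4)]  # attribute set B
he, she = rng.normal(size=3), rng.normal(size=3)

# A large positive gap would suggest "he" leans toward career terms
# more than "she" does -- evidence of a stereotypical association.
print(association(he, career, family) - association(she, career, family))
```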
The survey then presents a detailed review of algorithms for mitigating bias in LLMs, categorizing them into four stages: pre-processing, in-training, intra-processing, and post-processing. These techniques aim to address bias at different points in the LLM workflow, ranging from data augmentation and prompt tuning to loss function modification and model editing.
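As one pre-processing example, counterfactual data augmentation (a common form of the data augmentation mentioned above) duplicates each training sentence with demographic terms swapped, balancing mentions before training. The sketch below is minimal and assumes a hand-written swap list; production pipelines also handle casing, morphology, and ambiguous pronouns (e.g., "her" can map to either "him" or "his"):

```python
# Simplified swap list; real CDA word lists are far more extensive.
SWAPS = {"he": "she", "she": "he", "his": "her", "her": "his",
         "man": "woman", "woman": "man"}

def counterfactual(sentence):
    """Return the sentence with each gendered token replaced by its
    counterpart, leaving all other tokens unchanged (lowercase only)."""
    return " ".join(SWAPS.get(tok, tok) for tok in sentence.split())

corpus = ["he is a doctor", "she stayed home with her children"]
augmented = corpus + [counterfactual(s) for s in corpus]
# -> adds "she is a doctor" and "he stayed home with his children",
#    so gendered mentions are balanced before the model sees the data.
```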
Furthermore, the survey compiles and summarizes the available resources for evaluating bias in LLMs, including toolkits and datasets. These resources are categorized based on their suitability for different types of bias metrics, providing a comprehensive reference for researchers and practitioners.
Finally, the survey discusses the current challenges and future research directions in the field of fairness in LLMs, such as formulating fairness notions, balancing performance and fairness, and developing more tailored datasets.
Overall, this survey offers a valuable and comprehensive resource for understanding the current state of research on fairness in LLMs, and it provides a roadmap for future advancements in this important and rapidly evolving field.