The paper analyzes the fairness of natural language representations, specifically sentence and document encodings, in the context of binary classification tasks. It focuses on two real-world datasets: the Hindi Legal Document Corpus (HLDC) for bail prediction and the Multilingual Twitter Corpus (MTC) for hate speech recognition.
The authors examine the fairness of two common encoding strategies, vector averaging and vector extrema, by comparing the reconstruction errors of the principal components across subgroups defined by protected attributes (religion, ethnicity, and gender). They find that vector averaging is biased towards certain subgroups, while vector extrema is fairer in this regard.
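The analysis above can be sketched as follows. This is a minimal illustration, not the authors' exact pipeline: the function names, the choice of fitting the principal components on the pooled encodings, and the random data standing in for the two subgroups are all assumptions made for the example.

```python
import numpy as np

def average_encoding(word_vecs):
    # Sentence encoding as the mean of its word vectors.
    return np.mean(word_vecs, axis=0)

def extrema_encoding(word_vecs):
    # Vector extrema: per dimension, keep the value with the
    # largest magnitude across the sentence's word vectors.
    idx = np.argmax(np.abs(word_vecs), axis=0)
    return word_vecs[idx, np.arange(word_vecs.shape[1])]

def subgroup_errors(X_all, groups, n_components):
    # Fit principal components on the pooled encodings, then measure
    # each subgroup's mean squared reconstruction error under that
    # shared projection. A large gap between subgroups signals a
    # representation-level bias.
    mu = X_all.mean(axis=0)
    _, _, Vt = np.linalg.svd(X_all - mu, full_matrices=False)
    V = Vt[:n_components].T
    errors = {}
    for name, Xg in groups.items():
        Xc = Xg - mu
        errors[name] = float(np.mean((Xc - Xc @ V @ V.T) ** 2))
    return errors

# Hypothetical encodings for two subgroups of a protected attribute.
rng = np.random.default_rng(0)
group_a = rng.normal(size=(100, 50))
group_b = rng.normal(size=(120, 50))
errs = subgroup_errors(np.vstack([group_a, group_b]),
                       {"a": group_a, "b": group_b}, n_components=10)
gap = abs(errs["a"] - errs["b"])
```

The fairness criterion here is the gap between subgroup errors: an encoding whose principal components reconstruct one subgroup markedly better than another encodes that subgroup's language more faithfully.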
To balance the trade-off between fairness and accuracy, the authors propose using a convex combination of the two encoding strategies. They provide recommendations on choosing an optimal combination based on the available leeway to compromise on accuracy in favor of stricter representation-level fairness requirements.
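The proposed trade-off can be sketched as a one-line combination. The weight symbol `alpha` and the function name are assumptions for illustration; the paper's recommendation procedure for choosing the optimal weight is not reproduced here.

```python
import numpy as np

def combined_encoding(avg_vec, ext_vec, alpha):
    # Convex combination of the two strategies: alpha weights the
    # more accurate vector-averaging encoding, (1 - alpha) the fairer
    # vector-extrema encoding. alpha is tuned to how much accuracy one
    # is willing to trade for stricter representation-level fairness.
    assert 0.0 <= alpha <= 1.0
    return alpha * avg_vec + (1.0 - alpha) * ext_vec
```

Setting `alpha = 1` recovers pure averaging (accuracy-first), `alpha = 0` pure extrema (fairness-first); intermediate values interpolate between the two.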
Key insights distilled from the source content by Biswajit Rou... on arxiv.org, 04-16-2024.
https://arxiv.org/pdf/2404.09664.pdf