The paper presents Inference-Time Rule Eraser, a framework for debiasing deployed machine learning models without modifying the model parameters. The key insights are:
The authors analyze the Bayesian interpretation of the deployed model's output and the desired fair model's output, identifying that the difference lies in the biased rules learned by the deployed model.
They derive the Inference-Time Rule Eraser framework, which shows that the biased rules can be removed by subtracting the logarithmic value associated with the biased rules from the model's logits output.
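Since the biased rules are not directly accessible from the model's output, the authors propose a two-stage approach: the biased rules are first distilled into an additional patch model, and are then removed from the deployed model's output at inference time.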
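The logit subtraction described above can be illustrated with a small numerical sketch. This is not the authors' implementation. It assumes the biased-rule term is available as a per-class probability vector (called `bias_probs` here, a hypothetical name), which in practice would have to be estimated separately because the biased rules are not directly accessible. The toy logits are constructed by hand so that the effect of the subtraction is easy to see.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - np.max(z, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=-1, keepdims=True)

def erase_rule(logits, bias_probs, eps=1e-12):
    # Hypothetical helper: subtract the log of the (assumed known)
    # biased-rule probabilities from the deployed model's logits.
    return logits - np.log(bias_probs + eps)

# Toy setup (all values are illustrative assumptions):
# the "fair" evidence is uninformative, but the bias attribute
# strongly favours class 0.
content_probs = np.array([0.5, 0.5])
bias_probs = np.array([0.9, 0.1])

# Deployed-model logits that mix both sources of evidence.
logits = np.log(content_probs) + np.log(bias_probs)

biased = softmax(logits)                              # skewed by the bias term
debiased = softmax(erase_rule(logits, bias_probs))    # bias term removed

print("before:", biased)
print("after: ", debiased)
```

In this toy case the deployed logits were built as the sum of a content term and a bias term, so removing the bias term recovers the uninformative content prediction. Real model logits will not decompose this cleanly, and the quality of the result depends on how well the biased-rule probabilities are estimated.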
Extensive experiments on various datasets demonstrate the effectiveness of the proposed method in debiasing deployed models, outperforming existing fairness-aware training and post-processing techniques. The method is also shown to be effective in addressing issues related to spurious prediction rules, such as out-of-distribution problems.
Key insights distilled from: Yi Zhang, Jit... at arxiv.org, 04-09-2024
https://arxiv.org/pdf/2404.04814.pdf