By localizing and editing a small set of weights in a language model, it is possible to control and mitigate encoded gender stereotypes while preserving the model's overall performance.
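The summary above names weight localization and editing but not the mechanics. Below is a minimal sketch of one common form such an edit takes: projecting an estimated "gender direction" out of a located weight matrix so that layer can no longer write along it. The GPT-2 target, the layer index, and the random placeholder direction are all illustrative assumptions, not the paper's exact procedure.

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")
hidden = model.config.hidden_size

# Placeholder "gender direction" in hidden space. In practice this would be
# estimated, e.g., from mean hidden-state differences over paired he/she
# prompts; here it is random purely so the sketch runs.
v = torch.randn(hidden)
v = v / v.norm()

# Suppose probing "localized" the stereotype to the MLP output projection of
# layer 8 (an assumed index). Post-multiplying by (I - vv^T) removes the
# layer's ability to write along v while leaving the rest of the map intact.
with torch.no_grad():
    W = model.transformer.h[8].mlp.c_proj.weight  # (4*hidden, hidden) in GPT-2
    P = torch.eye(hidden) - torch.outer(v, v)     # projector onto v's complement
    W.copy_(W @ P)
```

Because only one located matrix is modified, every other parameter is untouched, which is why this style of edit can leave overall performance largely intact.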
AXOLOTL introduces a novel post-processing framework that debiases Large Language Model outputs, improving fairness while preserving downstream performance.
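As a rough illustration of what post-processing debiasing can look like, the sketch below treats the model as a black box: it first asks the model to flag stereotyped phrasing in its own output, then asks for a rewrite of only the flagged spans. The two prompts, the detect-then-rewrite loop, and the `generate` callable are hypothetical stand-ins, not AXOLOTL's published pipeline.

```python
from typing import Callable

def debias_output(text: str, generate: Callable[[str], str]) -> str:
    """Post-process generated `text` with a detect-then-rewrite pass."""
    # Step 1 (assumed): ask the model to flag stereotyped phrasing.
    findings = generate(
        "List any phrases in the following text that rely on gender or "
        f"social stereotypes, or reply 'NONE':\n\n{text}"
    )
    if findings.strip().upper() == "NONE":
        return text  # nothing flagged; keep the original output
    # Step 2 (assumed): ask for a rewrite that removes the flagged phrasing
    # while preserving meaning and fluency.
    return generate(
        "Rewrite the text below so it avoids the flagged stereotyped phrases "
        f"while keeping its meaning:\n\nFlagged: {findings}\n\nText: {text}"
    )
```

Because the loop only needs a text-in/text-out callable, it requires no access to weights or training data, which is what makes post-processing approaches attractive for API-only models.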
DAFAIR proposes a novel approach to mitigating social bias in language models without relying on demographic information, reducing measured bias while maintaining competitive task performance.
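One way to reduce bias without demographic labels on the task data is to regularize against a handful of hand-written prototype sentences instead. The sketch below adds a symmetric-KL penalty that pushes a (hypothetical) sequence classifier to score paired prototypes identically; the prototype pairs, the KL form, and the loss weighting are assumptions for illustration, not DAFAIR's exact objective.

```python
import torch
import torch.nn.functional as F

# Hand-written pairs differing only in a demographic term (assumed examples).
PROTOTYPE_PAIRS = [
    ("He is a nurse.", "She is a nurse."),
    ("He is an engineer.", "She is an engineer."),
]

def fairness_penalty(model, tokenizer, device="cpu"):
    """Symmetric KL between task predictions on paired prototype texts."""
    penalty = torch.tensor(0.0, device=device)
    for text_a, text_b in PROTOTYPE_PAIRS:
        batch = tokenizer([text_a, text_b], return_tensors="pt",
                          padding=True).to(device)
        logits = model(**batch).logits            # (2, num_labels)
        log_p, log_q = F.log_softmax(logits, dim=-1)
        penalty = penalty + 0.5 * (
            F.kl_div(log_p, log_q, log_target=True, reduction="sum")
            + F.kl_div(log_q, log_p, log_target=True, reduction="sum")
        )
    return penalty / len(PROTOTYPE_PAIRS)

# During fine-tuning (assumed usage): loss = task_loss + lam * fairness_penalty(model, tokenizer)
```

Since the penalty is computed only on the fixed prototypes, the actual training corpus never needs demographic annotations, which is the property the summary highlights.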