Evaluating Differential Treatment in Language Models through Paired Perturbations
FairPair is a robust evaluation framework for measuring differential treatment in language models. It constructs counterfactual pairs grounded in the same demographic group and accounts for the inherent variability of the generation process.
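
As a rough illustration of the idea, the toy sketch below compares continuations of a prompt with continuations of its perturbed counterpart, then sets that difference against the spread among repeated samples for the same prompt. It is not the FairPair implementation. The names `perturb`, `toy_generate`, `jaccard_distance`, and `paired_scores` are hypothetical, the "model" is a random stub, and the token-level Jaccard distance is a stand-in metric chosen only for simplicity. Perturbing the counterfactual continuation back to the original group is one assumed way of grounding both sides of a pair in the same demographic group.

```python
import random
from itertools import combinations, product
from statistics import mean

# Hypothetical term swap standing in for a demographic perturbation.
SWAP = {"John": "Jane", "Jane": "John", "he": "she", "she": "he",
        "his": "her", "her": "his"}


def perturb(text):
    """Swap demographic terms token by token (toy perturbation)."""
    return " ".join(SWAP.get(tok, tok) for tok in text.split())


def toy_generate(prompt, rng):
    """Stand-in for sampling from a language model; NOT a real model."""
    subject = prompt.split()[0]
    # The stub treats the two subjects slightly differently on purpose.
    if subject == "John":
        traits = ["kind", "smart", "busy"]
    else:
        traits = ["kind", "quiet", "busy"]
    return f"{subject} is {rng.choice(traits)}"


def jaccard_distance(a, b):
    """1 minus the token-set overlap of two strings."""
    sa, sb = set(a.split()), set(b.split())
    return 1.0 - len(sa & sb) / len(sa | sb)


def paired_scores(prompt, n_samples=20, seed=0):
    """Return (between-pair distance, within-prompt sampling variability)."""
    rng = random.Random(seed)
    # Repeated samples for the original prompt.
    original = [toy_generate(prompt, rng) for _ in range(n_samples)]
    # Samples for the perturbed prompt, perturbed back so that both
    # sets of continuations refer to the same group.
    counterfactual = [perturb(toy_generate(perturb(prompt), rng))
                      for _ in range(n_samples)]
    between = mean(jaccard_distance(a, b)
                   for a, b in product(original, counterfactual))
    # Spread among samples of the same prompt: the variability baseline.
    within = mean(jaccard_distance(a, b)
                  for a, b in combinations(original, 2))
    return between, within


if __name__ == "__main__":
    between, within = paired_scores("John went to work")
    print(f"between={between:.3f} within={within:.3f}")
```

In this sketch, a between-pair distance that clearly exceeds the within-prompt baseline would suggest that the perturbation, rather than sampling noise, drives the difference. A real evaluation would replace the stub with an actual model and a more suitable scoring function.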