Evaluating Pronoun Use Fidelity in Large Language Models: Assessing Reasoning, Repetition, and Bias
Large language models struggle to reason about pronouns robustly and faithfully, even in simple settings, and continue to amplify discrimination against users of certain pronouns.