Exaggerated Safety Behaviors in Large Language Models: A Systematic Evaluation with XSTEST
Large language models often exhibit exaggerated safety behaviors, refusing to comply with clearly safe prompts due to an overemphasis on safety-related keywords and phrases, which limits their helpfulness.