Enhancing Large Language Model Defense Capabilities through a Multi-Agent Attacker-Disguiser Game
A multi-agent attacker-disguiser game framework is proposed to strengthen the ability of large language models to generate safe responses that conceal defensive intent, preventing exploitation by malicious attackers.