Unlearning Harmful Knowledge in Large Language Models to Enhance Jailbreaking Defense
Eraser is a defense method that unlearns harmful knowledge in large language models while retaining general knowledge and preserving safety alignment, reducing the risk of jailbreaking attacks without degrading model capability.
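The three goals above suggest a combined training objective: ascend on the loss over harmful completions (unlearning), while descending on losses for general-capability data (retention) and safe refusals (alignment). The sketch below is a generic illustration of such an objective; the function name, weighting scheme, and coefficients are assumptions for exposition, not details taken from the paper.

```python
def combined_unlearning_loss(harmful_loss: float,
                             retain_loss: float,
                             refusal_loss: float,
                             alpha: float = 1.0,
                             beta: float = 1.0,
                             gamma: float = 1.0) -> float:
    """Illustrative weighted objective (hypothetical formulation):
    - negate the harmful-data loss so minimizing the total performs
      gradient ascent on harmful knowledge (unlearning),
    - keep the retention and refusal losses positive so general
      knowledge and safety behavior are preserved.
    """
    return -alpha * harmful_loss + beta * retain_loss + gamma * refusal_loss


# Example: a high loss on harmful data now *lowers* the total objective,
# so training is pushed to forget that content.
total = combined_unlearning_loss(harmful_loss=2.0,
                                 retain_loss=1.0,
                                 refusal_loss=0.5)
print(total)  # → -0.5
```

In practice each term would be a language-modeling loss computed over the corresponding dataset; the weights trade off how aggressively harmful knowledge is removed against how well general ability and refusal behavior are kept.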