Guardrail Baselines for Unlearning in Large Language Models (LLMs)
In exploring unlearning methods for large language models, the authors propose guardrail-based approaches, such as system prompting and output filtering, as viable alternatives to fine-tuning. They emphasize the need for evaluation metrics that can distinguish guardrail-induced behavior from genuine unlearning achieved through fine-tuning.
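A minimal sketch of what such guardrail baselines might look like. The topic, patterns, and function names below are illustrative assumptions for exposition, not the authors' implementation:

```python
import re

# Hypothetical forget set: content the deployed model should no longer
# reveal. The example topic is illustrative only.
FORGET_PATTERNS = [
    re.compile(r"\bhogwarts\b", re.IGNORECASE),
    re.compile(r"\bharry\s+potter\b", re.IGNORECASE),
]

REFUSAL = "I'm sorry, I can't help with that topic."


def prompt_guardrail(user_prompt: str) -> str:
    """Prompting baseline: prepend an instruction telling the model
    to refuse questions about the forget-set topic."""
    instruction = ("You must refuse to answer any question about "
                   "the following topic: Harry Potter.\n\n")
    return instruction + user_prompt


def filter_guardrail(model_output: str) -> str:
    """Filtering baseline: post-hoc check of the model's output,
    replacing it with a refusal if it mentions forgotten content."""
    if any(p.search(model_output) for p in FORGET_PATTERNS):
        return REFUSAL
    return model_output


# The underlying weights are untouched: knowledge is hidden, not removed,
# which is why evaluations must be able to tell the two apart.
print(filter_guardrail("Hogwarts is a school of magic."))
print(filter_guardrail("Paris is the capital of France."))
```

Because guardrails only mask outputs rather than remove knowledge from the weights, an evaluation that probes surface behavior alone cannot separate this baseline from true fine-tuning-based unlearning.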