Training Large Language Models to Prioritize Privileged Instructions and Maintain Robust Behavior
Large language models should prioritize privileged instructions from trusted sources over lower-priority instructions to maintain robust and secure behavior, even in the face of adversarial prompt injections.