Gradient Cuff proposes a two-step method for detecting jailbreak attacks on large language models by exploiting the refusal loss landscape: it first checks the sampled refusal loss value itself, and then estimates the gradient norm of that loss, since jailbreak prompts tend to sit in steeper regions of the landscape than benign ones.
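Since the refusal loss is only observable through sampled model responses (a black box), its gradient norm can be approximated with zeroth-order finite differences. The sketch below is a minimal illustration of that idea, not the authors' implementation: `toy_smooth_loss` and `toy_steep_loss` are hypothetical stand-ins for the refusal loss of a benign prompt and a jailbreak prompt, and all thresholds and sample counts are illustrative assumptions.

```python
import numpy as np

def zeroth_order_grad_norm(f, x, n_dirs=16, mu=0.01, seed=0):
    """Estimate ||grad f(x)|| for a black-box f via random finite differences.

    Averages directional-derivative estimates (f(x + mu*u) - f(x)) / mu
    over random unit directions u (Gaussian-smoothing style estimator).
    """
    rng = np.random.default_rng(seed)
    d = x.shape[0]
    fx = f(x)
    g = np.zeros(d)
    for _ in range(n_dirs):
        u = rng.standard_normal(d)
        u /= np.linalg.norm(u)                 # unit direction
        g += (f(x + mu * u) - fx) / mu * u     # directional estimate
    g *= d / n_dirs                            # scale for the estimator
    return float(np.linalg.norm(g))

def two_step_detect(loss_fn, x, loss_thresh=0.5, grad_thresh=5.0):
    """Illustrative two-step check (thresholds are assumptions):
    step 1 rejects if the loss value itself signals refusal;
    step 2 rejects if the estimated gradient norm is large."""
    if loss_fn(x) < loss_thresh:
        return True   # step 1: model already leans toward refusing
    return zeroth_order_grad_norm(loss_fn, x) > grad_thresh  # step 2

# Hypothetical loss landscapes: flat basin (benign) vs. steep ridge (jailbreak-like).
toy_smooth_loss = lambda x: 0.6 + 0.01 * float(np.sum(x ** 2))
toy_steep_loss = lambda x: 0.6 + 2.0 * float(np.sum(np.abs(x)))

x0 = np.ones(16)
print(two_step_detect(toy_smooth_loss, x0))  # flat landscape: not flagged
print(two_step_detect(toy_steep_loss, x0))   # steep landscape: flagged
```

The ordering of the two estimated gradient norms is what matters here: the steep toy landscape yields a much larger zeroth-order estimate than the flat one, which is the signal the second detection step relies on.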