Toxicity prediction models are vulnerable to small adversarial perturbations that can fool them into misclassifying toxic content as benign.