SALAD-Bench is a novel safety benchmark designed to evaluate Large Language Models (LLMs) comprehensively. It transcends conventional benchmarks by offering a large scale, diverse taxonomy spanning three levels. The benchmark includes standard and complex questions enriched with attack and defense modifications. An innovative evaluator, the LLM-based MD-Judge, ensures reliable evaluations with a focus on attack-enhanced queries. SALAD-Bench extends beyond standard safety evaluation to assess both LLM attack and defense methods effectively. The benchmark's extensive experiments shed light on LLM resilience against emerging threats and the efficacy of defense tactics.
To Another Language
from source content
arxiv.org
Key Insights Distilled From
by Lijun Li,Bow... at arxiv.org 03-05-2024
https://arxiv.org/pdf/2402.05044.pdfDeeper Inquiries