SALAD-Bench is a novel safety benchmark designed to evaluate Large Language Models (LLMs) comprehensively. It transcends conventional benchmarks by offering a large scale, diverse taxonomy spanning three levels. The benchmark includes standard and complex questions enriched with attack and defense modifications. An innovative evaluator, the LLM-based MD-Judge, ensures reliable evaluations with a focus on attack-enhanced queries. SALAD-Bench extends beyond standard safety evaluation to assess both LLM attack and defense methods effectively. The benchmark's extensive experiments shed light on LLM resilience against emerging threats and the efficacy of defense tactics.
In eine andere Sprache
aus dem Quellinhalt
arxiv.org
Wichtige Erkenntnisse aus
by Lijun Li,Bow... um arxiv.org 03-05-2024
https://arxiv.org/pdf/2402.05044.pdfTiefere Fragen