SALAD-Bench is a novel safety benchmark designed to evaluate Large Language Models (LLMs) comprehensively. It transcends conventional benchmarks by offering a large scale, diverse taxonomy spanning three levels. The benchmark includes standard and complex questions enriched with attack and defense modifications. An innovative evaluator, the LLM-based MD-Judge, ensures reliable evaluations with a focus on attack-enhanced queries. SALAD-Bench extends beyond standard safety evaluation to assess both LLM attack and defense methods effectively. The benchmark's extensive experiments shed light on LLM resilience against emerging threats and the efficacy of defense tactics.
In un'altra lingua
dal contenuto originale
arxiv.org
Approfondimenti chiave tratti da
by Lijun Li,Bow... alle arxiv.org 03-05-2024
https://arxiv.org/pdf/2402.05044.pdfDomande più approfondite