SALAD-Bench is a novel safety benchmark designed to evaluate Large Language Models (LLMs) comprehensively. It transcends conventional benchmarks by offering a large scale, diverse taxonomy spanning three levels. The benchmark includes standard and complex questions enriched with attack and defense modifications. An innovative evaluator, the LLM-based MD-Judge, ensures reliable evaluations with a focus on attack-enhanced queries. SALAD-Bench extends beyond standard safety evaluation to assess both LLM attack and defense methods effectively. The benchmark's extensive experiments shed light on LLM resilience against emerging threats and the efficacy of defense tactics.
toiselle kielelle
lähdeaineistosta
arxiv.org
Tärkeimmät oivallukset
by Lijun Li,Bow... klo arxiv.org 03-05-2024
https://arxiv.org/pdf/2402.05044.pdfSyvällisempiä Kysymyksiä