SALAD-Bench is a novel safety benchmark designed to evaluate Large Language Models (LLMs) comprehensively. It transcends conventional benchmarks by offering a large scale, diverse taxonomy spanning three levels. The benchmark includes standard and complex questions enriched with attack and defense modifications. An innovative evaluator, the LLM-based MD-Judge, ensures reliable evaluations with a focus on attack-enhanced queries. SALAD-Bench extends beyond standard safety evaluation to assess both LLM attack and defense methods effectively. The benchmark's extensive experiments shed light on LLM resilience against emerging threats and the efficacy of defense tactics.
他の言語に翻訳
原文コンテンツから
arxiv.org
深掘り質問