Core Concepts
MLLMs are vulnerable to attacks using query-relevant images, necessitating the development of safety measures and evaluation frameworks.
Abstract
Abstract:
MM-SafetyBench introduced as a framework for evaluating MLLMs against image-based attacks.
Dataset with 13 scenarios and 5,040 text-image pairs compiled.
Analysis reveals susceptibility of MLLMs to breaches even with safety-aligned LLMs.
Related Work:
Safety concerns of LLMs addressed by OpenAI, highlighting unsafe scenarios.
Attack and defense methods proposed to control unsafe behavior in LLMs.
Multimodal Large Language Models (MLLMs):
Development and fusion methods discussed for MLLMs integrating vision encoders with LLMs.
Various fusion techniques utilized to enhance performance in multimodal tasks.
Methodology:
Four-step process outlined for constructing MM-SafetyBench dataset.
Question generation, key phrase extraction, image conversion, and question rephrasing explained.
Evaluation of MLLMs:
Traditional benchmarks insufficient for comprehensive measurements in MLLMs.
Previous works focus on human judgment scoring responses of MLLMs.
Safety Prompt:
Proposed safety prompt aims to enhance model resilience against unsafe queries.
Stats
"Warning: This paper contains examples of harmful language and images."
"Our analysis across 12 state-of-the-art models reveals that MLLMs are susceptible to breaches instigated by our approach."
"In response, we propose a straightforward yet effective prompting strategy."