The paper provides a comprehensive analysis of the security implications of integrating image modalities into Multimodal Large Language Models (MLLMs). It begins by outlining the foundational components and training processes of MLLMs, highlighting how the inclusion of visual data can introduce new vulnerabilities.
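The survey's specific architectures are not reproduced here, but a minimal sketch of the typical component layout (a vision encoder whose features are projected into the language model's token space) helps locate where image inputs enter the pipeline. All module names, dimensions, and design choices below are illustrative assumptions, not details taken from the paper:

```python
import torch
import torch.nn as nn

class MinimalMLLM(nn.Module):
    """Illustrative sketch of a common MLLM layout: a vision encoder,
    a projection layer, and a language model. Dimensions and module
    choices are placeholders, not the surveyed architectures."""

    def __init__(self, vision_encoder: nn.Module, language_model: nn.Module,
                 vision_dim: int = 1024, llm_dim: int = 4096):
        super().__init__()
        self.vision_encoder = vision_encoder              # e.g. a CLIP-style ViT (assumption)
        self.projector = nn.Linear(vision_dim, llm_dim)   # maps image features into the LLM embedding space
        self.language_model = language_model

    def forward(self, image: torch.Tensor, text_embeds: torch.Tensor) -> torch.Tensor:
        image_feats = self.vision_encoder(image)      # (batch, n_patches, vision_dim)
        image_tokens = self.projector(image_feats)    # (batch, n_patches, llm_dim)
        # Visual tokens are concatenated with the text embeddings; this shared
        # input stream is the channel through which image-borne threats reach the LLM.
        inputs = torch.cat([image_tokens, text_embeds], dim=1)
        return self.language_model(inputs)
```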
The authors then construct a detailed threat model, categorizing the diverse vulnerabilities and potential attacks against MLLMs across different scenarios, including white-box, black-box, and gray-box settings. The paper goes on to review the current state-of-the-art attacks on MLLMs, classifying them into three primary categories: structure-based attacks, perturbation-based attacks, and data poisoning-based attacks.
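As a concrete illustration of the perturbation-based category, a minimal white-box sketch in the spirit of projected gradient descent (PGD) is shown below. The model handle, the attacker's loss function, and all hyperparameters are placeholder assumptions for illustration, not a reproduction of any specific attack from the survey:

```python
import torch

def pgd_image_attack(model, image, attacker_loss_fn,
                     eps=8 / 255, alpha=1 / 255, steps=40):
    """Minimal PGD-style sketch of a perturbation-based attack on an MLLM's
    image input (white-box: model gradients are assumed available).
    `model` and `attacker_loss_fn` are hypothetical placeholders; surveyed
    attacks use their own objectives, e.g. steering the model toward a
    target harmful response."""
    adv = image.clone().detach()
    for _ in range(steps):
        adv.requires_grad_(True)
        loss = attacker_loss_fn(model(adv))      # attacker's objective on the model output
        loss.backward()
        with torch.no_grad():
            adv = adv - alpha * adv.grad.sign()                # step that lowers the attacker's loss
            adv = image + (adv - image).clamp(-eps, eps)       # keep perturbation within the eps-ball
            adv = adv.clamp(0.0, 1.0)                          # stay in the valid pixel range
        adv = adv.detach()
    return adv
```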
The authors also discuss the existing defensive strategies, which can be divided into training-time defenses and inference-time defenses. These approaches aim to enhance the security and robustness of MLLMs against the identified threats.
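As one illustrative example of an inference-time defense, input purification can be as simple as lossy re-encoding of the image before it reaches the model, which can disrupt fine-grained adversarial perturbations. The choice of JPEG compression and the quality setting below are assumptions for illustration, not the survey's prescribed method:

```python
import io
from PIL import Image

def purify_image(image: Image.Image, jpeg_quality: int = 50) -> Image.Image:
    """Illustrative inference-time defense sketch: re-encode the input image
    with lossy JPEG compression before passing it to the MLLM, trading some
    visual fidelity for robustness against pixel-level perturbations."""
    buffer = io.BytesIO()
    image.save(buffer, format="JPEG", quality=jpeg_quality)
    buffer.seek(0)
    return Image.open(buffer).convert("RGB")
```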
Finally, the paper discusses several unsolved problems and proposes future research directions, such as quantifying security risks, addressing privacy concerns, deepening research on multimodal security alignment, and leveraging interpretability perspectives to better understand MLLM security issues.