Core Concepts
We propose MMCert, the first certified defense against adversarial attacks on multi-modal models. MMCert provides provable robustness guarantees by sub-sampling each input modality independently and aggregating a base model's predictions on the sub-sampled inputs, as sketched below.
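A minimal sketch (not the authors' code) of this idea in Python/NumPy: each modality is represented as an array of basic elements, a random subset of elements is kept per modality, and the base model's predictions over many such sub-sampled inputs are aggregated by majority vote. The function names, the `keep` counts, and the `base_model` callable are hypothetical placeholders.

```python
import numpy as np
from collections import Counter

def subsample(x: np.ndarray, k: int, rng: np.random.Generator) -> np.ndarray:
    """Keep k randomly chosen basic elements (rows, e.g., pixels or frames) of one modality."""
    idx = rng.choice(x.shape[0], size=k, replace=False)
    return x[np.sort(idx)]

def mmcert_style_predict(base_model, inputs: dict, keep: dict,
                         n_samples: int = 100, seed: int = 0):
    """Majority-vote aggregation of base_model over n_samples sub-sampled multi-modal inputs.

    inputs: modality name -> array of shape (num_elements, feature_dim)
    keep:   modality name -> number of elements retained per sub-sample
    """
    rng = np.random.default_rng(seed)
    votes = Counter()
    for _ in range(n_samples):
        sub = {m: subsample(x, keep[m], rng) for m, x in inputs.items()}
        votes[base_model(sub)] += 1  # base_model returns a (hashable) label
    label, _ = votes.most_common(1)[0]
    return label, votes
```

The robustness guarantee then reasons about how much these vote counts can shift when a bounded number of basic elements in each modality is adversarially modified.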
Abstract
The paper proposes MMCert, a certified defense against adversarial attacks on multi-modal models. Key highlights:
Multi-modal models, which process input from multiple modalities (e.g., image, 3D points, audio, text), are vulnerable to adversarial attacks that can manipulate all input modalities.
Existing certified defenses are designed for unimodal models and achieve sub-optimal performance when extended to multi-modal models.
MMCert is the first certified defense for multi-modal models. It works by creating multiple sub-sampled versions of the multi-modal input and aggregating the predictions of a base multi-modal model on these sub-samples.
The paper derives provable robustness guarantees for MMCert, showing that it provably makes the same prediction for a multi-modal input as long as the number of modified basic elements (e.g., pixels, audio frames) in each modality is bounded; a sketch of the underlying sub-sampling probability follows this list.
Experiments on multi-modal road segmentation and emotion recognition tasks show that MMCert significantly outperforms a state-of-the-art certified defense for unimodal models when that defense is extended to the multi-modal setting.
The paper also analyzes the impact of the sub-sampling ratios across modalities and considers different attack types (modification, addition, deletion) on the multi-modal inputs.
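The paper derives the exact certificate; as a hedged illustration of the kind of quantity such sub-sampling-based guarantees rest on, the snippet below computes the probability that a uniform sub-sample of k out of n basic elements avoids all r modified elements, and multiplies these probabilities across independently sub-sampled modalities. The budget numbers are made up and the bound shown is illustrative, not the paper's theorem.

```python
from math import comb

def p_clean(n: int, k: int, r: int) -> float:
    """P[a size-k uniform sub-sample of n elements contains none of the r modified ones]."""
    return comb(n - r, k) / comb(n, k) if n - r >= k else 0.0

def p_clean_joint(budgets) -> float:
    """Product over modalities, since each modality is sub-sampled independently.

    budgets: iterable of (n, k, r) tuples, one per modality.
    """
    p = 1.0
    for n, k, r in budgets:
        p *= p_clean(n, k, r)
    return p

# Hypothetical budgets: (total elements, kept per sub-sample, modified by the attacker).
p = p_clean_joint([(10_000, 500, 20),    # e.g., RGB pixels
                   (10_000, 500, 20)])   # e.g., depth pixels
# If a sub-sample avoids every modified element, the base model sees an identical
# input before and after the attack, so at most a (1 - p) fraction of sub-samples
# can change their prediction under a modification attack within these budgets.
print(p, 1 - p)
```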
Stats
For road segmentation, the number of changed pixels in both the RGB image and the depth image is bounded.
For emotion recognition, the number of changed frames in both the visual and audio modalities is bounded (illustrated below).
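As a hedged illustration of how these per-modality bounds are measured (the arrays, shapes, and budgets below are hypothetical), a perturbation is quantified by counting the basic elements that differ in each modality and checking each count against that modality's budget.

```python
import numpy as np

def num_changed(clean: np.ndarray, adv: np.ndarray) -> int:
    """Count basic elements (rows) that differ; inputs have shape (num_elements, feature_dim)."""
    return int(np.any(clean != adv, axis=1).sum())

def within_budget(clean: dict, adv: dict, budget: dict) -> bool:
    """True if every modality's change count stays within its bound."""
    return all(num_changed(clean[m], adv[m]) <= budget[m] for m in clean)

# Hypothetical example: 100k RGB pixels and 100k depth pixels, budget of 50 changes each.
rng = np.random.default_rng(0)
rgb, depth = rng.random((100_000, 3)), rng.random((100_000, 1))
adv_rgb = rgb.copy()
adv_rgb[:30] += 0.1  # modify 30 RGB pixels, leave depth untouched
print(within_budget({"rgb": rgb, "depth": depth},
                    {"rgb": adv_rgb, "depth": depth},
                    {"rgb": 50, "depth": 50}))  # True: within both budgets
```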