Benchmarking Zero-Shot Robustness of Multimodal Foundation Models: A Comprehensive Evaluation
Multimodal foundation models such as CLIP are robust under natural distribution shifts, but they do not show improved robustness under synthetic distribution shifts or adversarial attacks.