Multimodal Representation Learning with Alternating Unimodal Adaptation to Address Modality Laziness
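Multimodal learning often suffers from modality laziness: during joint optimization, some modalities dominate and the others remain under-trained. MLA (Multimodal Learning with Alternating Unimodal Adaptation) addresses this by decomposing joint multimodal optimization into an alternating unimodal learning process, so that each modality is optimized in turn rather than in competition with the others. Cross-modal interactions are still captured through a head shared across all modalities, and a gradient modification mechanism keeps updates to that head from overwriting what it learned from previously trained modalities.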
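The alternating scheme can be illustrated with a small toy sketch. The text above does not specify the gradient modification, so this example assumes one simple choice: before the shared head is updated, its gradient is projected to be orthogonal to the feature it last saw from the previously trained modality, which leaves the head's output on that feature unchanged. The elementwise-scale "encoders", the squared-error loss, and the names `project_out` and `alternating_step` are all hypothetical and are not taken from the method itself.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def project_out(grad, direction, eps=1e-12):
    """Remove from `grad` its component along `direction`."""
    norm_sq = dot(direction, direction)
    if norm_sq < eps:
        return list(grad)
    scale = dot(grad, direction) / norm_sq
    return [g - scale * d for g, d in zip(grad, direction)]


def alternating_step(head, encoders, batch, modality, prev_feature, lr=0.1):
    """One unimodal update: only `modality`'s encoder and the shared head change."""
    x, y = batch[modality], batch["label"]
    enc = encoders[modality]
    # Toy encoder (assumption): an elementwise scale of the raw input.
    feature = [e * xi for e, xi in zip(enc, x)]
    err = dot(head, feature) - y  # residual of a squared-error loss
    grad_head = [2 * err * f for f in feature]
    grad_enc = [2 * err * h * xi for h, xi in zip(head, x)]
    # Assumed gradient modification: drop the part of the head gradient
    # that would change the head's output on the previous modality's feature.
    if prev_feature is not None:
        grad_head = project_out(grad_head, prev_feature)
    new_head = [h - lr * g for h, g in zip(head, grad_head)]
    new_enc = [e - lr * g for e, g in zip(enc, grad_enc)]
    return new_head, {**encoders, modality: new_enc}, feature


# Alternate between two modalities on a single toy sample.
head = [0.5, -0.5]
encoders = {"audio": [1.0, 1.0], "video": [1.0, 1.0]}
batch = {"audio": [1.0, 2.0], "video": [3.0, -1.0], "label": 1.0}
prev = None
for step in range(4):
    modality = ("audio", "video")[step % 2]
    head, encoders, prev = alternating_step(head, encoders, batch, modality, prev)
```

Each step touches only one modality's encoder, so no modality can be crowded out within a step, while the shared head is the only component that sees every modality. In this sketch the projection protects only the most recent feature from the other modality; a mechanism intended to prevent forgetting in practice would need to account for more than a single stored vector.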