Omni-SMoLA: Efficient Mixture of Multimodal Experts for Boosting Generalist Large Language Models
Omni-SMoLA is an efficient architecture that uses a soft mixture of many low-rank multimodal experts to improve the performance of generalist large language models across a wide range of vision-and-language tasks, often matching or outperforming specialized models.
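The core idea — a soft (weighted, non-sparse) mixture of many low-rank experts added on top of a frozen base layer — can be sketched in a few lines. The following is a minimal numpy illustration, not the paper's implementation: the dimensions, the single linear layer, and the per-token softmax router are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

d, r, n_experts, seq = 16, 4, 8, 5            # illustrative sizes: model dim, rank, experts, tokens

W = rng.normal(size=(d, d)) * 0.02            # frozen base weight (stands in for a pretrained layer)
A = rng.normal(size=(n_experts, d, r)) * 0.02  # per-expert low-rank down-projections
B = rng.normal(size=(n_experts, r, d)) * 0.02  # per-expert low-rank up-projections
router = rng.normal(size=(d, n_experts)) * 0.02  # per-token routing weights

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def smola_layer(x):
    """x: (seq, d) token activations -> (seq, d) outputs."""
    base = x @ W
    probs = softmax(x @ router)                          # (seq, n_experts): soft mixture, every expert contributes
    expert_out = np.einsum('sd,edr,erk->sek', x, A, B)   # each expert's low-rank update, per token
    delta = np.einsum('se,sek->sk', probs, expert_out)   # softly combine the experts
    return base + delta

y = smola_layer(rng.normal(size=(seq, d)))
print(y.shape)  # (5, 16)
```

Because the routing is soft rather than top-k sparse, the output is differentiable in the router everywhere, and the extra cost per expert is only the rank-`r` projection — which is what makes scaling to "many" experts cheap.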