Improved Baselines for Data-efficient Perceptual Augmentation of Large Language Models
Large language models can be interfaced with perceptual backbones to improve performance on multimodal tasks, with a focus on keeping the interface both data-efficient and parameter-efficient.