The authors introduce Multimodal Infusion Tuning (MiT), a parameter-efficient strategy for integrating diverse modalities into large language models.