Bibliographic Information: Shukor, M., & Cord, M. (2024). Implicit Multimodal Alignment: On the Generalization of Frozen LLMs to Multimodal Inputs. Advances in Neural Information Processing Systems (NeurIPS), 38.
Research Objective: This paper investigates how frozen LLMs generalize to multimodal inputs, focusing on the internal representations of these models.
Methodology: The researchers analyze the internal representations of frozen LLMs (specifically Vicuna-v1.5-7B) when exposed to image, video, audio, and text inputs. They employ single-task (ST) and multitask (MT) fine-tuning setups, using datasets spanning various modalities. The analysis tools include cosine similarity, token norm calculations, vocabulary distribution analysis, and subnetwork activation mapping.
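Two of the analysis tools named above, cosine similarity and token norms, can be illustrated with a minimal sketch. It assumes hidden states are available as arrays of shape (num_tokens, dim) and uses random placeholder data instead of real Vicuna-v1.5-7B activations. The function names `token_norms` and `mean_pairwise_cosine` are hypothetical, and the paper's exact procedure may differ.

```python
import numpy as np


def token_norms(hidden):
    """Return the L2 norm of each token vector; hidden has shape (num_tokens, dim)."""
    return np.linalg.norm(hidden, axis=-1)


def mean_pairwise_cosine(a, b, eps=1e-8):
    """Mean cosine similarity over all pairs with one token from a and one from b."""
    a_unit = a / (np.linalg.norm(a, axis=-1, keepdims=True) + eps)
    b_unit = b / (np.linalg.norm(b, axis=-1, keepdims=True) + eps)
    return float((a_unit @ b_unit.T).mean())


# Placeholder data (assumption): stand-ins for textual and perceptual token states.
rng = np.random.default_rng(0)
text_tokens = rng.normal(size=(5, 16))
image_tokens = 3.0 * rng.normal(size=(7, 16))  # scaled only to make the norms differ

print("mean text norm: ", token_norms(text_tokens).mean())
print("mean image norm:", token_norms(image_tokens).mean())
print("text-image cosine:", mean_pairwise_cosine(text_tokens, image_tokens))
```

In practice the placeholder arrays would be replaced by per-layer hidden states extracted from the model, so that both quantities can be tracked across layers and modalities.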
Key Findings:
Main Conclusions:
Significance:
Limitations and Future Research:
Key insights distilled from: Mustafa Shuk... at arxiv.org, 10-08-2024
https://arxiv.org/pdf/2405.16700.pdf