Presents a construction method for the proposed multilingual VIF dataset and an efficient framework for extending the X-LLaVA model to multiple languages.
Proposes cost-effective methods for multilingual LMM training and dataset construction.
Multimodal foundation models like CLIP are robust under natural distribution shifts but remain vulnerable to synthetic distribution shifts and adversarial attacks.
In the SEEACT study, GPT-4V shows promise as a generalist web agent, demonstrating its potential for completing tasks on live websites.
Large Multimodal Models are distractible by typographic attacks, but this vulnerability can be mitigated by providing more informative prompts.