Core Concepts
Bootstrapped Preference Optimization (BPO) mitigates the pretraining-corpus bias of Multimodal Large Language Models, improving visual grounding and overall performance.
Summary
Multimodal Large Language Models (MLLMs) often exhibit biases toward their pretraining corpus, which hinders grounding in visual inputs. Bootstrapped Preference Optimization (BPO) addresses this by learning preferences over bootstrapped negative responses: negatives are generated by conditioning the model on distorted images and by injecting errors with a text-based LLM, then paired with ground-truth answers to form a preference dataset. Preference learning on this dataset suppresses the pretrained bias, significantly improves grounding in visual inputs, and yields consistent gains over baselines across multiple benchmarks, advancing multimodal conversational systems.
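A minimal sketch of how one such preference pair might be assembled from the two negative-response sources described above. The helper callables (mllm_generate, distort_image, inject_errors) and the 50/50 mixing are illustrative assumptions, not the authors' released code:

```python
# Sketch of BPO-style preference-pair construction (assumed API, not the paper's code).
import random
from dataclasses import dataclass
from typing import Callable

@dataclass
class PreferencePair:
    prompt: str
    chosen: str    # ground-truth, visually grounded response
    rejected: str  # bootstrapped negative response

def build_pair(
    prompt: str,
    gold: str,
    image: bytes,
    mllm_generate: Callable[[bytes, str], str],  # MLLM: (image, prompt) -> text
    distort_image: Callable[[bytes], bytes],     # e.g. heavy blur or noise
    inject_errors: Callable[[str], str],         # text-only LLM corrupts the answer
) -> PreferencePair:
    """Build one (chosen, rejected) pair from the two negative sources."""
    if random.random() < 0.5:
        # Distorted-image negative: a degraded image pushes the MLLM to
        # answer from its pretraining prior rather than the pixels.
        rejected = mllm_generate(distort_image(image), prompt)
    else:
        # Error-injection negative: a text-based LLM rewrites the gold
        # answer with plausible mistakes, exposing the language-side bias.
        rejected = inject_errors(gold)
    return PreferencePair(prompt=prompt, chosen=gold, rejected=rejected)
```

Pairs built this way can then be fed to any standard preference-optimization objective.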
Statistics
Extensive experimentation demonstrates significant performance improvements across multiple benchmarks.
BPO effectively suppresses pretrained LLM bias, enabling enhanced grounding in visual inputs.
The DPO algorithm has emerged as a promising alternative to RLHF due to its stability and competitive performance; a minimal sketch of its loss follows this list.
Our approach leads to significant performance improvements across multiple benchmarks, advancing the state-of-the-art in multimodal conversational systems.
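For reference, the standard DPO objective that such preference learning optimizes is -log sigma(beta * ((log pi_theta(y_w|x) - log pi_ref(y_w|x)) - (log pi_theta(y_l|x) - log pi_ref(y_l|x)))). Below is a minimal PyTorch sketch of that loss, assuming per-sequence log-probabilities have already been computed; the beta default and tensor shapes are illustrative assumptions, not this paper's exact training code:

```python
# Minimal DPO loss (standard Rafailov et al. 2023 formulation), assumed wiring.
import torch
import torch.nn.functional as F

def dpo_loss(
    policy_chosen_logps: torch.Tensor,    # log pi_theta(y_w | x), shape [batch]
    policy_rejected_logps: torch.Tensor,  # log pi_theta(y_l | x), shape [batch]
    ref_chosen_logps: torch.Tensor,       # log pi_ref(y_w | x), shape [batch]
    ref_rejected_logps: torch.Tensor,     # log pi_ref(y_l | x), shape [batch]
    beta: float = 0.1,                    # KL-penalty strength (illustrative default)
) -> torch.Tensor:
    """-log sigmoid(beta * (chosen log-ratio - rejected log-ratio)), batch mean."""
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps
    return -F.logsigmoid(beta * (chosen_logratio - rejected_logratio)).mean()
```

The frozen reference model keeps the policy close to its starting point, which is the property that lets DPO match RLHF-style training without a separate reward model.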