Core Concepts
Prompt Highlighter enables interactive control in multi-modal LLMs through token-level highlighting, steering generation outputs toward user-highlighted prompt content.
Abstract
This article introduces Prompt Highlighter, a training-free method for user interaction in multi-modal LLMs that controls generation outputs through token-level highlighting of prompt content. It covers the Prompt Highlighter workflow, quantitative evaluations on VLM benchmarks, reliable descriptions, a user study, attention map visualization, a limitation analysis, and future work, along with an ablation study and discussions of prediction control via CFG and attention activation.
Workflow of the Prompt Highlighter: The algorithmic workflows for highlighted guidance control and attention activation are outlined.
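The highlighted guidance control mentioned above can be sketched in CFG style: contrast the next-token logits of the full prompt with those of a prompt whose highlighted tokens are masked out, then extrapolate toward the full-prompt condition. This is a minimal illustration, not the paper's exact formula; the function name, toy logits, and guidance weight are assumptions.

```python
import numpy as np

def highlighted_guidance(logits_full, logits_masked, gamma=1.5):
    # CFG-style combination (a sketch, not the authors' implementation):
    # extrapolate from the masked-prompt logits toward the full-prompt
    # logits so content tied to highlighted tokens is emphasized.
    # gamma > 1 strengthens the highlighted condition.
    return logits_masked + gamma * (logits_full - logits_masked)

# Toy 4-token vocabulary: the full prompt favors token 2 more strongly
# than the masked prompt does.
logits_full = np.array([1.0, 0.5, 2.0, 0.2])
logits_masked = np.array([1.0, 0.5, 1.0, 0.2])

guided = highlighted_guidance(logits_full, logits_masked, gamma=1.5)
print(guided)                  # [1.  0.5 2.5 0.2] -- token 2 pushed further up
print(int(np.argmax(guided)))  # 2
```

With gamma = 1 the combination reduces to the full-prompt logits; larger gamma amplifies whatever the highlighted tokens contribute.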
Quantitative Evaluation:
Evaluations on VLM Benchmarks: Results show consistent improvements over baseline models.
Reliable Descriptions: Evaluation using CLIP Score demonstrates state-of-the-art performance in image captioning.
User Study: Participants report that Prompt Highlighter helps them achieve their task objectives.
Attention Map Visualization: Visualizations confirm the effectiveness of attention activation in steering model focus.
Limitation Analysis: Discussion on computational overhead and model dependency.
Additional Discussions:
Comparison with LLM-CFG and other highlighting methods.
Showcases of more visual results and multi-round interactive conversations.
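The attention activation discussed above can be sketched as a bias on pre-softmax attention scores: highlighted key tokens receive a positive offset so the model attends to them more. The function name, the log-scale bias form, and the toy values below are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def activated_attention(scores, highlight_mask, alpha=2.0):
    # Sketch of attention activation: add log(alpha) to the pre-softmax
    # scores of highlighted keys, which multiplies their unnormalized
    # attention weight by alpha before renormalization.
    biased = scores + np.log(alpha) * highlight_mask
    e = np.exp(biased - biased.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

scores = np.array([0.2, 0.2, 0.2, 0.2])  # one query over 4 keys
mask = np.array([0.0, 1.0, 0.0, 0.0])    # key 1 is highlighted
weights = activated_attention(scores, mask, alpha=2.0)
print(weights)  # [0.2 0.4 0.2 0.2] -- the highlighted key gets double weight
```

Because the bias acts before the softmax, the output remains a valid attention distribution; alpha = 1 recovers the ordinary softmax.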
Stats
Applied to LLaVA-v1.5 without any tuning, the method achieves 70.7 on the MMBench test set and 1552.5 on MME perception.
Quotes
"Prompt Highlighter facilitates token-level user interactions for customized generation."
"Our approach is compatible with current LLMs and VLMs, achieving impressive customized generation results without training."