Multimodal Shannon Game: Exploring the Impact of Visual Context on Next-Word Prediction
The addition of visual information, in various forms, improves both self-reported confidence and accuracy for next-word prediction in both humans and language models.