Core Concepts
Understanding and mitigating visual hallucinations in Vision-Language Models is crucial for responsible AI advancement.
Abstract
The paper categorizes visual hallucinations in Vision-Language Models (VLMs). It identifies eight categories of hallucination, builds a dataset for studying them, and proposes mitigation strategies. It also discusses the rise of hallucination in AI models more broadly and argues that VLM hallucinations need a comprehensive categorization.
Definition, Quantification, and Prescriptive Remediations
- Authors from various universities
- Contact email provided
- Description of a person outside a building
- Contextual Guessing
- Geographical Erratum
- Gender Anomaly
- Wrong Reading
- Identity Incongruity
- Visual Illusion
- VLM as Classifier
- Numeric Discrepancy
- Examples of model misinterpretations
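The eight category names listed above can be held in a small data structure when labeling model outputs. The following is a minimal sketch, not the authors' tooling: the class names, the `Annotation` fields, the `count_by_category` helper, the model name "example-vlm", and the severity assigned in the example are all assumptions made for illustration. Only the category names and the Alarming/Mild distinction come from the summary.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class HallucinationCategory(Enum):
    """The eight visual hallucination categories named in the summary."""
    CONTEXTUAL_GUESSING = "Contextual Guessing"
    GEOGRAPHICAL_ERRATUM = "Geographical Erratum"
    GENDER_ANOMALY = "Gender Anomaly"
    WRONG_READING = "Wrong Reading"
    IDENTITY_INCONGRUITY = "Identity Incongruity"
    VISUAL_ILLUSION = "Visual Illusion"
    VLM_AS_CLASSIFIER = "VLM as Classifier"
    NUMERIC_DISCREPANCY = "Numeric Discrepancy"


class Severity(Enum):
    """Severity levels mentioned in the summary."""
    ALARMING = "Alarming"
    MILD = "Mild"


@dataclass(frozen=True)
class Annotation:
    """One labeled model output (hypothetical record layout)."""
    model: str
    caption: str
    category: HallucinationCategory
    severity: Severity


def count_by_category(annotations):
    """Return a Counter mapping each category to its number of annotations."""
    return Counter(a.category for a in annotations)


# Illustrative record only: the model name and both labels are assumptions.
example = Annotation(
    model="example-vlm",
    caption="There are five people in the image",
    category=HallucinationCategory.NUMERIC_DISCREPANCY,
    severity=Severity.MILD,
)
counts = count_by_category([example])
```

Using an enumeration rather than free-text labels keeps category spellings consistent across annotators, which matters when counts per category are compared between models.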
KOSMOS-2
- Image of a surfer mistaken for skateboarding
- Alarming and Mild hallucinations
- Use of VLMs such as KOSMOS-2, MiniGPT-v2, and Sphinx
Abstract
- Focus on detecting and mitigating hallucination in VLMs
- Dataset creation and mitigation strategies proposed
Visual Hallucination - an extensive categorization
- Explanation of eight categories of visual hallucination
- Concerns about hallucinations eroding trust in technology
- Importance of categorizing VLM hallucinations
Stats
A person in a white shirt and dark pants is standing outside of a building
The Rocky Cliffs and Ocean of the coast of Brittany, France, are a popular destination for tourists
An Image of Sergey Brin, wearing a blue shirt, and a headset, and speaking into a Microphone
A sonogram of a pregnant woman, with a baby in her womb, with the word julian on the screen
There are five people in the image
A collage of pictures of a lion, a giraffe, a bird, a tiger, a monkey, and an elephant
Quotes
"The troubling rise of hallucination presents perhaps the most significant impediment to the advancement of responsible AI." - Authors
"When Google’s Bard AI 'hallucinated' during its initial public demonstration, Alphabet experienced a temporary loss of $100 billion in market value." - Olson, 2023