ESREAL introduces an unsupervised learning framework to mitigate hallucinations in Vision-Language Models by providing fine-grained negative feedback on hallucinated tokens.