Core Concepts
The paper proposes the Relation Conversation (ReC) task to improve relation comprehension in images.
Abstract
The All-Seeing Project V2 introduces the Relation Conversation task to enhance relation comprehension in images. The project comprises the AS-V2 dataset, the ASMv2 model, and evaluation on benchmarks such as CRPE. The model performs strongly on a range of vision-language tasks and on scene graph generation.
Stats
ASMv2 achieves an overall accuracy of 52.04 on the relation-aware benchmark CRPE.
ASMv2 surpasses LLaVA-1.5, which scores 43.14 on CRPE, by a large margin in relation comprehension.
ASMv2 achieves a CIDEr score of 114.7 on the RefCOCOg region captioning benchmark.
Quotes
"Our ASMv2 demonstrates state-of-the-art performance in the OpenSGG task."
"Our model significantly outperforms TextPSG by 8.7 points in recall."