Open-Vocabulary Scene Graph Generation with Vision-Language Models
Our framework leverages pre-trained vision-language models to generate scene graphs that cover both known and novel visual relation concepts, and it outperforms previous methods on open-vocabulary scene graph generation benchmarks.