Key Concepts
Enhancing scene graph generation by debiasing relation words in vision-language models.
Statistics
The triplet "woman carrying umbrella" scores only 0.05 because the umbrella is closed, while its counterpart with an open umbrella scores markedly higher.
The comprehensive knowledge of VLMs can help compensate for underrepresented samples.
The LM Estimation method seeks to approximate the distribution of relation words within VLMs, so that this distribution can be used to debias the VLM's predictions for SGG.
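To make the debiasing idea concrete, here is a minimal sketch assuming a standard post-hoc logit adjustment: the log of an estimated relation-word prior is subtracted from the VLM's log-probabilities, and the result is averaged with an SGG model's prediction. This is not the paper's actual procedure. The function names (`debias`, `ensemble`), the fusion weight, and all numeric values are hypothetical toy inputs chosen only for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def debias(vlm_probs, prior_probs, eps=1e-12):
    """Subtract the log prior of each relation word from the VLM log-probability.

    Assumption: prior_probs stands in for a word distribution estimated
    from the VLM; how it is estimated is outside this sketch.
    """
    logits = [
        math.log(p + eps) - math.log(q + eps)
        for p, q in zip(vlm_probs, prior_probs)
    ]
    return softmax(logits)


def ensemble(sgg_probs, vlm_debiased, weight=0.5):
    """Weighted average of two distributions (a hypothetical fusion rule)."""
    return [(1 - weight) * s + weight * v for s, v in zip(sgg_probs, vlm_debiased)]


# Toy values, not taken from the paper.
relations = ["on", "holding", "carrying"]
vlm_probs = [0.70, 0.25, 0.05]    # raw VLM scores, skewed toward "on"
prior_probs = [0.80, 0.15, 0.05]  # assumed relation-word prior
sgg_probs = [0.50, 0.30, 0.20]    # assumed SGG model prediction

debiased = debias(vlm_probs, prior_probs)
fused = ensemble(sgg_probs, debiased)

for name, d, f in zip(relations, debiased, fused):
    print(f"{name}: debiased={d:.3f} fused={f:.3f}")
```

In this toy setting the frequent word "on" loses probability mass after the prior is removed, while the rarer "holding" and "carrying" gain it, which is the qualitative effect debiasing aims for.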
Quotes
"Our method effectively addresses the words biases, enhances SGG’s representation, and achieves remarkable performance enhancements."