Core Concepts
Enhancing scene graph generation by debiasing relation words in vision-language models.
Statistics
The triplet "woman carrying towel" scores only 0.05 because the umbrella is closed, while its counterpart with an open umbrella scores markedly higher.
The comprehensive knowledge of VLMs can help compensate for underrepresented samples.
The LM Estimation method aims to estimate the distribution of proxy relation words for SGG within VLMs.
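The summary does not say exactly how an estimated word distribution is used for debiasing. One common way to apply a label prior is post-hoc logit adjustment: subtract a scaled log-prior from each relation-word logit, so frequent words lose their head start and rarer ones can surface. The sketch below assumes that technique. It is not necessarily the paper's procedure, and the names `debias_scores` and `tau`, along with the toy logits and prior, are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def debias_scores(logits, prior, tau=1.0):
    """Hypothetical debiasing step (assumed logit adjustment, not the paper's method).

    logits: raw relation-word scores from a model.
    prior:  an estimated distribution over the same relation words,
            e.g. one produced by an estimator such as LM Estimation (assumption).
    tau:    strength of the adjustment; tau=0 leaves the scores unchanged.
    """
    if len(logits) != len(prior):
        raise ValueError("logits and prior must have the same length")
    if any(p <= 0 for p in prior):
        raise ValueError("prior probabilities must be positive")
    # Subtract tau * log(prior): frequent words (large prior) are penalised less
    # in absolute terms than rare words are boosted, which flattens the bias.
    adjusted = [z - tau * math.log(p) for z, p in zip(logits, prior)]
    return softmax(adjusted)


if __name__ == "__main__":
    relations = ["on", "holding", "carrying"]  # toy relation words
    logits = [2.0, 1.0, 1.0]                   # toy raw scores
    prior = [0.7, 0.2, 0.1]                    # toy estimated word distribution
    raw = softmax(logits)
    adj = debias_scores(logits, prior)
    print("raw top:", relations[raw.index(max(raw))])
    print("debiased top:", relations[adj.index(max(adj))])
```

With these toy numbers the raw scores favor the frequent word "on", while the adjusted scores favor the rarer "carrying". Setting `tau=0` reduces `debias_scores` to a plain softmax, which makes it easy to compare debiased and unadjusted predictions.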
Quotes
"Our method effectively addresses the words biases, enhances SGG’s representation, and achieves remarkable performance enhancements."