Significant Visual Biases Identified in Audio-Visual Source Localization Benchmarks
Existing audio-visual source localization benchmarks exhibit significant visual biases: the sounding objects can often be identified accurately from visual information alone. This diminishes the need for audio input and hinders the effective evaluation of audio-visual models.
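One way to make this kind of bias concrete is to score a vision-only predictor and an audio-visual model on the same samples and compare their success rates. The sketch below is illustrative only and is not the evaluation protocol of any particular benchmark. It assumes predictions and ground truth are axis-aligned boxes and uses box IoU with a 0.5 threshold as a stand-in for whatever localization metric a benchmark defines. The names `iou`, `success_rate`, and `audio_gain` are hypothetical.

```python
from typing import List, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max)
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def success_rate(preds: List[Box], gts: List[Box], threshold: float = 0.5) -> float:
    """Fraction of samples whose predicted box overlaps the ground truth
    with IoU at or above the (assumed) threshold."""
    if len(preds) != len(gts) or not gts:
        raise ValueError("preds and gts must be non-empty and equally long")
    hits = sum(1 for p, g in zip(preds, gts) if iou(p, g) >= threshold)
    return hits / len(gts)


def audio_gain(vision_only: List[Box], audio_visual: List[Box],
               gts: List[Box], threshold: float = 0.5) -> float:
    """Success rate of the audio-visual model minus that of the
    vision-only predictor, both scored on the same ground truth."""
    return (success_rate(audio_visual, gts, threshold)
            - success_rate(vision_only, gts, threshold))
```

Under these assumptions, a high vision-only success rate together with an `audio_gain` close to zero would suggest that the benchmark can largely be solved without listening, which is the situation described above.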