A new diffusion model for generalized audio-visual saliency prediction, DiffSal, is presented; it outperforms previous state-of-the-art methods.


coremsg

diffsal-joint-audio-and-video-learning-for-diffusion-saliency-prediction


DiffSal: Joint Audio and Video Learning for Diffusion Saliency Prediction


title_rewrite


The authors introduce DiffSal, a novel diffusion architecture for generalized audio-visual saliency prediction that uses the input video and audio as conditions. The framework outperforms previous state-of-the-art methods on six challenging benchmarks.