This article introduces the Dual Mean-Teacher (DMT) framework for Audio-Visual Source Localization (AVSL). Existing methods struggle with precise localization and with confirmation bias; DMT addresses both by using two teacher-student structures. It makes effective use of both labeled and unlabeled data and significantly outperforms current methods. The framework also improves small-object localization and generalization, offering a novel approach to semi-supervised AVSL.
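As a rough illustration of the teacher-student idea behind the framework's name, the sketch below shows the general Mean Teacher update, in which a teacher's weights track an exponential moving average (EMA) of its student's weights. It is not the article's implementation: the function name `ema_update`, the `decay` value, and the toy weight lists are all assumptions made for this example, and the two pairs are simply updated independently.

```python
def ema_update(teacher, student, decay=0.99):
    """Return teacher weights moved toward the student by an EMA step.

    Hypothetical helper: general Mean Teacher update, not DMT's actual code.
    """
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]


# Two independent teacher-student pairs with toy weights (illustrative values).
students = [[1.0, 2.0], [3.0, 4.0]]
teachers = [[0.0, 0.0], [0.0, 0.0]]

# One EMA step per pair; decay=0.5 is chosen only to keep the arithmetic easy.
teachers = [ema_update(t, s, decay=0.5) for t, s in zip(teachers, students)]
print(teachers)
```

In practice the decay is usually set close to 1 so that each teacher changes slowly and gives more stable predictions on unlabeled data than its student.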
Source: arxiv.org