Key Concepts
A semi-supervised method, SemiPL, is proposed to improve the performance of sound source localization in complex visual scenes, especially for datasets with partial labels.
Summary
The paper presents a semi-supervised method, SemiPL, for event sound source localization in complex visual scenes. The key points are:
The authors apply the existing SSPL (Self-Supervised Predictive Learning) model to the more challenging Chaotic World dataset, which contains complex scenes with human behaviors, voices, and sounds during chaotic events.
The authors explore how hyperparameter adjustments, such as learning rate and batch size, affect the performance of the SSPL model. They find that decreasing the learning rate can make training more stable, but at the cost of slower convergence.
To address the limitations of self-supervised learning on datasets with partial labels, the authors propose a semi-supervised method, SemiPL, which incorporates both supervised and unsupervised losses. SemiPL aims to leverage unlabeled data more effectively to enhance the overall performance and generalizability of the model.
Experiments on the Chaotic World dataset show that SemiPL achieves an improvement of 12.2% cIoU and 0.56% AUC compared to the original SSPL results, demonstrating the effectiveness of the semi-supervised approach in complex visual scenes.
The authors also provide qualitative analysis, highlighting that the SSPL model tends to overlook target objects in complex scenes, while the semi-supervised SemiPL model may be disturbed by the presence of non-human vocalized objects in the dataset.
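The combination of supervised and unsupervised losses mentioned in the key points can be illustrated with a minimal sketch. The summary does not give the actual SemiPL objective, so everything here is an assumption made for illustration: the function name `semi_supervised_loss`, the `sup_weight` factor, and the choice to average the supervised term over labeled samples only. It shows one common way to train on a partially labeled batch, not the authors' implementation.

```python
from typing import Optional, Sequence


def semi_supervised_loss(
    unsup_losses: Sequence[float],
    sup_losses: Sequence[Optional[float]],
    sup_weight: float = 1.0,  # hypothetical weighting factor, not from the paper
) -> float:
    """Combine per-sample losses for a partially labeled batch.

    unsup_losses: unsupervised (self-supervised) loss for every sample.
    sup_losses:   supervised loss per sample, or None where no label exists.
    """
    if len(unsup_losses) != len(sup_losses):
        raise ValueError("both loss lists must cover the same samples")
    if not unsup_losses:
        raise ValueError("batch must not be empty")

    # The unsupervised term uses every sample, labeled or not.
    unsup_mean = sum(unsup_losses) / len(unsup_losses)

    # The supervised term uses only the samples that carry a label.
    labeled = [loss for loss in sup_losses if loss is not None]
    sup_mean = sum(labeled) / len(labeled) if labeled else 0.0

    return unsup_mean + sup_weight * sup_mean


# Two samples, only the second one is labeled.
total = semi_supervised_loss([1.0, 3.0], [None, 2.0], sup_weight=0.5)
```

With a batch that contains no labeled samples, the supervised term drops to zero and the function falls back to the purely unsupervised loss, which is why unlabeled data can still contribute to training in this kind of setup.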
Statistics
The Chaotic World dataset contains a total of 378,093 annotated instances for triangulating the source of sound during chaotic events.
The authors use 456 videos from the dataset, with 384 training videos and 72 test videos.
Quotes
"With the increase in data quantity and the influence of label quality, self-supervised learning will be an unstoppable trend in the future."
"For datasets with partial labels, undoubtedly, semi-supervised learning is the best choice and also the inevitable trend for the future development of sound source localization."