Core Concepts
AudioProtoPNet is an interpretable deep learning model that can accurately classify bird species from audio recordings by learning and identifying prototypical sound patterns.
Abstract
The paper presents AudioProtoPNet, an adaptation of the Prototypical Part Network (ProtoPNet) architecture for audio classification that provides inherent interpretability. The model is designed to address the challenges of complex multi-label classification of bird sounds.
Key highlights:
The model uses a ConvNeXt backbone for feature extraction and learns prototypical patterns for each bird species from spectrograms of the training data.
The model classifies an input by comparing its spectrogram to these prototypes in latent space; the resulting similarity scores simultaneously serve as an explanation of the model's decision.
The model is evaluated on eight different datasets of bird sound recordings and achieves performance comparable to state-of-the-art black-box deep learning models, demonstrating the applicability of interpretable models in bioacoustic monitoring.
The interpretability of the model is crucial for its acceptance and practical use by domain experts like ornithologists, as it allows them to understand and verify the model's decision-making process.
Prototype learning enables the identification of previously unknown features in bird calls, which can deepen the understanding of acoustic communication in birds.
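The prototype-based classification step described above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the function name, shapes, and the use of cosine similarity with spatial max-pooling are assumptions in the spirit of ProtoPNet-style models, and random arrays stand in for the ConvNeXt feature maps and learned prototypes.

```python
import numpy as np

def prototype_logits(feature_map, prototypes, weights):
    """Score an input by its similarity to learned prototypes (illustrative sketch).

    feature_map: (H, W, D) latent patches from a CNN backbone (here: random stand-in).
    prototypes:  (P, D) learned prototype vectors, several per species.
    weights:     (C, P) weights connecting prototype similarities to class logits.
    """
    H, W, D = feature_map.shape
    patches = feature_map.reshape(-1, D)                      # (H*W, D)
    # Cosine similarity between every latent patch and every prototype.
    patches_n = patches / np.linalg.norm(patches, axis=1, keepdims=True)
    protos_n = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    sim = patches_n @ protos_n.T                              # (H*W, P)
    # Max-pool over spatial positions: "how strongly does this prototype
    # occur anywhere in the spectrogram?" The location of the maximum is
    # what makes the decision inspectable.
    proto_scores = sim.max(axis=0)                            # (P,)
    # Multi-label logits: a weighted sum of prototype activations per class.
    return weights @ proto_scores                             # (C,)

rng = np.random.default_rng(0)
feats = rng.normal(size=(8, 16, 64))    # stand-in for downsampled spectrogram features
protos = rng.normal(size=(10, 64))      # 10 prototypes, e.g. 5 per species
w = rng.normal(size=(2, 10))            # 2 species (multi-label: one logit each)
logits = prototype_logits(feats, protos, w)
print(logits.shape)  # (2,)
```

Because each logit decomposes into per-prototype similarity scores, one can trace a prediction back to the spectrogram regions that activated each prototype, which is the basis of the model's interpretability.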