Core Concepts
Listenable Maps for Audio Classifiers (L-MAC) is a post-hoc interpretation method that produces faithful, listenable interpretations of audio classifier decisions.
Abstract
The article introduces Listenable Maps for Audio Classifiers (L-MAC), a method that generates faithful and listenable interpretations of audio classifier decisions, addressing the challenge of interpreting complex deep learning models in the audio domain. L-MAC uses a decoder to generate binary masks that highlight the relevant portions of the input audio. The decoder is trained with a loss function that maximizes the classifier's confidence on the masked-in portion of the audio while minimizing the output probability on the masked-out portion. The paper covers the methodology, experiments, metrics, related work, and a user study; its results indicate that L-MAC produces more faithful interpretations than existing methods and that users prefer them.
Introduction
Role of deep learning models in speech and audio applications.
Importance of explainable machine learning for these models.
Methodology
Architecture of L-MAC explained.
Masking objective and loss function detailed.
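The masking objective described in the abstract can be sketched as follows. This is only an illustrative approximation, not the paper's exact loss: the toy classifier, the 4x6 "spectrogram", the function names, and the particular form of the mask-out term are all assumptions made for the example.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def toy_classifier(x):
    # Hypothetical 2-class classifier on a (4, 6) spectrogram:
    # class 0 reads the two low-frequency rows, class 1 the two high ones.
    return softmax(np.array([x[:2].sum(), x[2:].sum()]))

def masking_loss(classifier, x, mask, eps=1e-12):
    """Illustrative mask-in / mask-out objective (assumed form)."""
    c = int(np.argmax(classifier(x)))          # class predicted on the full input
    p_in = classifier(mask * x)[c]             # confidence on the masked-in portion
    p_out = classifier((1.0 - mask) * x)[c]    # confidence on the masked-out portion
    # Small when the masked-in part preserves the decision
    # and the masked-out part no longer supports it.
    return float(-np.log(p_in + eps) - np.log(1.0 - p_out + eps))

x = np.full((4, 6), 0.1)
x[:2] = 1.0                                    # energy concentrated in low frequencies

good_mask = np.zeros_like(x)
good_mask[:2] = 1.0                            # keeps the relevant region
bad_mask = 1.0 - good_mask                     # keeps the irrelevant region

loss_good = masking_loss(toy_classifier, x, good_mask)
loss_bad = masking_loss(toy_classifier, x, bad_mask)
```

In this toy setting the mask covering the region the classifier relies on gets the lower loss, which is the behavior a mask-generating decoder would be trained toward.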
Experiments
Evaluation metrics such as faithfulness and understandability.
In-domain and out-of-domain data evaluations.
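To make the notion of faithfulness concrete, here is a minimal sketch of two mask-based scores of the kind commonly used for this purpose. The summary does not list the exact metrics from the paper, so the definitions, the function names, and the toy classifier below are assumptions for illustration only.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def toy_classifier(x):
    # Hypothetical 2-class classifier on a (4, 6) spectrogram.
    return softmax(np.array([x[:2].sum(), x[2:].sum()]))

def faithfulness_drop(classifier, x, mask):
    """Drop in predicted-class probability when the masked-in part is removed."""
    p = classifier(x)
    c = int(np.argmax(p))
    return float(p[c] - classifier((1.0 - mask) * x)[c])

def input_fidelity(classifier, x, mask):
    """1 if the decision on the masked-in audio matches the original decision."""
    return int(np.argmax(classifier(x)) == np.argmax(classifier(mask * x)))

x = np.full((4, 6), 0.1)
x[:2] = 1.0                          # class-0 evidence lives in the low rows

good_mask = np.zeros_like(x)
good_mask[:2] = 1.0
bad_mask = 1.0 - good_mask

drop_good = faithfulness_drop(toy_classifier, x, good_mask)
drop_bad = faithfulness_drop(toy_classifier, x, bad_mask)
fid_good = input_fidelity(toy_classifier, x, good_mask)
fid_bad = input_fidelity(toy_classifier, x, bad_mask)
```

A faithful mask should give a large probability drop when removed and should preserve the classifier's decision when kept; in practice such scores would be averaged over the in-domain and out-of-domain evaluation sets.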
Qualitative Evaluation
User study comparing L-MAC with existing methods.
Sanity Checks
RemOve And Retrain (ROAR) test results.
Model Randomization Test findings.
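The idea behind a model randomization test can be sketched as below: an explanation should change when the model's weights are replaced with random ones. The explainer here is a simple gradient-times-input map for a linear classifier, used as a stand-in and not L-MAC itself; the weights, input, and similarity measure are likewise assumptions.

```python
import numpy as np

def saliency(weights, x):
    """Gradient-times-input map for a linear classifier (stand-in explainer)."""
    flat = x.ravel()
    c = int(np.argmax(weights @ flat))       # predicted class under these weights
    return (weights[c] * flat).reshape(x.shape)

def map_similarity(a, b):
    """Pearson correlation between two flattened explanation maps."""
    return float(np.corrcoef(a.ravel(), b.ravel())[0, 1])

rng = np.random.default_rng(0)
x = rng.uniform(0.1, 1.0, size=(4, 6))       # toy "spectrogram"

trained = np.zeros((2, x.size))
trained[0, :12] = 1.0                        # class 0 reads the low-frequency rows
trained[1, 12:] = 1.0                        # class 1 reads the high-frequency rows
randomized = rng.standard_normal(trained.shape)  # weights after randomization

sim_same = map_similarity(saliency(trained, x), saliency(trained, x))
sim_rand = map_similarity(saliency(trained, x), saliency(randomized, x))
```

An interpretation method passes this kind of check when the similarity between the original and randomized-model explanations is clearly lower than the similarity of the original explanation with itself.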
Conclusions
Summary of the contributions and results obtained by L-MAC.
Stats
"Quantitative evaluations on both in-domain and out-of-domain data demonstrate that L-MAC consistently produces more faithful interpretations than several gradient and masking-based methodologies."
"Users prefer the interpretations generated by the proposed technique."
Quotes
"Our contributions include proposing a masking-based posthoc interpretation method for audio classifiers capable of providing listenable interpretations."
"L-MAC consistently achieves significantly higher faithfulness scores compared to other methods."