The content discusses the importance of understanding intermediate representations in deep learning models and introduces a novel unsupervised method for discovering distributed representations of concepts. The method selects principal neurons to construct an interpretable region called a Relaxed Decision Region (RDR), which can identify unlabeled subclasses within data and detect the causes of misclassifications. It builds on the observation that instances with similar neuron activation states tend to share coherent concepts, offering deeper insight into the internal mechanisms of deep learning models.
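The construction can be pictured with a short sketch. The snippet below is a hypothetical illustration, not the paper's code: it extracts binary activation states from a ReLU layer in PyTorch and groups instances by their states on a small neuron subset. The variance-based choice of "principal" neurons is an assumption made here for illustration; the paper's actual selection procedure may differ.

```python
# Minimal sketch (not the paper's implementation): binary activation states
# of a ReLU layer, and grouping instances that share the same states on a
# hand-picked set of "principal" neurons.
from collections import defaultdict

import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy ReLU network; we inspect the second ReLU layer's activations.
model = nn.Sequential(
    nn.Linear(10, 32), nn.ReLU(),
    nn.Linear(32, 16), nn.ReLU(),   # <- layer whose states we read
    nn.Linear(16, 3),
)

def activation_states(x: torch.Tensor) -> torch.Tensor:
    """Binary on/off states of the second ReLU layer for a batch x."""
    h = model[:4](x)                 # forward up to and including 2nd ReLU
    return (h > 0).int()             # 1 = active, 0 = inactive

x = torch.randn(200, 10)             # stand-in for real data
states = activation_states(x)        # shape: (200, 16)

# Illustrative "principal neuron" choice: neurons whose on/off state varies
# most across the batch (an assumption, not the paper's criterion).
variance = states.float().var(dim=0)
principal = variance.topk(4).indices

# Instances sharing the same configuration on the principal neurons fall
# into the same region and, per the summary above, tend to share concepts.
regions = defaultdict(list)
for i, row in enumerate(states[:, principal]):
    regions[tuple(row.tolist())].append(i)

print(f"{len(regions)} regions over {len(principal)} principal neurons")
```

Because only a few principal neurons constrain the region, many instances can share a configuration, which is what makes the region "relaxed" rather than an exact activation-pattern match over the whole layer.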
The content also surveys XAI methods developed to enhance model transparency and explains how the proposed method differs: it produces concept-based explanations without human supervision. It then introduces the Configuration Distance metric for evaluating differences between configurations and demonstrates its effectiveness against standard metrics such as Euclidean and cosine distance. Experiments showcase the coherence of the captured concepts, reasoning about misclassified cases, identification of concepts learned across layers, and subclass detection without human supervision.
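The contrast with standard metrics can be made concrete. The sketch below assumes the Configuration Distance counts mismatched binary activation states (a Hamming-style count); the paper's exact definition may differ, and the function names are illustrative.

```python
# Minimal sketch, assuming Configuration Distance = number of neurons whose
# on/off state differs between two instances (a Hamming-style distance).
import numpy as np

def configuration_distance(s1: np.ndarray, s2: np.ndarray) -> int:
    """Count of neurons whose binary activation state differs."""
    return int(np.sum(s1 != s2))

def euclidean(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.linalg.norm(a - b))

def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    return float(1 - a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
a, b = rng.standard_normal(16), rng.standard_normal(16)   # raw activations
sa, sb = (a > 0).astype(int), (b > 0).astype(int)          # binary states

# Euclidean and cosine compare raw activation magnitudes; the configuration
# distance only asks whether each neuron fires, i.e. which side of its
# decision boundary each instance falls on.
print("euclidean:    ", euclidean(a, b))
print("cosine:       ", cosine_distance(a, b))
print("configuration:", configuration_distance(sa, sb))
```

The design point is that two instances with very different activation magnitudes can still land in the same region if the same neurons fire, which magnitude-based metrics would not capture.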
Overall, the content emphasizes the significance of interpreting deep learning models without human supervision through distributed representations of concepts.
Source: arxiv.org