Core Concepts
A method to efficiently visualize and systematically analyze the learned representations in convolutional neural networks by linking the classifier's representation space to the latent space of a pre-trained generative adversarial network.
Summary
The authors introduce a method to interpret the learned representations in convolutional neural networks (CNNs) trained for object classification. They propose a "linking network" that maps the penultimate layer of a pre-trained classifier to the latent space of a generative adversarial network (StyleGAN-XL). This allows them to visualize the representations learned by the classifier in a human-interpretable way.
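Conceptually, the linking network is a small learned mapping from the classifier's penultimate-layer activations to the GAN's latent codes. The sketch below is a minimal illustration of that idea, not the paper's implementation: the class name `LinkingNetwork`, the feature and latent dimensions, the MLP depth, and the MSE regression objective are all assumptions.

```python
import torch
import torch.nn as nn

class LinkingNetwork(nn.Module):
    """Minimal sketch: map penultimate-layer classifier features to GAN latents.
    Dimensions, depth, and the loss below are illustrative assumptions."""

    def __init__(self, feat_dim: int = 2048, latent_dim: int = 512, hidden: int = 1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, latent_dim),
        )

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # features: (B, feat_dim) activations from the classifier's penultimate layer
        return self.net(features)  # (B, latent_dim) predicted latent codes


def training_step(linker, classifier_features, target_latents, optimizer):
    """One regression step: fit predicted latents to latent codes whose generated
    images correspond to the classifier's representation (assumed training signal)."""
    pred = linker(classifier_features)
    loss = torch.nn.functional.mse_loss(pred, target_latents)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```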
The authors then introduce an automated pipeline to quantify these high-dimensional representations. They use unsupervised tracking methods and few-shot image segmentation to analyze changes in semantic concepts (e.g., color, shape) induced by perturbing individual units in the classifier's representation space.
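The core mechanic of this pipeline is to perturb a single unit in the representation, render the result through the linking network and the GAN, and then measure what changed in the images. The snippet below sketches only the perturb-and-render loop; the function name, the `generator` interface (assumed to map latent codes to images, e.g. a wrapped StyleGAN-XL synthesis network), and the perturbation magnitudes are hypothetical, and the tracking/segmentation analysis itself is omitted.

```python
import torch

@torch.no_grad()
def visualize_unit_perturbation(features, unit_idx, linker, generator,
                                deltas=(-3.0, 0.0, 3.0)):
    """Perturb one unit of the penultimate representation and decode the result
    to images via the linking network and the GAN (illustrative sketch)."""
    images = []
    for delta in deltas:
        perturbed = features.clone()
        perturbed[:, unit_idx] += delta      # shift a single unit's activation
        latents = linker(perturbed)          # map to the GAN latent space
        images.append(generator(latents))    # decode to images for inspection
    # In the paper's pipeline, such image series are then quantified with
    # unsupervised tracking and few-shot segmentation (not shown here).
    return torch.stack(images)
```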
The authors demonstrate two key applications of their method:
- Revealing the abstract concepts encoded in individual units of the classifier, showing that some units represent disentangled semantic features while others exhibit superposition of multiple concepts.
- Examining the classifier's decision boundary by generating counterfactual examples and quantifying the changes in relevant semantic features across the decision boundary (see the sketch after this list).
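As a rough illustration of the second application, the sketch below pushes a representation across the decision boundary by gradient ascent on the logit margin; the resulting features could then be rendered with the linking network and GAN as above. This is a generic counterfactual procedure, not necessarily the paper's exact method, and `classifier_head` (the classifier's final linear layer acting on penultimate features), the step size, and the stopping rule are assumptions.

```python
import torch

def counterfactual_features(features, classifier_head, source_class, target_class,
                            step_size=0.1, max_steps=100):
    """Move a penultimate-layer representation across the decision boundary by
    gradient ascent on the (target - source) logit margin (illustrative sketch)."""
    x = features.clone().detach().requires_grad_(True)
    for _ in range(max_steps):
        logits = classifier_head(x)
        if (logits.argmax(dim=1) == target_class).all():
            break  # crossed the decision boundary
        margin = logits[:, target_class] - logits[:, source_class]
        grad = torch.autograd.grad(margin.sum(), x)[0]
        x = (x + step_size * grad).detach().requires_grad_(True)
    return x.detach()
```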
Overall, the authors present a systematic and objective approach to interpreting the learned representations in CNNs, overcoming the limitations of previous methods that rely on visual inspection or require extensive retraining of the models.
Statistics
"Convolutional neural networks (CNNs) learn abstract features to perform object classification, but understanding these features remains challenging due to difficult-to-interpret results or high computational costs."
"We propose an automatic method to visualize and systematically analyze learned features in CNNs."
"We introduce a linking network that maps the penultimate layer of a pre-trained classifier to the latent space of a generative model (StyleGAN-XL), thereby enabling an interpretable, human-friendly visualization of the classifier's representations."
"We introduce an automatic pipeline that utilizes such GAN-based visualizations to quantify learned representations by analyzing activation changes in the classifier in the image domain."
Quotes
"Unraveling the learned concepts that influence a classifier's decisions can reveal inherent biases [20,37] or identify failures in these models [48,65]."
"Studying all potential configurations of representations poses an intricate combinatorial challenge, hence visual inspection soon becomes infeasible and cannot provide a comprehensive and objective understanding of learned features in hidden layers."
"Our method offers systematic and objective perspectives on learned abstract representations in CNNs."