Core Concepts
A method to measure the dependency of neural network classifiers on interpretable features by collapsing the feature dimensions while preserving the data manifold.
Abstract
This paper introduces a new technique to measure the feature dependency of neural network models. The key idea is to "remove" a target feature from the test data by collapsing the dimension corresponding to that feature, while staying on the estimated data manifold. This is achieved by integrating the gradient of the feature function along the latent space of a generative model (VAE) that has learned the data distribution.
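In notation of my own choosing (the symbols E for the encoder, g for the decoder, and b for the baseline value are not taken verbatim from the paper), the collapse can be written as an integral curve in latent space:

```latex
\frac{dz}{dt} = -\nabla_z f\big(g(z(t))\big), \qquad z(0) = E(x),
```

integrated until the decoded feature value f(g(z(T))) reaches the baseline b; the collapsed sample is then g(z(T)). Because the curve stays in the latent space of the generative model, the decoded point remains on the estimated data manifold.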
The authors test their method on three image datasets: a synthetic dataset of ellipse images, an Alzheimer's disease prediction task using MRI and hippocampus segmentations, and a cell nuclei classification task. The results show that the proposed method can effectively capture the classifier's dependency on interpretable features, such as aspect ratio, volume, and color, by observing the performance drop when those features are collapsed. The method is compared to the CaCE approach, which also aims to measure feature importance, and is shown to provide more precise control over the feature values.
The key steps are:
Train a VAE to learn the data manifold.
For a target feature f, integrate the gradient of f along the VAE latent space to find a point that has the feature collapsed to a baseline value, while staying on the data manifold.
Replace the original test data with the modified data points and measure the performance drop of the classifier, which indicates the dependency on the target feature.
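The steps above can be sketched in code. This is a minimal illustration, not the authors' implementation: the linear `decode` map stands in for a trained VAE decoder, the mean-value feature `f` stands in for an interpretable feature such as volume, and the gradient is taken by finite differences rather than autodiff.

```python
import numpy as np

# Hypothetical stand-ins for a trained VAE decoder and a
# differentiable feature function (both are assumptions, not the
# paper's actual models).
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 4))  # "decoder" weights: latent dim 4 -> data dim 8

def decode(z):
    return W @ z

def f(x):
    # Stand-in interpretable feature, e.g. "volume" of the decoded sample.
    return x.mean()

def grad_f_latent(z, eps=1e-5):
    """Finite-difference gradient of f(decode(z)) with respect to z."""
    g = np.zeros_like(z)
    for i in range(len(z)):
        dz = np.zeros_like(z)
        dz[i] = eps
        g[i] = (f(decode(z + dz)) - f(decode(z - dz))) / (2 * eps)
    return g

def collapse_feature(z, baseline, step=0.05, max_iters=10_000, tol=1e-4):
    """Follow the integral curve of the feature gradient in latent space
    until the decoded feature value reaches the baseline."""
    for _ in range(max_iters):
        val = f(decode(z))
        gap = val - baseline
        if abs(gap) < tol:
            break
        g = grad_f_latent(z)
        gnorm = np.linalg.norm(g) + 1e-12
        # Move toward the baseline, shrinking the step near convergence.
        move = min(step, abs(gap) / gnorm)
        z = z - np.sign(gap) * move * g / gnorm
    return z

z0 = rng.normal(size=4)
z_collapsed = collapse_feature(z0, baseline=0.0)
print(abs(f(decode(z_collapsed))) < 1e-3)
```

In the actual method, the collapsed points decoded from the latent curve would then replace the original test inputs, and the classifier's accuracy drop on them measures its dependency on the feature.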
The authors also discuss the limitations of the method, such as the need for a large sample size to train the VAE and the requirement that gradients of the feature function be available.
Stats
The hippocampus volume feature is critical for the Alzheimer's disease classifier, as the accuracy drops from 0.821 to 0.530 when the volume is collapsed.
The size, saturation, and hue features are important for the cell nuclei classifier, with accuracies dropping to around 0.45-0.48 when these features are collapsed.
Quotes
"If a model is dependent on a feature, then removal of that feature should significantly harm its performance."
"We propose to do this by modeling feature collapse as an integral curve of the target feature's gradient vector field in the latent space of a generative model that has learned the data distribution."