Core Concepts
The authors introduce Component Features (ComFe), an explainable image classification approach that uses transformer-decoder heads and hierarchical mixture modeling to improve accuracy and generalization without tuning hyperparameters separately for each dataset.
Abstract
Interpretable computer vision models make predictions more transparent by showing which parts of an image drive them. ComFe uses transformer decoders to identify informative image components and outperforms previous interpretable models across a range of benchmarks. The method is scalable, robust, and efficient, and it provides insight into model predictions without per-dataset hyperparameter tuning.
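As a rough illustration of how a decoder-style head can pool image patches into a small set of component features, the following sketch applies single-head cross-attention from a few query vectors to a grid of patch embeddings. It is not the authors' implementation: the function name `component_features`, the random inputs, and the sizes (196 patches, 64 dimensions, 5 components) are assumptions chosen for illustration, and the learned projections, multiple heads, normalization layers, and hierarchical mixture modeling of the actual method are all omitted.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def component_features(patch_tokens, queries):
    """Hypothetical single-head cross-attention.

    patch_tokens: (num_patches, dim) patch embeddings from a backbone.
    queries:      (num_components, dim) query vectors (learned in practice).
    Returns the pooled component features and the attention weights.
    """
    dim = patch_tokens.shape[-1]
    scores = queries @ patch_tokens.T / np.sqrt(dim)  # (components, patches)
    attn = softmax(scores, axis=-1)                   # each row sums to 1
    components = attn @ patch_tokens                  # (components, dim)
    return components, attn


# Random stand-ins for real backbone outputs and learned queries.
rng = np.random.default_rng(0)
patch_tokens = rng.normal(size=(196, 64))  # e.g. a 14x14 patch grid
queries = rng.normal(size=(5, 64))

components, attn = component_features(patch_tokens, queries)

# For each patch, the component whose query attends to it most strongly.
patch_to_component = attn.argmax(axis=0)
```

Reshaping `patch_to_component` to the patch grid (14 by 14 under the sizes assumed here) gives a coarse map of which image regions feed each component, which is the kind of view an interpretable head can expose alongside its prediction.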
Stats
ComFe obtains higher accuracy than previous interpretable models.
It outperforms a non-interpretable linear head on several datasets, including ImageNet.
It scales to large image datasets such as ImageNet.
It improves performance on generalization and robustness benchmarks.