The paper presents a novel XAI approach called CoProNN that generates intuitive explanations for vision models by leveraging text-to-image generation to create task-specific concept prototypes. The key insights are:
CoProNN relies on domain experts to define a set of visual concepts that discriminate between the classes of a given task. These concepts then serve as prompts to a text-to-image model such as Stable Diffusion, which generates prototype images for each concept.
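A minimal sketch of this prototype-generation step is shown below, assuming the Hugging Face diffusers library; the checkpoint name, the prompt template, and the example concepts are illustrative assumptions, not taken from the paper.

```python
# Sketch: generate concept prototype images with Stable Diffusion (assumed setup).
import torch
from diffusers import StableDiffusionPipeline

# Assumption: any publicly available Stable Diffusion checkpoint can be used here.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Expert-defined visual concepts serve as text prompts (hypothetical examples).
concepts = ["striped fur", "long trunk", "black-and-white plumage"]
prototypes_per_concept = 16

prototype_images = {}
for concept in concepts:
    # Generate several prototypes per concept to cover visual variation.
    images = pipe(
        prompt=f"a photo showing {concept}",
        num_images_per_prompt=prototypes_per_concept,
    ).images
    prototype_images[concept] = images
```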
The prototype images are mapped into the latent feature space of the vision model, and a k-Nearest Neighbors (kNN) approach is used to retrieve the most relevant concepts for explaining a given prediction. The explanations take the form "This image is class A, because concepts X, Y are present and concepts Z, W are absent."
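A minimal sketch of this retrieval step, assuming a torchvision ResNet-50 as the vision backbone and scikit-learn's kNN; the backbone choice, the neighbor count, and the present/absent heuristic are illustrative rather than the paper's exact implementation.

```python
# Sketch: embed prototypes and query image, then retrieve concepts via kNN (assumed setup).
import numpy as np
import torch
from torchvision import models
from sklearn.neighbors import NearestNeighbors

# Feature extractor: the vision backbone with its classification head removed.
weights = models.ResNet50_Weights.IMAGENET1K_V2
backbone = models.resnet50(weights=weights)
backbone.fc = torch.nn.Identity()
backbone.eval()
preprocess = weights.transforms()

@torch.no_grad()
def embed(images):
    # Map PIL images into the backbone's latent feature space.
    batch = torch.stack([preprocess(img) for img in images])
    return backbone(batch).numpy()

# prototype_images: dict mapping concept name -> generated PIL images (see previous sketch).
proto_feats, proto_labels = [], []
for concept, images in prototype_images.items():
    feats = embed(images)
    proto_feats.append(feats)
    proto_labels.extend([concept] * len(feats))
proto_feats = np.concatenate(proto_feats)

knn = NearestNeighbors(n_neighbors=10).fit(proto_feats)

def explain(query_image):
    # Concepts whose prototypes lie closest to the query count as "present";
    # concepts that never appear among the neighbors count as "absent".
    _, idx = knn.kneighbors(embed([query_image]))
    neighbor_concepts = [proto_labels[i] for i in idx[0]]
    present = sorted(set(neighbor_concepts), key=neighbor_concepts.count, reverse=True)
    absent = [c for c in prototype_images if c not in present]
    return present, absent
```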
The modular design of CoProNN allows it to be easily adapted to novel tasks by replacing the classification model and the text-to-image model as more powerful versions become available.
Experiments on coarse-grained (ImageNet animals) and fine-grained (wild bees) classification tasks show that CoProNN outperforms existing concept-based XAI methods like TCAV and IBD in terms of faithfulness of the explanations.
A user study demonstrates that the CoProNN explanations improve human-AI collaboration, helping users identify and correct wrong model predictions.