Alapfogalmak
Leveraging language models to provide detailed, understandable explanations for anomaly maps generated by unsupervised anomaly detection methods.
Kivonat
The content presents a framework that integrates language models with unsupervised anomaly detection (UAD) to enhance the interpretability of the generated anomaly maps.
Key highlights:
- UAD methods can identify potential pathological areas by comparing original images with their pseudo-healthy reconstructions, but the clinical interpretation of the anomaly maps is challenging due to a lack of detailed explanations.
- The authors propose a multi-image visual question answering (VQA) framework that combines language models with UAD to provide clinicians with clear, interpretable responses to questions about the anomaly maps.
- The framework incorporates diverse feature fusion strategies to enhance visual knowledge extraction, and the authors introduce a novel Knowledge Q-Former module to assist the model in learning knowledge-related visual features.
- Experiments show that the proposed framework, especially with the Knowledge Q-Former module, significantly outperforms baseline multi-image VQA methods in answering questions about the anomaly detection results.
- The authors also demonstrate that incorporating anomaly maps as inputs can improve the detection of unseen pathologies, highlighting the potential of their approach to support clinical decision-making.
Statisztikák
"Unsupervised anomaly detection enables the identification of potential pathological areas by juxtaposing original images with their pseudo-healthy reconstructions generated by models trained exclusively on normal images."
"Recent advancements in language models have shown the capability of mimicking human-like understanding and providing detailed descriptions."
Idézetek
"To the best of our knowledge, we are the first to leverage a language model for unsupervised anomaly detection, for which we construct a dataset with different questions and answers."
"Besides, integrating anomaly maps as inputs distinctly aids in improving the detection of unseen pathologies."