Core Concepts
A transparent, clinically interpretable AI model that uses both chest X-ray images and their associated medical reports to detect lung cancer, outperforming baseline deep learning models while providing reliable, clinically relevant explanations.
Abstract
The authors propose a novel transparent and clinically interpretable AI model for detecting lung cancer in chest X-rays. The model is based on a concept bottleneck architecture, which splits the traditional image-to-label classification pipeline into two separate models.
The first model, the concept prediction model, takes a chest X-ray as input and outputs prediction scores for a pre-determined set of clinical concepts extracted from associated medical reports. These concepts were defined under the guidance of a consultant radiologist and represent key features used in manual diagnosis of chest X-rays.
The second model, the label prediction model, then uses the concept prediction scores to classify the image as either cancerous or healthy. The authors experiment with different architectures for the label prediction model, including Decision Trees, SVMs, and MLPs, and find that the Decision Tree model performs the best in terms of precision.
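The two-stage pipeline described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the concept predictor is a stand-in linear map (in the paper it is an image model over chest X-rays), the data and the labeling rule are synthetic, and the concept count is the paper's 28. Only the choice of a Decision Tree as the label predictor mirrors the paper.

```python
# Minimal sketch of a concept-bottleneck pipeline: image -> concept scores -> label.
# The concept predictor below is a placeholder linear map with a sigmoid; in the
# paper it is a deep image model trained against report-derived concepts.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

N_CONCEPTS = 28            # clinical concepts extracted from radiology reports
N_IMAGES, IMG_DIM = 200, 64  # synthetic "image features" for illustration only

def predict_concepts(images, weights):
    """Stage 1 stand-in: map image features to concept scores in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-images @ weights))

# Synthetic inputs and a toy labeling rule (invented for this demo):
# the scan is "cancerous" when the first concept's score exceeds 0.5.
images = rng.normal(size=(N_IMAGES, IMG_DIM))
weights = rng.normal(size=(IMG_DIM, N_CONCEPTS)) / np.sqrt(IMG_DIM)
concept_scores = predict_concepts(images, weights)
labels = (concept_scores[:, 0] > 0.5).astype(int)

# Stage 2: the label prediction model operates only on concept scores,
# so its decision path is readable in terms of clinical concepts.
clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(concept_scores, labels)
preds = clf.predict(concept_scores)
```

Because the label model never sees pixels, every prediction can be traced back through the tree's splits on named concepts, which is the source of the approach's interpretability.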
The authors evaluate their approach against post-hoc image-based XAI techniques like LIME and SHAP, as well as the textual XAI tool CXR-LLaVA. They find that their concept-based explanations are more stable, clinically relevant, and reliable than the explanations generated by these existing methods.
The authors also experiment with clustering the original 28 clinical concepts into 6 broader categories, which leads to significant improvements in both concept prediction accuracy (97.1% top-1 accuracy at the concept-cluster level) and label prediction performance, outperforming the baseline InceptionV3 model.
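The clustering step can be illustrated with a short sketch. The concept-to-cluster assignment, the aggregation rule (max over a cluster's member concepts), and the data below are all assumptions made for illustration; the paper's actual grouping of the 28 concepts into 6 categories is not reproduced here.

```python
# Illustrative sketch: collapse 28 concept scores into 6 cluster scores,
# then take the top-1 highest scoring cluster per sample.
import numpy as np

rng = np.random.default_rng(1)
n_samples, n_concepts, n_clusters = 100, 28, 6

# Hypothetical many-to-one mapping: clusters of sizes 5,5,5,5,4,4 (sums to 28).
concept_to_cluster = np.repeat(np.arange(n_clusters), [5, 5, 5, 5, 4, 4])

concept_scores = rng.random((n_samples, n_concepts))  # stand-in model outputs

# A cluster's score is the max score among its member concepts (one possible rule).
cluster_scores = np.stack(
    [concept_scores[:, concept_to_cluster == c].max(axis=1)
     for c in range(n_clusters)],
    axis=1,
)

top1 = cluster_scores.argmax(axis=1)  # top-1 predicted cluster per sample

# Top-1 accuracy as in the paper's metric, here against synthetic ground truth.
truth = rng.integers(0, n_clusters, size=n_samples)
top1_acc = (top1 == truth).mean()
```

Coarsening the concept space reduces the number of targets the concept model must distinguish, which is one plausible reason the clustered variant is easier to predict accurately.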
Overall, the authors demonstrate the effectiveness of their transparent and clinically interpretable AI approach for lung cancer detection in chest X-rays, providing a promising solution that can build trust and enable better integration of AI systems in healthcare.
Stats
The dataset used in this work consists of 2,374 chest X-rays from the MIMIC-CXR dataset, with an equal number of cancerous and healthy scans.
Quotes
"Our approach yields improved classification performance on lung cancer detection when compared to baseline deep learning models (F1 > 0.9), while also generating clinically relevant and more reliable explanations than existing techniques."
"We evaluate our approach against post-hoc image XAI techniques LIME and SHAP, as well as CXR-LLaVA, a recent textual XAI tool that operates in the context of question answering on chest X-rays."
"On our processed dataset of 2,374 radiological reports, our concept-based explanations boast a 97.1% accuracy in capturing the ground truth with the top-1 highest scoring concept cluster. CXR-LLaVA gave an accuracy of 72.6% on the full dataset, and when considering only cancerous reports this accuracy dropped to 48.3%."