Core Concepts
The author proposes a Visual-Linguistic Causal Intervention (VLCI) framework to mitigate cross-modal biases in medical report generation, aiming to significantly improve report accuracy and reliability.
Abstract
Medical report generation is crucial for aiding diagnosis but is hindered by visual and linguistic biases in the training data. The proposed VLCI framework addresses these biases through causal intervention, outperforming existing methods on the IU-Xray and MIMIC-CXR datasets.
Key points:
Challenges in medical report generation due to visual and linguistic biases.
Introduction of the Visual-Linguistic Causal Intervention (VLCI) framework.
Components of VLCI: a visual deconfounding module (VDM) and a linguistic deconfounding module (LDM).
Use of causal front-door intervention to eliminate spurious correlations.
Comparison with state-of-the-art MRG models on IU-Xray and MIMIC-CXR datasets.
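To make the front-door intervention point concrete, here is a minimal sketch of the classic front-door adjustment on toy discrete variables. This is not the paper's VDM/LDM implementation; the variable names and all probability tables below are made up for illustration. The idea: even when a confounder U is unobservable, a mediator M between X and Y lets us compute P(Y | do(X)) from observational quantities via P(y | do(x)) = Σ_m P(m | x) Σ_x' P(y | x', m) P(x').

```python
# Toy front-door adjustment (illustrative only, not the paper's code).
# Binary X (input), M (mediator), Y (outcome); all numbers are made up.

P_x = {0: 0.6, 1: 0.4}                       # observational P(X)
P_m_given_x = {0: {0: 0.7, 1: 0.3},          # P(M | X)
               1: {0: 0.2, 1: 0.8}}
P_y_given_xm = {(0, 0): {0: 0.9, 1: 0.1},    # P(Y | X, M)
                (0, 1): {0: 0.5, 1: 0.5},
                (1, 0): {0: 0.6, 1: 0.4},
                (1, 1): {0: 0.3, 1: 0.7}}

def p_y_do_x(y, x):
    """Front-door estimate of P(Y=y | do(X=x)).

    Sums over the mediator M, and inside that over x', which breaks the
    backdoor path through the unobserved confounder U.
    """
    total = 0.0
    for m, p_m in P_m_given_x[x].items():
        inner = sum(P_y_given_xm[(xp, m)][y] * P_x[xp] for xp in P_x)
        total += p_m * inner
    return total

if __name__ == "__main__":
    for x in (0, 1):
        dist = {y: round(p_y_do_x(y, x), 4) for y in (0, 1)}
        print(f"P(Y | do(X={x})) = {dist}")
```

In the paper's setting, the visual and linguistic deconfounding modules play the role of estimating mediator-like quantities so that spurious image-text correlations induced by dataset bias are removed; this toy discrete version only shows the adjustment formula itself.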
Stats
BLEU-4 scores reported: 0.165 (IU-Xray), 0.103 (MIMIC-CXR)
Parameter counts: R2Gen 78.07M, R2GenCMN 58.65M, VLCI 69.41M
Quotes
"Lightweight models that can mitigate the cross-modal data bias are essential for MRG."
"Causal front-door intervention gives a feasible way to calculate unobservable confounders."
"Our main contributions include proposing VDM and LDM modules based on the Structural Causal Model."