Key Concepts
Faithfulness and plausibility can be complementary objectives in explainable AI: perturbation-based methods such as Shapley values and LIME can produce explanations that are both faithful to the model and plausible to human interpreters.
Summary
The study investigates the relationship between faithfulness and plausibility in explainable AI across three NLP tasks: sentiment analysis, intent detection, and topic labeling. The authors use GPT-4 to construct expert-style explanations that serve as benchmarks for plausibility evaluation.
Key highlights:
- Faithfulness evaluation shows that Shapley value, LIME, and GPT-4 outperform traditional gradient-based and attention-based methods across the datasets.
- Plausibility evaluation based on rank correlation and overlap rate indicates significant overlap between Shapley value, LIME, and GPT-4 in identifying the most influential features.
- The findings suggest that faithfulness and plausibility can be complementary objectives, and explainability algorithms can be optimized to achieve high performance in both dimensions.
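The plausibility comparison above rests on two measures over feature rankings: rank correlation and top-k overlap. A minimal sketch of both, using hypothetical attribution scores for a single sentence (the feature names and values are illustrative, not from the paper, and the paper aggregates these measures over whole datasets):

```python
def overlap_rate(scores_a, scores_b, k):
    """Fraction of features shared between the two methods' top-k lists."""
    top_a = set(sorted(scores_a, key=scores_a.get, reverse=True)[:k])
    top_b = set(sorted(scores_b, key=scores_b.get, reverse=True)[:k])
    return len(top_a & top_b) / k


def spearman_rho(scores_a, scores_b):
    """Spearman rank correlation over a shared feature set (no ties assumed)."""
    feats = list(scores_a)

    def ranks(scores):
        order = sorted(feats, key=scores.get, reverse=True)
        return {f: r for r, f in enumerate(order)}

    ra, rb = ranks(scores_a), ranks(scores_b)
    n = len(feats)
    d2 = sum((ra[f] - rb[f]) ** 2 for f in feats)
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


# Hypothetical attribution scores from two explainers for one sentence
lime_scores = {"great": 0.9, "movie": 0.2, "the": 0.0, "plot": -0.4}
shap_scores = {"great": 0.7, "plot": 0.4, "movie": 0.1, "the": -0.2}

print(overlap_rate(lime_scores, shap_scores, k=2))  # 0.5
print(spearman_rho(lime_scores, shap_scores))       # 0.4
```

A high overlap rate and rank correlation between two methods' rankings is what the study reads as agreement on the most influential features.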
The authors also discuss limitations, noting that the study covers only a selected set of tasks and models and that future research could explore more diverse settings. How to jointly optimize explainability algorithms for multiple objectives also remains an open question.
Statistics
Average decrease in the model's output probability on the predicted class after replacing the top-k influential words in the text sequence:
- BERT on SST-2: 5.9748
- RoBERTa on SST-2: 5.4660
- BERT on 20Newsgroups: 3.2694
- RoBERTa on 20Newsgroups: 3.4327
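The figures above come from a perturbation-style faithfulness test: mask the top-k most influential words and measure how much the predicted-class probability falls. A minimal sketch of that idea, assuming a toy lexicon-based classifier in place of BERT/RoBERTa (the mask token, weights, and replacement strategy are illustrative assumptions, not the paper's exact setup):

```python
import math


def faithfulness_drop(tokens, importances, predict_proba, k, mask="[UNK]"):
    """Decrease in predicted-class probability after replacing the top-k
    most influential tokens (a perturbation-style faithfulness check)."""
    base = predict_proba(tokens)
    top = set(sorted(range(len(tokens)),
                     key=lambda i: importances[i], reverse=True)[:k])
    perturbed = [mask if i in top else t for i, t in enumerate(tokens)]
    return base - predict_proba(perturbed)


# Stand-in classifier: a toy lexicon-based sentiment model (not BERT/RoBERTa)
WEIGHTS = {"great": 2.0, "moving": 1.0, "dull": -1.5}


def toy_predict_proba(tokens):
    score = sum(WEIGHTS.get(t, 0.0) for t in tokens)
    return 1 / (1 + math.exp(-score))  # probability of the positive class


tokens = ["a", "great", "and", "moving", "film"]
importances = [WEIGHTS.get(t, 0.0) for t in tokens]  # pretend attribution scores
drop = faithfulness_drop(tokens, importances, toy_predict_proba, k=1)
print(round(drop, 4))  # masking "great" lowers the positive-class probability
```

A faithful explainer ranks exactly the words whose removal causes the largest drop, which is why larger average decreases indicate more faithful attributions.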
Quotes
"Our findings suggest that plausibility and faithfulness can be complementary. The explainability method could achieve a high overlap rate in identifying influential features and also tend to provide explanations that are plausible to human interpreters, which implies that the explainability algorithms can be optimized toward the dual objective of faithfulness and plausibility."