Core Concepts
This paper proposes a novel algorithm for generating meaningful and grammatically correct textual counterfactuals, along with LLM Analyzer, an interactive visualization tool that helps users understand the behavior of large language models (LLMs) by analyzing these counterfactuals.
Summary
The paper starts by identifying three key challenges in applying counterfactual-based methods to analyze and explain LLMs:
- The generated textual counterfactuals should be meaningful and readable so that users can mentally compare them with the original text.
- To make the solution scalable to long-form text, users should be equipped with tools to create batches of counterfactuals from perturbations at various granularity levels and interactively analyze the results.
- Different types of explanations (e.g., counterfactual explanations, feature attributions, anchors) should be connected to offer users a better comprehension of LLMs.
To address these challenges, the paper makes the following contributions:
- A novel algorithm that generates grammatically correct, syntax-preserving textual counterfactuals by removing and replacing text segments at different granularities.
- LLM Analyzer, an interactive visualization tool to support LLM practitioners and users in understanding LLMs by interactively inspecting and aggregating meaningful counterfactuals.
- Evaluations, including a case study, user studies, and expert interviews that demonstrate the usefulness and usability of LLM Analyzer.
The algorithm first parses the input text to build a hierarchical representation that captures the dependencies among words and whether each segment is removable. It then simplifies the parse tree by grouping unremovable segments and generates all valid counterfactuals by sampling combinations of the remaining segments. Users can customize the granularity of the segments and the alternatives used for replacement.
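To make the pipeline concrete, here is a minimal sketch of a removal-only generator, assuming spaCy for dependency parsing; the set of removable dependency relations, the function names, and the two-segment cap are illustrative assumptions, not the paper's exact rules.

```python
# Illustrative sketch only: a simplified removal-only counterfactual
# generator. The removability heuristic below is an assumption, not
# the paper's exact rule set.
from itertools import combinations
import spacy

nlp = spacy.load("en_core_web_sm")

# Dependency relations whose subtrees are often grammatically optional.
REMOVABLE_DEPS = {"amod", "advmod", "prep", "appos", "acl", "relcl"}

def removable_segments(doc):
    """Return removable segments as sorted tuples of token indices."""
    segments = []
    for token in doc:
        if token.dep_ in REMOVABLE_DEPS:
            segments.append(tuple(sorted(t.i for t in token.subtree)))
    return segments

def removal_counterfactuals(text, max_removed=2):
    """Enumerate counterfactuals by deleting combinations of segments."""
    doc = nlp(text)
    segments = removable_segments(doc)
    results = []
    for k in range(1, max_removed + 1):
        for combo in combinations(segments, k):
            dropped = set().union(*map(set, combo))
            kept = [t.text for t in doc if t.i not in dropped]
            results.append(" ".join(kept))
    return results

print(removal_counterfactuals("The quick brown fox jumps over the lazy dog"))
```

A real implementation would also deduplicate overlapping segments and generate replacement-based counterfactuals, as the paper describes.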
LLM Analyzer provides a table-based visualization that allows users to inspect concrete counterfactuals, view additive attributions for each segment, and interactively aggregate counterfactuals by segments of interest to assess their joint influence on model responses. The system also connects different types of explanations, including counterfactual explanations, feature attributions, and anchors, to offer users a comprehensive understanding of the LLM's behaviors.
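As a concrete (and assumed) illustration of how additive attributions can be derived from a batch of counterfactuals, one simple estimator compares the average model score between counterfactuals that keep a segment and those that remove it; the data format and numbers below are hypothetical, not the tool's actual computation.

```python
# Illustrative sketch: estimating a segment's additive attribution by
# averaging the model score difference between counterfactuals that
# include the segment and those that omit it. The data format and the
# scores are assumptions for illustration.
from statistics import mean

def segment_attribution(counterfactuals, segment_id):
    """counterfactuals: list of (kept_segment_ids, model_score) pairs."""
    with_seg = [s for ids, s in counterfactuals if segment_id in ids]
    without_seg = [s for ids, s in counterfactuals if segment_id not in ids]
    if not with_seg or not without_seg:
        return 0.0  # segment never varies, so there is no signal
    return mean(with_seg) - mean(without_seg)

# Hypothetical scores for counterfactuals of a 3-segment sentence.
cfs = [
    ({0, 1, 2}, 0.91),  # original sentence
    ({0, 2}, 0.88),     # segment 1 removed
    ({0, 1}, 0.55),     # segment 2 removed
    ({0}, 0.52),        # segments 1 and 2 removed
]
print(segment_attribution(cfs, 2))  # ~0.36: segment 2 drives the score
```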
The evaluations demonstrate the effectiveness of the proposed algorithm in generating meaningful counterfactuals, with an average grammaticality rate of 97.2% across 5,000 samples from five datasets. The user study and expert interviews also confirm the usability and usefulness of LLM Analyzer in supporting the proposed workflow for understanding LLMs.
Statistics
The proposed algorithm generates 46 removal-only counterfactuals on average from each input sentence.
97.2% of the generated counterfactuals are grammatically correct across 5,000 samples from five datasets.
The algorithm takes less than a second to generate counterfactuals for each input sentence.
Quotes
"Counterfactual reasoning allows humans to build a causal understanding of the physical world by mentally inferring and comparing consequences from hypothetical scenarios—asking "what-if" questions."
"Counterfactual explanations interpret a model's prediction by finding the minimal perturbation required to change the prediction."
"Example-based explanations (e.g., counterfactual explanations) and aggregation-based explanations (e.g., LIME, Anchor) complement each other and together provide a thorough understanding of the ML model."