Core Concepts
Text Bottleneck Models (TBM) provide both global and local interpretability for text classification by automatically discovering a sparse set of salient concepts and using them to make predictions.
Abstract
The paper introduces Text Bottleneck Models (TBM), a framework for interpretable text classification that automatically discovers a set of salient concepts and uses them to make predictions.
The key components of TBM are:
Concept Generation: An iterative process that uses a large language model (LLM) to discover a sparse set of discriminative concepts from the training data. This avoids the need for manual concept curation.
Concept Measurement: The LLM is used to measure the value of each concept for a given input text in a zero-shot manner, providing numerical scores.
Prediction Layer: A white-box linear layer that combines the concept scores to make the final prediction.
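The three components above can be sketched as a minimal pipeline. This is an illustrative sketch, not the paper's implementation: the zero-shot LLM scoring step is stubbed out with a hypothetical keyword heuristic (`measure_concept`), and the concept names and weights are invented for the example.

```python
# Sketch of the TBM prediction path: concept scores in, linear combination out.
# The LLM call for Concept Measurement is replaced by a toy keyword heuristic.

CONCEPTS = ["food quality", "service speed", "ambiance"]

def measure_concept(text: str, concept: str) -> float:
    """Stand-in for zero-shot LLM concept measurement; returns a score in [-1, 1]."""
    positive = {"food quality": ["delicious", "fresh"],
                "service speed": ["quick", "fast"],
                "ambiance": ["cozy", "warm"]}
    negative = {"food quality": ["stale"],
                "service speed": ["slow"],
                "ambiance": ["noisy"]}
    lowered = text.lower()
    score = sum(w in lowered for w in positive[concept])
    score -= sum(w in lowered for w in negative[concept])
    return max(-1.0, min(1.0, float(score)))

def predict(text: str, weights: list[float], bias: float = 0.0):
    """White-box linear layer over concept scores: logit = w . scores + b."""
    scores = {c: measure_concept(text, c) for c in CONCEPTS}
    logit = bias + sum(w * scores[c] for w, c in zip(weights, CONCEPTS))
    return logit, scores

logit, scores = predict("The food was delicious but service was slow.",
                        weights=[1.0, 0.8, 0.5])
print(logit, scores)
```

Because the final layer is linear, each prediction decomposes exactly into per-concept contributions, which is what gives TBM its local interpretability.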
The authors evaluate TBM on 12 diverse text understanding datasets, including sentiment analysis, natural language inference, and topic classification. TBM performs competitively with strong black-box baselines such as fine-tuned BERT and GPT-3.5 on sentiment tasks, though it still trails state-of-the-art models overall.
A detailed human evaluation reveals that the Concept Generation module consistently produces high-quality concepts, but occasionally struggles with redundancy and label leakage (concepts that effectively restate the label). Concept Measurement is highly accurate for sentiment analysis, but proves harder in specialized domains like fake news detection.
The interpretable structure of TBM also allows for novel analysis, such as visualizing the learning curves of different concepts and using them to identify potential biases in the data.
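One way such analysis falls out of the white-box layer: a concept's signed contribution to a single prediction is just weight × score, so ranking contributions surfaces which concepts drive a decision and can flag suspicious ones. The concept names and weights below are hypothetical, for illustration only.

```python
# With a linear prediction layer, each concept's contribution to one example
# is weight * score; sorting by magnitude gives a local explanation and can
# expose potentially biased concepts (e.g. a brand-mention cue).

weights = {"food quality": 1.4,
           "service speed": 0.8,
           "mentions chain brand": -0.6}  # hypothetical, possibly spurious cue

def explain(concept_scores: dict, weights: dict):
    """Rank concepts by the magnitude of their signed contribution."""
    contribs = {c: weights[c] * s for c, s in concept_scores.items()}
    return sorted(contribs.items(), key=lambda kv: abs(kv[1]), reverse=True)

example = {"food quality": 1.0, "service speed": -1.0, "mentions chain brand": 1.0}
print(explain(example, weights))
```

If a concept like the brand-mention cue keeps appearing among the top contributors, that is a signal to audit the training data for the kind of bias the authors describe.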
Example Texts
"food for everyone, with 4 generations to feed"
"The ambiance wasn't what I expected"
Quotes
"Black-box deep neural networks excel in text classification, yet their application in high-stakes domains is hindered by their lack of interpretability."
"Concept-based explanations can provide both global and local insights by identifying important concepts across the dataset and localizing how these concepts relate to each individual prediction."