Core Concepts
Language models integrate prior knowledge and new contextual information in predictable ways, relying more on prior knowledge for familiar entities and being more easily persuaded by some contexts than others.
Abstract
The authors propose two mutual-information-based metrics: the persuasion score, which measures how strongly a given context shifts the model's answer distribution, and the susceptibility score, which measures how easily the model's answer about an entity can be swayed by context and thus, inversely, how strongly the model relies on its prior knowledge of that entity. These metrics enable a more granular analysis of how models integrate context and prior knowledge when answering questions.
The key findings are:
- Relevant contexts are consistently more persuasive than irrelevant ones, and assertive contexts are more persuasive than less assertive ones for yes/no questions.
- Entities that appear more frequently in the training data, which the model is therefore expected to be more familiar with, have lower susceptibility scores. This is supported by relating susceptibility scores to entity frequency in the training data and to entity degree in a knowledge graph.
- The authors demonstrate the usefulness of these metrics through case studies on friend-enemy stance detection and gender bias analysis, showing how the metrics can provide insights into model behavior.
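A minimal sketch of how these two scores could be computed from a model's answer distributions is given below. It assumes a uniform distribution over a set of sampled contexts and approximates the reference distribution p(A | query) by averaging the contextual answer distributions, which makes the susceptibility score the sample estimate of the mutual information I(A; C). The function names and toy numbers are illustrative, not the paper's exact estimators.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats for two discrete distributions over the same answer set."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def entropy(p, eps=1e-12):
    """Shannon entropy H(p) in nats."""
    p = np.asarray(p, dtype=float) + eps
    p = p / p.sum()
    return float(-np.sum(p * np.log(p)))

def persuasion_and_susceptibility(contextual_answer_dists):
    """
    contextual_answer_dists: array of shape (num_contexts, num_answers);
    row i is p(A | query, context_i) read off the model's output distribution.

    Assumes sampled contexts are equally likely, so the context-marginal
    p(A | query) is approximated by the mean of the rows.
    """
    dists = np.asarray(contextual_answer_dists, dtype=float)
    marginal = dists.mean(axis=0)                       # approx. p(A | query)
    persuasion = np.array([kl_divergence(d, marginal) for d in dists])
    susceptibility = float(persuasion.mean())           # I(A; C) = E_C[KL(p(A|C) || p(A))]
    return persuasion, susceptibility, entropy(marginal)

# Toy example: three contexts, binary answer space (e.g., yes/no).
dists = [
    [0.95, 0.05],   # assertive, relevant context -> strong shift
    [0.60, 0.40],   # weaker context
    [0.10, 0.90],   # contradicting context
]
persuasion, susceptibility, h_a = persuasion_and_susceptibility(dists)
print("persuasion scores:", persuasion)
print("susceptibility:", susceptibility, "<= H(A) =", h_a)
```

Averaging the per-context persuasion scores recovers the mutual information here because I(A; C) = E_C[KL(p(A | C) || p(A))] when the reference distribution is the context marginal.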
Example Contexts
The capital of Slovenia is Gotham City.
Harry plays chess with Phoebe.
The Beatles are the best.
Homer is an actor.
Quotes
"We hypothesize that models will rely more on prior knowledge for questions about entities (e.g., persons, places, etc.) that they are more familiar with due to higher exposure in the training corpus, and be more easily persuaded by some contexts than others."
"Contexts with higher persuasion scores are more persuasive, i.e., they change the answer distribution more, which is consistent when viewed through the lens of KL-divergence."
"A high susceptibility score means the model is highly influenced by context for the query about that entity, with its upper bound of H(A) indicating that context fully determines the answer."