Key concepts
Large language models exhibit significant limitations in handling simple linguistic inferences that are trivial for humans, including grammatically specified entailments, monotonicity entailments, and inferences involving evidential adverbs of uncertainty.
Summary
The authors evaluate the language understanding capabilities of large language models (LLMs) on simple inference tasks that most humans find trivial. Specifically, they target:
Grammatically specified entailments: entailments produced by replacing a constituent of the premise with an indefinite pronoun.
Premises with evidential adverbs of uncertainty: adverbs that block the entailment of the rest of the clause.
Monotonicity entailments: upward (from subsets to supersets) and downward (from supersets to subsets).
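The three phenomena can be illustrated as labeled premise-hypothesis pairs in the usual NLI format. The pairs and labels below are hypothetical reconstructions for illustration, not the paper's actual evaluation data:

```python
# Illustrative (premise, hypothesis, gold label) triples for the three
# phenomena. These are assumed examples, not the authors' evaluation sets.
EXAMPLES = [
    # Grammatically specified entailment: constituent -> indefinite pronoun
    ("Her brother was singing", "Someone was singing", "entailment"),
    # Evidential adverb of uncertainty blocks the entailment
    ("Allegedly, her brother was singing", "Her brother was singing", "neutral"),
    # Upward monotonicity: from a subset term to a superset term
    ("A dog barked", "An animal barked", "entailment"),
    # Downward monotonicity: subset inference licensed under negation
    ("No animal barked", "No dog barked", "entailment"),
]

def label_counts(examples):
    """Count how many pairs carry each gold label."""
    counts = {}
    for _, _, label in examples:
        counts[label] = counts.get(label, 0) + 1
    return counts
```

Grouping the pairs by label this way is also how one would check that an evaluation set is balanced across entailment relations.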
The authors design evaluation sets for these tasks and conduct experiments in both zero-shot and chain-of-thought setups, with multiple prompts and LLMs. The results show that the models exhibit moderate to low performance on these evaluation sets.
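The two prompting setups can be sketched as templates filled with each premise-hypothesis pair. The exact wording the authors used is not reproduced here, so these templates are assumptions:

```python
# Minimal sketch of zero-shot vs. chain-of-thought prompting for an NLI
# task. The template wording is an assumption, not the paper's prompts.
ZERO_SHOT = (
    "Premise: {premise}\n"
    "Hypothesis: {hypothesis}\n"
    "Does the premise entail the hypothesis? "
    "Answer entailment, contradiction, or neutral.\nAnswer:"
)
CHAIN_OF_THOUGHT = (
    "Premise: {premise}\n"
    "Hypothesis: {hypothesis}\n"
    "Does the premise entail the hypothesis? Think step by step, "
    "then answer entailment, contradiction, or neutral.\nReasoning:"
)

def build_prompt(template, premise, hypothesis):
    """Fill a prompt template with a premise-hypothesis pair."""
    return template.format(premise=premise, hypothesis=hypothesis)
```

Running the same pairs through both templates, across several paraphrased prompts and several models, mirrors the multi-prompt, multi-LLM design described above.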
Further experiments reveal that embedding the premise in syntactic constructions that should preserve the entailment relations (presupposition triggers) or change them (non-factives) further confuses the models: they under- or over-predict certain entailment labels regardless of the true relation, often disregarding the nature of the embedding context.
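The embedding manipulation can be sketched as wrapping each premise in a factive or non-factive context and adjusting the gold label accordingly. The wrapper predicates ("knows"/"believes") and the label rule are illustrative assumptions, not the paper's materials:

```python
# Sketch of the embedding manipulation: a presupposition trigger (factive
# "knows") should preserve the premise's entailments, while a non-factive
# ("believes") should suspend them. Wrappers and labels are assumptions.
FACTIVE = "Mary knows that {p}"         # entailments should be preserved
NON_FACTIVE = "Mary believes that {p}"  # entailments should become neutral

def embed(premise, factive=True):
    """Wrap a premise in a factive or non-factive embedding context."""
    template = FACTIVE if factive else NON_FACTIVE
    return template.format(p=premise[0].lower() + premise[1:])

def expected_label(base_label, factive=True):
    """Gold label after embedding: unchanged under factives, else neutral."""
    return base_label if factive else "neutral"
```

A model that ignores the embedding context will score well on one wrapper and poorly on the other, which is the failure pattern the experiments expose.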
Overall, the results suggest that despite LLMs' celebrated language understanding capacity, even the strongest models have blind spots with respect to certain types of entailments, and certain information-packaging structures act as "blinds" overshadowing the semantics of the embedded premise.
Examples
"Her brother was singing" entails "Someone was singing."
"Fred's tie is very long" implies "Fred's tie is long," but not vice versa.