Core Concepts
Language models can learn to solve complex analogies through targeted training objectives, approaching human-level performance on unseen analogy datasets.
Abstract
The paper investigates whether analogical reasoning can be learned by language models, focusing on more complex analogies that are closer to those used to test human analogical reasoning, rather than the semantic/morphological analogies commonly used in NLP benchmarks.
The authors propose a novel training objective that lets language models learn analogies by maximizing the cosine similarity between the offset (difference) vectors of the two word pairs in an analogy, a − b and c − d. They test this approach with a BERT-based model and compare the results to several baselines, including a non-fine-tuned BERT model and FastText embeddings.
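The sketch below illustrates one plausible way to implement such an objective: encode the four terms, compute the two offset vectors, and train their cosine similarity to be high for true analogies and low for false ones. The encoder, pooling, and exact loss formulation are assumptions not specified here, and the names `analogy_loss` and the binary cross-entropy mapping are hypothetical choices for illustration, not the paper's exact method.

```python
import torch
import torch.nn.functional as F

def analogy_loss(a, b, c, d, label):
    """Loss for a batch of analogies a:b :: c:d.

    a, b, c, d: (batch, dim) embeddings of the four terms (e.g. pooled BERT outputs)
    label:      (batch,) 1.0 for true analogies, 0.0 for false ones

    The objective pushes the offset vectors (a - b) and (c - d) to point in
    the same direction for true analogies and apart for false ones.
    """
    sim = F.cosine_similarity(a - b, c - d, dim=-1)  # values in [-1, 1]
    # Map the similarity to [0, 1] and apply binary cross-entropy.
    # This is one plausible formulation; the paper's exact loss may differ.
    score = (sim + 1.0) / 2.0
    return F.binary_cross_entropy(score, label)

# Toy usage with random vectors standing in for BERT term embeddings.
torch.manual_seed(0)
a, b, c, d = (torch.randn(8, 768) for _ in range(4))
label = torch.randint(0, 2, (8,)).float()
print(analogy_loss(a, b, c, d, label))
```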
The experiments show that the BERT model fine-tuned with the proposed objective can learn analogical reasoning, reaching an accuracy of 0.69 on an unseen test set designed to measure human analogical reasoning, 0.15 below human performance. The model performs better on "near" analogies (where the a–b and c–d pairs are semantically similar) than on "far" analogies.
The authors also find that fine-tuning on the analogy task does not degrade performance on external semantic similarity tasks and in some cases even improves it. The paper discusses the limitations of the study, such as the small dataset size, and suggests future research directions, including exploring alternative ways to represent the relations between entities in an analogy.
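For accuracy numbers like these (and the "predicted as true/false" breakdown in the Stats section below), a natural decision rule is to threshold the same offset similarity. The helper name and the 0.5 threshold below are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn.functional as F

def predict_analogy(a, b, c, d, threshold=0.5):
    """Label a:b :: c:d as true when the offset vectors are similar enough.

    The threshold is an illustrative choice; in practice it would be tuned
    on a validation split.
    """
    sim = F.cosine_similarity(a - b, c - d, dim=-1)
    return sim > threshold
```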
Stats
Entities in false analogies were observed in the pre-training data more frequently than those in true analogies.
Before fine-tuning, analogies predicted as true contained entities that appeared, on average, 60% more often in the pre-training data than entities in analogies predicted as false.
Analogies with no out-of-vocabulary (OOV) entities were almost always predicted as true before training.
Quotes
"Language models can learn analogical reasoning, even with a small amount of data."
"After training, the model approaches human performance on an unseen test set constructed for testing human analogical reasoning."
"Fine-tuning the model on the analogy task does not deteriorate its performance on external semantic similarity tasks."