The paper exposes a surprising failure of generalization in auto-regressive large language models (LLMs). If a model is trained on a sentence of the form "A is B", it will not automatically generalize to the reverse direction "B is A". This is called the Reversal Curse.
For instance, if a model is trained on "Valentina Tereshkova was the first woman to travel to space", it will not automatically be able to answer the question, "Who was the first woman to travel to space?". Moreover, the likelihood of the correct answer ("Valentina Tereshkova") will be no higher than for a random name.
The authors provide evidence for the Reversal Curse by finetuning GPT-3 and Llama-1 on fictitious statements and showing that they fail to correctly answer the reversed questions. The Reversal Curse is robust across model sizes and model families and is not alleviated by data augmentation.
The authors also evaluate ChatGPT (GPT-3.5 and GPT-4) on questions about real-world celebrities, such as "Who is Tom Cruise's mother?" and the reverse "Who is Mary Lee Pfeiffer's son?". GPT-4 correctly answers questions like the former 79% of the time, compared to 33% for the latter.
The authors hypothesize this ordering effect is due to the Reversal Curse. Models trained on "A is B" (e.g. "Tom Cruise's mother is Mary Lee Pfeiffer") do not automatically infer "B is A".