Understanding Language Model Hallucinations in Unfamiliar Scenarios
When faced with unfamiliar inputs, large language models tend to default to a hedged prediction, yielding responses that are plausible but factually incorrect. By strategically manipulating how unfamiliar examples are supervised during finetuning, we can control how language models hallucinate.
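As an illustrative sketch only (not the actual training pipeline), one way to manipulate the supervision of unfamiliar examples is to relabel them with an abstention target before finetuning, so the model's default hedged prediction becomes an admission of uncertainty rather than a fabricated answer. The familiarity score, threshold, and abstention string below are all hypothetical stand-ins:

```python
# Sketch: relabel unfamiliar finetuning examples so the model learns to
# abstain instead of fabricating. All names here are illustrative.

ABSTAIN = "I don't know."

def relabel_unfamiliar(examples, familiarity, threshold=0.5, target=ABSTAIN):
    """Return a new finetuning set in which examples the base model is
    unfamiliar with (low familiarity score) are supervised with an
    abstention string instead of their original answer.

    examples:    list of (prompt, answer) pairs
    familiarity: maps a prompt to a score in [0, 1], e.g. the base
                 model's confidence on that prompt (a hypothetical proxy)
    """
    relabeled = []
    for prompt, answer in examples:
        if familiarity(prompt) < threshold:
            relabeled.append((prompt, target))   # unfamiliar: teach hedging
        else:
            relabeled.append((prompt, answer))   # familiar: keep supervision
    return relabeled

if __name__ == "__main__":
    data = [("capital of France?", "Paris"),
            ("capital of Atlantis?", "Poseidonia")]
    # Toy familiarity proxy: the real prompt scores high, the fictional one low.
    score = {"capital of France?": 0.9, "capital of Atlantis?": 0.1}
    print(relabel_unfamiliar(data, score.get))
```

The resulting pairs would then be fed to an ordinary finetuning loop; the point is only that the supervision target attached to unfamiliar inputs, not the inputs themselves, is what shapes the model's hedged default.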