核心概念
大規模言語モデルの幻覚を評価する新しい方法を提案し、解答不能な数学問題に基づくデータセットUMWPを導入して、LLMの幻覚評価に成功した。
摘要
大規模言語モデル(LLMs)は自然言語処理タスクで効果的だが、曖昧な文脈で信頼性の低い推測を生じる「幻覚」に対して新しい評価方法が提案された。UMWPデータセットは5つのカテゴリーにわたる5200の質問から構成され、テキスト類似性と数式検出を組み合わせた評価手法が導入された。31種類のLLMsによる実験結果では、コンテキスト学習と人間フィードバックトレーニング(RLHF)が幻覚回避能力を向上させることが示された。
统计
Samanta has 8 more points than Mark, and Mark has 50% more points than Eric. How many points do Samanta, Mark, and Eric have in total?
Jack received some emails in the morning, 5 emails in the afternoon, and 8 emails in the evening. How many more emails did Jack receive in the afternoon and evening than in the morning?
How many triangles with a height of 0 inches and a width of 0 inches could fit inside a square with 2-inch sides?
Joshua bought 25 oranges for $12.50. He sells each one for 60c, how much profit in cents will he make on each apple?
Baker made 13 cakes. He sold 91 of them and bought 154 new cakes. How many?
引用
"Large language models (LLMs) are highly effective in various natural language processing (NLP) tasks."
"Utilizing MWP is a reliable and effective approach to assess hallucination."
"We believe that our work provides a feasible way of assessing hallucination in LLMs."