The paper introduces NumeroLogic, a simple approach to improving how large language models (LLMs) handle numerical data and arithmetic operations. The key insight is that the standard textual representation of numbers is poorly suited to causal LLMs. Because digits are read or generated from left to right, the model cannot tell a digit's place value (for example, hundreds versus thousands) until the entire number has been processed.
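The place-value problem can be illustrated with a minimal Python sketch. The function name `leading_place_value_candidates` and the `max_digits` bound are illustrative assumptions, not from the paper. The function lists the place values that the first digit of a plain number could still have after a left-to-right reader has seen only a prefix of it:

```python
def leading_place_value_candidates(prefix: str, max_digits: int) -> list[int]:
    """Possible place values of the first digit of a plain (unprefixed) number.

    A left-to-right reader that has seen only `prefix` does not know how many
    digits are still to come, so the first digit could sit in any position
    from 10**(len(prefix) - 1) up to 10**(max_digits - 1).
    """
    if not prefix.isdigit():
        raise ValueError("prefix must contain only digits")
    return [10 ** (n - 1) for n in range(len(prefix), max_digits + 1)]


# After reading "4" of a number with at most 3 digits, the "4" may mean 4, 40 or 400.
print(leading_place_value_candidates("4", 3))   # [1, 10, 100]
# After reading "42", only 2- or 3-digit numbers remain possible.
print(leading_place_value_candidates("42", 3))  # [10, 100]
```

If the digit count is stated before the number, this list collapses to a single entry from the first digit onward.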
To address this, the authors propose prefixing each number with its digit count, e.g., writing "2:42" instead of "42". This NumeroLogic format tells the model each digit's place value before the digits themselves appear, which improves its numerical understanding.
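Below is a minimal Python sketch of the encoding. It assumes the format applies only to non-negative integers that appear as standalone digit runs. The helper names are hypothetical, and the paper's own handling of decimals, signs, and tokenization may differ.

```python
import re

# Hypothetical helpers for the "2:42"-style format described above.
# Assumptions: only standalone runs of digits are encoded. Digits glued to
# letters (e.g. "Llama2") are left alone, and a decimal such as 3.14 is
# simply treated as two separate integers.
_PLAIN_INT = re.compile(r"\b\d+\b")
_ENCODED_INT = re.compile(r"\b(\d+):(\d+)\b")


def numerologic_encode(text: str) -> str:
    """Prefix every standalone integer with its digit count, e.g. 42 -> 2:42."""
    return _PLAIN_INT.sub(lambda m: f"{len(m.group())}:{m.group()}", text)


def numerologic_decode(text: str) -> str:
    """Strip digit-count prefixes, checking that each count matches its number."""

    def strip_prefix(m: re.Match) -> str:
        count, digits = int(m.group(1)), m.group(2)
        if count != len(digits):
            raise ValueError(f"digit count {count} does not match {digits!r}")
        return digits

    return _ENCODED_INT.sub(strip_prefix, text)


sentence = "The 3 boxes hold 128 items in total."
encoded = numerologic_encode(sentence)
print(encoded)  # The 1:3 boxes hold 3:128 items in total.
assert numerologic_decode(encoded) == sentence
```

Because the transformation is purely textual, it can be applied to training data and model outputs without touching the model itself. The decoder's consistency check also gives a cheap way to flag generated numbers whose length contradicts their own prefix.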
The authors demonstrate the effectiveness of NumeroLogic through several experiments:
Supervised training of a small language model (NanoGPT) on arithmetic tasks: NumeroLogic leads to significant accuracy improvements across addition, subtraction, multiplication, sine, and square root operations.
Fine-tuning a larger model (Llama2-7B) on arithmetic tasks: NumeroLogic again yields substantial gains, especially on non-saturated tasks, i.e., those where the baseline model does not already reach near-perfect accuracy.
Self-supervised pretraining of Llama2-7B with NumeroLogic: the model exhibits improved performance on the Massive Multitask Language Understanding (MMLU) benchmark, particularly for tasks requiring numerical understanding.
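The arithmetic results above are reported as accuracy. The following is a hedged sketch of how NumeroLogic answers could be scored. It assumes exact match on a non-negative integer answer, and the helpers `parse_answer` and `score_answers` are hypothetical, not from the paper. The model's output is parsed, its digit-count prefix is checked, and the remaining number is compared with the reference:

```python
import re
from typing import Optional

# Assumed answer format: "<count>:<digits>" for a non-negative integer,
# e.g. "3:357", possibly with surrounding whitespace. Anything else is
# treated as a wrong answer.
_ANSWER = re.compile(r"\s*(\d+):(\d+)\s*")


def parse_answer(output: str) -> Optional[int]:
    """Return the integer in a NumeroLogic-formatted answer, or None if malformed."""
    m = _ANSWER.fullmatch(output)
    if m is None:
        return None
    count, digits = int(m.group(1)), m.group(2)
    if count != len(digits):  # the prefix disagrees with the digits generated
        return None
    return int(digits)


def score_answers(outputs: list[str], targets: list[int]) -> float:
    """Exact-match accuracy of model outputs against integer targets."""
    if len(outputs) != len(targets):
        raise ValueError("outputs and targets must have the same length")
    if not targets:
        return 0.0
    correct = sum(parse_answer(o) == t for o, t in zip(outputs, targets))
    return correct / len(targets)


# "3:12" is rejected because its prefix claims three digits but only two follow,
# so 2 of the 3 answers count as correct.
print(score_answers(["3:357", "2:99", "3:12"], [357, 99, 12]))
```

Treating a prefix/digit mismatch as wrong is a design choice of this sketch. A more lenient scorer could ignore the prefix and compare only the digits.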
The authors also run ablation studies that separate the contribution of encoding the operands from that of encoding the results, and they explore alternative encoding formats. The results consistently favor NumeroLogic, which requires no architectural changes and can therefore be easily integrated into existing language models.
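The operands-versus-results ablation can be pictured by rendering one addition problem in all four combinations. This is an illustrative sketch only. The plain "a+b=c" template and the helper names `encode_int` and `format_addition` are assumptions, not the paper's exact data format, and only non-negative integers are handled.

```python
from itertools import product


def encode_int(n: int) -> str:
    """Digit-count prefix for a non-negative integer, e.g. 42 -> "2:42"."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative integers")
    s = str(n)
    return f"{len(s)}:{s}"


def format_addition(a: int, b: int, encode_operands: bool, encode_result: bool) -> str:
    """Render "a+b=c" with NumeroLogic applied to operands, result, both, or neither."""
    lhs_a = encode_int(a) if encode_operands else str(a)
    lhs_b = encode_int(b) if encode_operands else str(b)
    rhs = encode_int(a + b) if encode_result else str(a + b)
    return f"{lhs_a}+{lhs_b}={rhs}"


# The four ablation variants for one example problem.
for ops, res in product([False, True], repeat=2):
    print(f"operands={ops}, result={res}: {format_addition(12, 345, ops, res)}")
```

Encoding only the operands gives the model place-value information while it reads the problem. Encoding only the result forces it to commit to the answer's length before emitting its digits. Comparing the two isolates which of these effects drives the gains.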
Key insights drawn from: Eli Schwartz... at arxiv.org, 04-02-2024
https://arxiv.org/pdf/2404.00459.pdf