Core Concepts
Parallel in-context learning (ParaICL) is a novel method that effectively utilizes all available demonstration examples without exceeding a manageable input context length, enabling robust language model performance across various tasks.
Abstract
The paper introduces a novel method called parallel in-context learning (ParaICL) to address the limitations of existing in-context learning (ICL) approaches. The key insights are:
- Increasing the number of demonstration examples does not consistently improve ICL performance, as longer input contexts can degrade the results of large language models (LLMs).
- Varying combinations of demonstration examples can significantly boost accuracy across different test samples, highlighting the need to leverage all available examples.
To address these challenges, ParaICL organizes the demonstration examples into batches based on their semantic similarity to the test question. It then computes normalized batch semantic scores and applies a weighted-average semantic objective, constrained by an adaptive plausibility threshold, to select the most appropriate tokens during generation.
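The batching-and-weighting scheme described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the helper names (`toy_embed`, `make_batches`, `batch_scores`, `select_token`) and the bag-of-words embedding are assumptions standing in for a real sentence encoder and the LLM's per-batch next-token distributions.

```python
import math

def toy_embed(text):
    # Toy bag-of-words embedding over a tiny fixed vocabulary
    # (assumption; stand-in for a real sentence encoder).
    vocab = ["math", "add", "logic", "infer", "code", "loop"]
    words = text.lower().split()
    return [words.count(w) for w in vocab]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def make_batches(demos, test_q, batch_size):
    # Rank demonstrations by semantic similarity to the test question,
    # then split them into parallel batches so every example is used
    # while each batch stays within the context length.
    q = toy_embed(test_q)
    ranked = sorted(demos, key=lambda d: cosine(toy_embed(d), q), reverse=True)
    return [ranked[i:i + batch_size] for i in range(0, len(ranked), batch_size)]

def batch_scores(batches, test_q):
    # Normalized batch semantic scores: softmax over each batch's mean
    # similarity to the test question.
    q = toy_embed(test_q)
    means = [sum(cosine(toy_embed(d), q) for d in b) / len(b) for b in batches]
    exps = [math.exp(m) for m in means]
    total = sum(exps)
    return [e / total for e in exps]

def select_token(per_batch_logprobs, weights, alpha=0.1):
    # Weighted-average semantic objective under an adaptive plausibility
    # constraint: only tokens whose best per-batch probability reaches a
    # fraction alpha of the overall maximum remain candidates.
    vocab = per_batch_logprobs[0].keys()
    max_p = {t: max(math.exp(lp[t]) for lp in per_batch_logprobs) for t in vocab}
    threshold = alpha * max(max_p.values())
    candidates = [t for t in vocab if max_p[t] >= threshold]
    def weighted_avg(t):
        return sum(w * lp[t] for w, lp in zip(weights, per_batch_logprobs))
    return max(candidates, key=weighted_avg)
```

In this sketch, batches closer in meaning to the test question receive larger weights, so their token distributions dominate the weighted average, while the plausibility filter prevents low-probability tokens from winning on weighting alone.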
The authors conduct extensive experiments across reasoning, natural language inference, and coding tasks to validate the effectiveness of ParaICL. They demonstrate that ParaICL consistently outperforms baseline methods, including standard few-shot, semantically sorted few-shot, and parallel context window approaches. The authors also show that ParaICL can seamlessly integrate with other ICL methods, such as contrastive decoding, further enhancing its performance.
The key contributions of this work are:
- Introduction of parallel in-context learning (ParaICL), a simple but effective method that leverages all available demonstration examples while maintaining manageable input context length.
- Thorough experiments and ablation studies to prove the effectiveness of ParaICL and justify its design.
- Demonstration of how ParaICL can enhance and work in conjunction with other ICL methods.
Statistics
Increasing the number of demonstration examples does not consistently improve the performance of Mistral-7B-Instruct-v0.2 on GSM8K and WinoGrande.
Varying combinations of 10-shot demonstration examples can significantly boost the accuracy of Llama-2-7B-Chat on different WinoGrande test samples.
Quotes
"Existing methods have delved into optimizing the quantity and semantic similarity of these examples to improve ICL performances. However, our preliminary experiments indicate that the effectiveness of ICL is limited by the length of the input context."
"Varying combinations of few-shot demonstration examples can significantly boost accuracy across different test samples, highlighting the need to leverage all available examples."