Core Concepts
Learning-based methods improve in-context learning in large language models by selecting demonstrations that are similar to the test case in both input and output, potentially capturing the joint distribution of inputs and outputs.
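A minimal sketch of what such input-similarity (Top-K dense) retrieval looks like in practice; the encoder checkpoint, candidate pool, and function name below are illustrative assumptions, not details from the paper:

```python
# Sketch of similarity-based demonstration selection (Top-K dense retrieval).
# The encoder checkpoint and toy pool are illustrative assumptions.
import numpy as np
from sentence_transformers import SentenceTransformer

def select_demonstrations(test_input, pool, k=4):
    """Return the k exemplars whose inputs are most similar to test_input."""
    encoder = SentenceTransformer("all-MiniLM-L6-v2")
    pool_inputs = [x for x, _ in pool]
    pool_emb = encoder.encode(pool_inputs, normalize_embeddings=True)
    test_emb = encoder.encode([test_input], normalize_embeddings=True)[0]
    scores = pool_emb @ test_emb          # cosine similarity (unit-norm embeddings)
    top = np.argsort(-scores)[:k]
    return [pool[i] for i in top]

# Toy usage: exemplars are (input, output) pairs; the selected ones are
# prepended to the prompt before the test input.
pool = [("great movie", "positive"), ("terrible plot", "negative"),
        ("loved the acting", "positive"), ("waste of time", "negative")]
print(select_demonstrations("the film was wonderful", pool, k=2))
```

Learning-based retrievers replace the fixed encoder above with one trained so that its similarity scores track ICL performance on the target task.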
Stats
Top-K BM25 outperforms Top-K BERT on NL2Bash and SWAG.
Learning-based similarity generally performs well across all tasks.
In the proxy task, positive exemplars show higher input and output similarity to the test case than negative exemplars do.
MLSM achieves an average improvement of 1.42% over Top-K BERT on classification tasks and 2.11% over Top-K BM25 on generation tasks.
Supervised methods generally outperform MLSM across all tasks.
TTF surpasses both EPR and CEIL, achieving over 5% absolute improvement on classification tasks.
MLSM generally benefits from a larger batch size, showing an average improvement of over 4% on classification tasks at batch size 8 (see the contrastive-training sketch after this list).
TTF consistently outperforms MLSM across different LLMs.
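One common reason larger batches help contrastively trained retrievers is that every other example in the batch acts as an extra negative. Whether MLSM follows exactly this recipe is an assumption; the sketch below (function name and temperature are illustrative) only shows the generic in-batch-negatives setup:

```python
# Hedged sketch: contrastive retriever training with in-batch negatives.
# A larger batch gives each query more negatives per update, which is one
# common explanation for why batch size matters.
import torch
import torch.nn.functional as F

def in_batch_contrastive_loss(query_emb, exemplar_emb, temperature=0.05):
    """query_emb, exemplar_emb: (batch, dim) L2-normalized embeddings.
    Row i of exemplar_emb is the positive for row i of query_emb;
    every other row in the batch serves as a negative."""
    logits = query_emb @ exemplar_emb.T / temperature  # (batch, batch) similarities
    labels = torch.arange(query_emb.size(0))           # positives on the diagonal
    return F.cross_entropy(logits, labels)

# Toy usage with random unit vectors standing in for encoder outputs.
batch, dim = 8, 32
q = F.normalize(torch.randn(batch, dim), dim=-1)
e = F.normalize(torch.randn(batch, dim), dim=-1)
print(in_batch_contrastive_loss(q, e).item())
```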
Quotes
"Although learning-based methods consistently exhibit significant performance improvements over task-agnostic similarity across various tasks, the implicit similarity they capture and their connection to the performance of ICL remain unclear."
"Based on these initial observations, we propose two hypotheses regarding learning-based methods: H1: After training, the retriever acts as an ensemble model that adaptively integrates multi-level task-agnostic similarities between the exemplar input (x) and test cases (xt) for different tasks. H2: Beyond input similarities, the training process encourages selecting exemplars with similar output (y) to the output of the test case (yt), implicitly predicted during retrieval, enhancing the retriever’s discriminative power for a specific task."