Key Concepts
Large language models can leverage query-aware demo generation to enhance their ability to generalize beyond the provided demonstrations and solve out-of-demonstration queries.
Summary
The paper presents a novel prompting method called SELF-DEMOS that aims to elicit out-of-demonstration (OOD) generalizability in large language models (LLMs). The key idea is to generate query-aware demonstrations that strategically interpolate between existing demonstrations and the given OOD query, effectively transforming the query from OOD to in-demonstration (ID).
The authors first construct a dataset called OOD-Toolset, which features tool-using scenarios with real-world APIs and OOD queries that require different sub-APIs compared to the provided seed demonstrations. They then introduce the SELF-DEMOS workflow, which consists of four steps:
- Query Understanding: The model is prompted to provide a general understanding of the user query, simplifying the complexity of the subsequent analysis.
- Query-aware Demo Generation: The model generates N demos that strategically interpolate between the seed demos and the given query.
- Best-of-N Sampling: The model selects the K best demos from the N generated ones based on criteria like accuracy, relevance, and potential helpfulness.
- Response Generation: The model uses the selected demos along with the seed demos to generate the final response.
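The four steps above can be sketched as a simple pipeline. This is a minimal illustration, not the authors' implementation: `call_llm` is a hypothetical stand-in for any chat-completion API (stubbed here so the sketch runs offline), and the demo-scoring heuristic in step 3 is a placeholder for the paper's model-based selection criteria.

```python
def call_llm(prompt: str) -> str:
    """Hypothetical LLM call; replace with a real API client."""
    return f"[model output for: {prompt[:40]}]"

def self_demos(query: str, seed_demos: list[str], n: int = 5, k: int = 2) -> str:
    # Step 1: Query Understanding -- a general restatement of the query
    # to simplify the subsequent analysis.
    understanding = call_llm(f"Briefly explain what this query asks: {query}")

    # Step 2: Query-aware Demo Generation -- N demos interpolating
    # between the seed demos and the given query.
    generated = [
        call_llm(
            "Given these demos:\n" + "\n".join(seed_demos)
            + f"\nand this understanding:\n{understanding}\n"
            + f"Write one new demo closer to the query (variant {i})."
        )
        for i in range(n)
    ]

    # Step 3: Best-of-N Sampling -- keep the K demos judged most
    # accurate, relevant, and helpful (scoring stubbed as a placeholder).
    scored = sorted(generated, key=lambda d: len(call_llm(f"Score this demo: {d}")),
                    reverse=True)
    selected = scored[:k]

    # Step 4: Response Generation -- answer using seed + selected demos.
    return call_llm(
        "Demos:\n" + "\n".join(seed_demos + selected)
        + f"\nQuery: {query}\nAnswer:"
    )
```

In a real setting, the scoring step would itself prompt the model to verify each generated demo against criteria like correctness and relevance before selection.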
Extensive experiments on the OOD-Toolset, GSM8K, and MATH datasets show that SELF-DEMOS outperforms state-of-the-art baselines in solving OOD queries. The authors also conduct ablation studies and analyses to validate the effectiveness and generalization of their approach across different model sizes and task complexities.
Statistics
The OOD-Toolset dataset contains over 300 real-world APIs and 1,057 instances, each with 3 seed demos and 1 OOD query.
The GSM8K dataset has 1,319 instances, with manually created outliers for the OOD setting.
The MATH dataset has 1,000 instances, with problems from the same level but different subjects for the OOD setting.
Quotes
"Large language models (LLMs) have shown promising abilities of in-context learning (ICL), adapting swiftly to new tasks with only few-shot demonstrations."
"When faced with out-of-demonstration (OOD) queries, methods that rely on hand-crafted demos or external retrievers might fail."
"By strategically interpolating, we can derive more relevant and accurate demos from existing ones, which have proven helpful for the final response."