Core Concepts
IR models struggle to follow complex instructions but can improve with training.
Abstract:
Large Language Models (LLMs) enable diverse user tasks.
IR models using LLMs lack instruction understanding.
Introduction of FOLLOWIR dataset for instruction evaluation.
Introduction:
LLMs show impressive results in understanding instructions.
Existing IR systems rely on keywords despite LLM capabilities.
Larger IR models are adopted without instruction support, limiting how precisely users can specify their tasks.
Feature Description:
Different types of information in retrieval queries and instructions.
Standard retrieval queries differ from detailed instructions.
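The contrast above can be illustrated with a small sketch. The text below is hypothetical and not taken verbatim from the FOLLOWIR dataset; it only shows the gap in specificity between a keyword query and a TREC-style narrative instruction:

```python
# Hypothetical example: a short keyword query vs. a detailed
# narrative instruction with explicit relevance conditions
# (illustrative values, not from the dataset).
query = "hydroelectric dam projects"
instruction = (
    "A relevant document discusses a planned or newly constructed "
    "hydroelectric dam. Documents about dams that already exist, or "
    "proposals that were cancelled, are not relevant."
)
```

The instruction carries negation and relevance conditions that a keyword matcher cannot express, which is exactly what instruction-following evaluation probes.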
Related Work:
TREC conferences provide annotations for collections.
Instruction-following LLMs popularized by GPT-3.
Building FOLLOWIR:
Evaluation set based on TREC collections with altered instructions.
New pairwise evaluation framework developed for instruction following.
Evaluation Metrics for FOLLOWIR:
Two ways of measuring instruction following: standard retrieval metrics and pairwise evaluation.
Evaluating Instruction Following:
Results show that large models or instruction-tuned LLMs perform better at following instructions.
Teaching Instruction Following:
Training set created to teach models to follow longer instructions.
Conclusion:
Most existing IR models struggle to follow instructions effectively.
Limitations:
Evaluation limitations due to document retrieval challenges and licensing restrictions.
Stats
Existing retrieval models struggle to correctly understand instructions.
The new FOLLOWIR model shows improvements in both standard IR metrics and instruction-following ability.