The paper introduces Chain-of-Thought Attribution Reasoning (CoTAR), a method for improving the accuracy of attributions in responses generated by large language models (LLMs). The key insights are:
The authors perform rigorous measurements of both the answer quality and citation quality across multiple models and citation levels (span, sentence, and passage).
They show that CoT reasoning improves an LLM's ability to produce higher-quality answers and more precise, faithful citations from the source, demonstrated across multiple models.
They demonstrate that, with finetuning, smaller models can match or in some cases outperform GPT-4 on answer and citation quality metrics.
The paper first defines attribution-oriented question answering as a task whose objective is to answer a question accurately while attributing specific portions of, or the entire, answer to the appropriate contextual sources. Three attribution granularities are identified: span, sentence, and passage.
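To make the task concrete, the sketch below shows one possible way to represent an attributed answer at the three granularities. This is an assumed structure for illustration only, not the schema used by the paper or its datasets; the class and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Attribution:
    """One supporting citation for a claim in the answer (illustrative only)."""
    passage_id: int                 # passage-level: which source passage supports the claim
    sentence_id: Optional[int] = None  # sentence-level: index of the supporting sentence
    span: Optional[str] = None         # span-level: the exact supporting text span


@dataclass
class AttributedAnswer:
    """A question, its answer, and the attributions that ground the answer."""
    question: str
    answer: str
    attributions: list[Attribution]
```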
The authors then propose a multi-step CoT reasoning scheme with varying levels of attribution, hypothesizing that this encourages the model to generate more accurate answers. The three CoT methods mirror the attribution granularities, span guidance, sentence guidance, and passage guidance, each directing the model to first identify supporting content at that granularity before composing the answer.
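The following is a minimal sketch of what a span-level variant of such a prompt could look like. It is not the authors' exact prompt; the function name and instruction wording are assumptions meant only to illustrate the idea of extracting supporting spans before reasoning and answering.

```python
def build_span_cotar_prompt(question: str, passages: list[str]) -> str:
    """Build an illustrative span-level CoT-with-attribution prompt (hypothetical template)."""
    context = "\n\n".join(
        f"[Passage {i + 1}] {p}" for i, p in enumerate(passages)
    )
    return (
        "Answer the question using only the passages below.\n"
        "Step 1: Extract the exact spans from the passages that support the answer, "
        "citing each span with its passage number.\n"
        "Step 2: Reason over the extracted spans step by step.\n"
        "Step 3: Write the final answer and attribute each claim to its supporting span.\n\n"
        f"{context}\n\nQuestion: {question}\nReasoning:"
    )
```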
The paper evaluates GPT-4, a smaller decoder-only model (Mistral 7B), and an encoder-decoder model (Flan-T5 XXL) on two context-enhanced question-answering datasets (QuoteSUM and MS MARCO) using various combinations of citation levels and CoT methods. The results show that CoTAR reasoning significantly improves the models' ability to generate higher-quality answers and more accurate, faithful citations from the source.
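As a rough illustration of how span-level citation faithfulness can be checked, the sketch below counts a cited span as faithful only if it appears verbatim in the passage it cites. This is an assumed, simplified check, not the evaluation code or the exact metrics used in the paper.

```python
def span_citation_precision(citations: list[tuple[str, int]],
                            passages: list[str]) -> float:
    """Fraction of cited spans that literally occur in their cited passage.

    citations: (quoted_span, passage_index) pairs extracted from the answer.
    """
    if not citations:
        return 0.0
    faithful = sum(
        1 for span, idx in citations
        if 0 <= idx < len(passages) and span in passages[idx]
    )
    return faithful / len(citations)
```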
Source: Moshe Bercha... et al., arxiv.org, 04-17-2024, https://arxiv.org/pdf/2404.10513.pdf