The paper proposes LLMRefine, an inference-time optimization method for improving the quality of text generated by large language models (LLMs). The key idea is to use a learned fine-grained feedback model to pinpoint defects in the initial LLM output, including their location, error type, and severity, and to use that feedback to guide an iterative refinement process.
The framework consists of three main components: a base LLM that produces an initial output, a learned fine-grained feedback model that pinpoints defects in that output, and an iterative refinement loop in which the LLM revises the output in response to the feedback.
The authors experiment with different local search algorithms, including always-accept, greedy uphill, and simulated annealing, to balance exploration of the search space against exploitation of the feedback when selecting the best refined output.
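To make the search procedure concrete, here is a minimal Python sketch of the refinement loop with a simulated-annealing acceptance rule. The helpers `generate`, `get_feedback`, `refine`, and `score` are hypothetical placeholders standing in for the paper's components, not its actual interface.

```python
import math
import random

def llm_refine(source, generate, get_feedback, refine, score,
               max_steps=10, init_temp=1.0, cooling=0.8):
    """Sketch of an iterative refinement loop with simulated annealing.

    `generate`, `get_feedback`, `refine`, and `score` are assumed
    placeholder callables, not the paper's actual API.
    """
    current = generate(source)              # initial LLM output
    current_score = score(source, current)  # quality estimate (higher = better)
    temp = init_temp

    for _ in range(max_steps):
        feedback = get_feedback(source, current)  # fine-grained error spans
        if not feedback:                          # no defects found: stop early
            break
        candidate = refine(source, current, feedback)  # LLM revises its output
        candidate_score = score(source, candidate)
        delta = candidate_score - current_score
        # Always accept improvements; accept regressions with a probability
        # that shrinks as the temperature cools, allowing early exploration.
        if delta >= 0 or random.random() < math.exp(delta / temp):
            current, current_score = candidate, candidate_score
        temp *= cooling
    return current
```

Note that the other two search strategies are limiting cases of this rule: accepting only improving candidates recovers greedy uphill, while accepting every candidate recovers always-accept.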
The authors evaluate LLMRefine on three text generation tasks: machine translation, long-form question answering, and topical summarization. They show that LLMRefine consistently outperforms baseline approaches that use coarser feedback, achieving improvements of up to 1.7 MetricX points on translation tasks, 8.1 ROUGE-L on the ASQA long-form QA benchmark, and 2.2 ROUGE-L on topical summarization. Human evaluation also shows a significant preference for LLMRefine's outputs over those of the baselines.