This paper addresses the challenge of selecting among large language models (LLMs) online, introduces the TI-UCB algorithm for efficient model selection, and evaluates its performance in synthetic and real-world environments. The algorithm balances exploration and exploitation while accounting for the increasing-then-converging trend in LLM performance.
Web-based applications such as chatbots and search engines are rapidly adopting large language models (LLMs), drawing increased attention to online model selection. Traditional selection methods are becoming impractical as the cost of training and finetuning LLMs rises. Recent works apply bandit algorithms to model selection, but they overlook the increasing-then-converging trend in model performance that arises during iterative finetuning.
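To make the bandit framing concrete, the sketch below shows a standard UCB1-style selection rule of the kind these works build on. The function name and bookkeeping are illustrative only; note that the rule assumes each model's reward distribution is stationary, which is exactly what an increasing-then-converging performance trend violates.

```python
import math

def ucb1_pick(counts, means, t):
    """Stationary UCB1 index for choosing which LLM to serve next:
    empirical mean reward plus a confidence bonus that shrinks as a
    model is served more often."""
    for i, n in enumerate(counts):
        if n == 0:
            return i  # serve every candidate model at least once
    scores = [means[i] + math.sqrt(2.0 * math.log(t) / counts[i])
              for i in range(len(counts))]
    return max(range(len(scores)), key=scores.__getitem__)
```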
TI-UCB is proposed to efficiently predict performance increases and capture the converging points of LLMs during online selection. The algorithm achieves a logarithmic regret upper bound in a typical increasing bandit setting, implying a fast convergence rate. Empirical validation demonstrates the importance of exploiting increasing-then-converging patterns for more efficient and economical model selection in LLM deployment.
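The following is a minimal sketch of the increase-aware idea: each model's recent rewards are extrapolated one step ahead with a simple least-squares trend and combined with an exploration bonus, so a model whose performance is still improving is not discarded prematurely. The saturating reward curves, the linear extrapolation, and the bonus form are assumptions for illustration, not the paper's exact TI-UCB prediction or change-detection procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_reward(arm, t):
    """Hypothetical increasing-then-converging reward curves for three
    candidate LLMs: accuracy rises with finetuning steps and saturates
    at a model-specific ceiling, plus observation noise."""
    ceilings = [0.60, 0.75, 0.70]
    rates = [0.05, 0.02, 0.08]
    mean = ceilings[arm] * (1.0 - np.exp(-rates[arm] * t))
    return float(np.clip(mean + rng.normal(0.0, 0.02), 0.0, 1.0))

def predict_next(history, window=10):
    """Extrapolate the next reward with a least-squares line over the
    recent window (a stand-in for an increase-prediction step)."""
    h = np.asarray(history[-window:], dtype=float)
    if len(h) < 3:
        return float(np.mean(h))
    x = np.arange(len(h))
    slope, intercept = np.polyfit(x, h, 1)
    return float(slope * len(h) + intercept)

def select(histories, t, c=0.5):
    """Choose the model maximizing predicted reward + exploration bonus."""
    scores = []
    for h in histories:
        if len(h) == 0:
            scores.append(np.inf)  # pull every model once first
        else:
            bonus = c * np.sqrt(np.log(t + 1) / len(h))
            scores.append(predict_next(h) + bonus)
    return int(np.argmax(scores))

histories = [[], [], []]
for t in range(300):
    arm = select(histories, t)
    histories[arm].append(true_reward(arm, len(histories[arm])))

print("pulls per model:", [len(h) for h in histories])
```

In this toy simulation the trend-aware rule keeps sampling the slowly improving model with the highest ceiling instead of locking onto whichever model happens to perform best early on.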
Key insights distilled from: Yu Xia, Fang ..., arxiv.org, 03-13-2024, https://arxiv.org/pdf/2403.07213.pdf