Convergence-Aware Online Model Selection for Large Language Models
The authors propose TI-UCB, a time-increasing bandit algorithm that balances exploration and exploitation in online model selection. The algorithm predicts how model performance increases as models are trained or fine-tuned, and detects the points at which models converge.
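As a rough illustration of the trade-off described above, the sketch below uses a per-arm sliding-window UCB rule together with a simple convergence check that compares the mean reward of two consecutive windows. This is not TI-UCB. The class `WindowedUCB`, its parameters `window`, `eps`, and `c`, and the simulated learning curves are all hypothetical. The sketch assumes only that a model's reward rises each time it is selected (trained) and eventually plateaus.

```python
import math
import random
from collections import deque


class WindowedUCB:
    """Hypothetical per-arm sliding-window UCB (not the authors' TI-UCB).

    Each arm is a candidate model whose reward may grow as it is selected.
    """

    def __init__(self, n_arms, window=10, eps=0.01, c=2.0):
        self.window = window  # number of recent rewards used for the score
        self.eps = eps        # max change between windows to call an arm converged
        self.c = c            # exploration strength
        # Keep two windows per arm so convergence can be checked.
        self.history = [deque(maxlen=2 * window) for _ in range(n_arms)]
        self.t = 0

    def select(self):
        self.t += 1
        # Try every arm once before relying on confidence bounds.
        for arm, h in enumerate(self.history):
            if not h:
                return arm
        scores = []
        for h in self.history:
            recent = list(h)[-self.window:]
            mean = sum(recent) / len(recent)
            bonus = math.sqrt(self.c * math.log(self.t) / len(recent))
            scores.append(mean + bonus)
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, arm, reward):
        self.history[arm].append(reward)

    def converged(self, arm):
        # Converged if the two most recent windows have nearly equal means.
        h = list(self.history[arm])
        if len(h) < 2 * self.window:
            return False
        older, newer = h[:self.window], h[self.window:]
        return abs(sum(newer) / self.window - sum(older) / self.window) < self.eps


if __name__ == "__main__":
    # Hypothetical learning curves: reward as a function of times trained.
    curves = [
        lambda n: 0.6 * (1 - 0.5 ** n),   # converges fast, lower ceiling
        lambda n: 0.9 * (1 - 0.95 ** n),  # converges slowly, higher ceiling
    ]
    random.seed(0)
    bandit = WindowedUCB(n_arms=2)
    pulls = [0, 0]
    for _ in range(300):
        arm = bandit.select()
        pulls[arm] += 1
        bandit.update(arm, curves[arm](pulls[arm]) + random.gauss(0, 0.02))
    print("pulls per model:", pulls)
    print("converged:", [bandit.converged(a) for a in range(2)])
```

Scoring each arm on its most recent rewards, rather than its full history, keeps a model that is still improving from being judged by its early, weaker results. The fixed-size window also keeps the exploration bonus from shrinking to zero, so slowly improving models continue to be revisited.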