Efficient Customization of Large Language Models through Proxy-Tuning
Proxy-tuning is a lightweight decoding-time algorithm that efficiently customizes a large pretrained language model without accessing its internal weights. It uses small tuned models as "experts" that guide the larger base model's predictions at each decoding step.
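
To make the idea concrete, here is a minimal sketch of one decoding step. It assumes the usual logit-arithmetic formulation of proxy-tuning, which the summary above does not spell out: the base model's next-token logits are shifted by the difference between a small tuned expert and its untuned counterpart (called the "anti-expert" here), and the result is renormalized with a softmax. The function names and the three-token toy vocabulary are hypothetical and chosen only for illustration; a real setup would take the logits from three models that share a vocabulary.

```python
import math


def softmax(logits):
    """Turn a list of logits into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def proxy_tuned_probs(base_logits, expert_logits, anti_expert_logits):
    """Assumed proxy-tuning step: base + (expert - anti_expert), then softmax.

    Only output logits are used; no model weights are read or modified.
    """
    shifted = [
        b + (e - a)
        for b, e, a in zip(base_logits, expert_logits, anti_expert_logits)
    ]
    return softmax(shifted)


# Toy next-token logits over a three-token vocabulary (made-up numbers).
base = [2.0, 1.0, 0.0]    # large base model prefers token 0
expert = [0.0, 2.0, 0.0]  # small tuned model prefers token 1
anti = [0.0, 0.0, 0.0]    # small untuned model has no preference

probs = proxy_tuned_probs(base, expert, anti)
best_token = probs.index(max(probs))
print(best_token, probs)
```

In this toy case the expert's offset moves the most likely token from index 0 to index 1. When the expert and anti-expert agree, the offset is zero and the base model's distribution is left unchanged.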