Core Concepts
MiLoRA is a parameter-efficient fine-tuning method that activates only one LoRA module per Transformer layer, selected based on the input prompt. Because the routing decision is made once from the prompt rather than recomputed for every generated token, MiLoRA reduces computational overhead during inference while improving task performance.
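To make the routing idea concrete, below is a minimal PyTorch sketch of a prompt-routed LoRA layer: several low-rank experts are attached to a frozen linear projection, and a small router picks exactly one expert from a pooled prompt representation. All names (`PromptRoutedLoRA`, `num_experts`, `rank`, `prompt_repr`) are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (assumed names/shapes, not the official MiLoRA code):
# a frozen linear layer augmented with several LoRA experts, of which
# exactly one is activated per layer based on a prompt representation.
import torch
import torch.nn as nn


class PromptRoutedLoRA(nn.Module):
    def __init__(self, d_in: int, d_out: int, num_experts: int = 4,
                 rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = nn.Linear(d_in, d_out, bias=False)
        self.base.weight.requires_grad_(False)  # frozen pretrained weight
        # One low-rank (A, B) pair per LoRA expert.
        self.lora_A = nn.Parameter(torch.randn(num_experts, rank, d_in) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(num_experts, d_out, rank))
        self.router = nn.Linear(d_in, num_experts)  # scores experts from the prompt
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor, prompt_repr: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_in); prompt_repr: (batch, d_in), e.g. mean-pooled
        # prompt hidden states. Routing is decided once per prompt, so the
        # same single expert serves all subsequent tokens.
        expert = self.router(prompt_repr).argmax(dim=-1)    # (batch,)
        A = self.lora_A[expert]                             # (batch, rank, d_in)
        B = self.lora_B[expert]                             # (batch, d_out, rank)
        delta = torch.einsum("bsd,brd->bsr", x, A)          # (batch, seq, rank)
        delta = torch.einsum("bsr,bor->bso", delta, B)      # (batch, seq, d_out)
        return self.base(x) + self.scaling * delta
```

The hard `argmax` here only illustrates single-expert activation at inference time; training such a router end-to-end would require a differentiable surrogate such as Gumbel-softmax or a straight-through estimator.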
Statistics
With a beam size of 1, MiLoRA is 21.7% faster than MOELoRA and 19.7% faster than DoRA in tokens per second (tps).
With a beam size of 3, it is 17.9% faster than MOELoRA and 13.2% faster than DoRA.