LORS: Low-rank Residual Structure for Parameter-Efficient Network Stacking
Core Concepts
LORS introduces a low-rank residual structure that reduces the parameter count of stacked modules while maintaining or improving model performance.
Summary
LORS addresses the parameter explosion that occurs in deep learning models with stacked structures. By sharing most parameters across the stacked modules while keeping a small set of private parameters for each, LORS significantly reduces the total parameter count without compromising performance. Experiments on object detection demonstrate that LORS reduces decoder parameters by up to 70% while maintaining or even improving model performance.
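To make the mechanism concrete, here is a minimal PyTorch sketch of the idea, not the paper's implementation; the names (LORSStack, dim, rank) and shapes are illustrative assumptions. Every module in the stack reuses one shared full-rank weight, and only the small low-rank factors A_i and B_i are private to module i.

```python
import torch
import torch.nn as nn

class LORSStack(nn.Module):
    """Sketch of a LORS-style stack: one shared full-rank weight,
    plus a private low-rank residual A[i] @ B[i] for each module i."""

    def __init__(self, num_modules: int, dim: int, rank: int = 4):
        super().__init__()
        self.shared = nn.Parameter(torch.empty(dim, dim))
        nn.init.xavier_uniform_(self.shared)
        # Private low-rank factors: A[i] is (dim, rank), B[i] is (rank, dim)
        self.A = nn.Parameter(torch.zeros(num_modules, dim, rank))
        self.B = nn.Parameter(torch.randn(num_modules, rank, dim) * 0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Run the input through every stacked module in turn
        for i in range(self.A.shape[0]):
            # Effective weight of module i = shared part + low-rank residual
            w_i = self.shared + self.A[i] @ self.B[i]
            x = torch.relu(x @ w_i.T)
        return x
```

Since a private module now costs only 2 * dim * rank parameters instead of dim * dim, the savings grow with both the rank gap and the number of stacked modules.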
Statistics
GPT-3 uses 175 billion parameters and consists of 96 stacked Transformer layers.
Applied to AdaMixer's decoder, LORS achieved up to a 70% reduction in parameters while maintaining comparable or superior performance.
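The scale of such a reduction can be sanity-checked with a back-of-the-envelope parameter count. The dimensions below (dim = 256, rank = 4, 6 stacked modules) are assumed for illustration, not the actual AdaMixer configuration:

```python
dim, rank, n = 256, 4, 6

# Plain stacking: every module owns a full dim x dim weight matrix
plain = n * dim * dim

# LORS-style stacking: one shared weight plus per-module low-rank factors
lors = dim * dim + n * (2 * dim * rank)

print(plain, lors, 1 - lors / plain)  # 393216 77824 ~0.80 (about 80% fewer)
```

The paper's 70% figure depends on the real decoder dimensions and ranks; the point of the arithmetic is only that the savings grow with the number of stacked modules sharing one weight.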
Quotes
"Models with stacked structures often lead to a sharp increase in the number of parameters."
"LORS allows shared and private parameters, reducing overall parameter usage significantly."