The authors claim that coupling a user-defined learning rate (LR) with gradient-based optimizers is sub-optimal for quantization-aware training (QAT). A quantized weight moves between the discrete levels of a quantizer only when its latent weight crosses a transition point. How much the quantized weights change therefore depends on both the LR for the latent weights and the distribution of those latent weights, which makes it difficult to control the degree of change in the quantized weights by scheduling the LR manually.
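The dependence on the latent-weight distribution can be illustrated with a minimal sketch. It assumes a uniform round-to-nearest quantizer with step size 1, so the transition points sit at half-integers; the helper names `quantize` and `transition_rate` and the example weights are hypothetical and not taken from the paper.

```python
import math


def quantize(w, step=1.0):
    """Assumed uniform quantizer: snap a latent weight to the nearest multiple of `step`."""
    return step * math.floor(w / step + 0.5)


def transition_rate(latent_before, latent_after, step=1.0):
    """Fraction of weights whose quantized value changed after an update."""
    changed = sum(
        quantize(b, step) != quantize(a, step)
        for b, a in zip(latent_before, latent_after)
    )
    return changed / len(latent_before)


# Two sets of latent weights receive the same update of +0.1.
update = 0.1
near_levels = [0.0, 1.0, 2.0, -1.0]           # far from any transition point
near_transitions = [0.45, 1.45, -0.55, 2.45]  # just below a transition point

tr_levels = transition_rate(near_levels, [w + update for w in near_levels])
tr_transitions = transition_rate(
    near_transitions, [w + update for w in near_transitions]
)
print(tr_levels, tr_transitions)
```

With an identical update size, none of the weights in the first set change their quantized value, while every weight in the second set does, so the same LR can produce very different amounts of change in the quantized weights.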
The authors introduce a transition rate (TR) scheduling technique that explicitly controls the number of transitions of quantized weights. Instead of scheduling an LR for latent weights, they schedule a target TR for the quantized weights and update the latent weights with a novel transition-adaptive LR (TALR), so that the degree of change in the quantized weights is taken into account during QAT.
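One way to picture the idea is a feedback loop that nudges the LR until the observed TR matches the scheduled target. The sketch below is only an illustration under stated assumptions: the linear target schedule, the multiplicative update, the `gain` and clipping values, and the function names `target_tr_schedule` and `adapt_lr` are all hypothetical, and this is not the TALR update rule given in the paper.

```python
def target_tr_schedule(step, total_steps, start_tr=0.01, end_tr=0.0001):
    """Hypothetical schedule: decay the target TR linearly from start_tr to end_tr."""
    frac = step / total_steps
    return start_tr + (end_tr - start_tr) * frac


def adapt_lr(lr, observed_tr, target_tr, gain=0.5, eps=1e-8):
    """Hypothetical controller (not the paper's formula).

    Raises the LR when too few quantized weights changed and lowers it
    when too many changed, relative to the target TR.
    """
    ratio = target_tr / max(observed_tr, eps)
    ratio = min(max(ratio, 0.5), 2.0)  # clip to avoid abrupt LR jumps
    return lr * ratio ** gain


lr = 0.1
lr_too_many = adapt_lr(lr, observed_tr=0.02, target_tr=0.01)   # TR above target
lr_too_few = adapt_lr(lr, observed_tr=0.005, target_tr=0.01)   # TR below target
lr_on_target = adapt_lr(lr, observed_tr=0.01, target_tr=0.01)  # TR on target
print(lr_too_many, lr_too_few, lr_on_target)
```

In a training loop, the observed TR would be measured after each update by comparing quantized weights before and after the step; the quantity being scheduled is then the target TR rather than the LR itself.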
Experimental results demonstrate the effectiveness of the proposed TR scheduling technique on standard benchmarks, including image classification and object detection tasks. The method outperforms conventional optimization methods using manual LR scheduling, especially for aggressive network compression (e.g., low-bit quantization or lightweight models).
Key insights distilled from
Junghyup Lee... at arxiv.org, 05-01-2024
https://arxiv.org/pdf/2404.19248.pdf