Core concepts
A method to reduce the depth of deep neural networks by iteratively linearizing the lowest-entropy layer while preserving performance.
Summary
The authors propose a method called EASIER (Entropy-bASed Importance mEtRic) to reduce the depth of over-parameterized deep neural networks. The key idea is to identify layers that are close to becoming linear by estimating the entropy of the rectifier activations in each layer. The layer with the lowest entropy is then replaced with a linear activation, effectively reducing the depth of the network.
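Below is a minimal sketch of how such a per-layer entropy could be estimated in PyTorch, assuming the entropy is taken over the binary on/off states of each rectifier (whether a neuron's output is positive or zero) and averaged over the neurons of a layer. The function name `relu_state_entropies` and the exact estimator are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def relu_state_entropies(model, loader, device="cpu"):
    """Estimate a per-layer entropy for every nn.ReLU in `model`.

    For each neuron/channel we track the empirical probability p of being in
    the active (positive) state over the data, take the binary entropy
    H(p) = -p*log2(p) - (1-p)*log2(1-p), and average it over the layer.
    This only illustrates the idea, not the paper's exact estimator.
    """
    model.eval().to(device)
    relus = [m for m in model.modules() if isinstance(m, nn.ReLU)]
    fired = [None] * len(relus)   # per-channel count of positive activations
    seen = [0] * len(relus)       # number of observations per channel

    def make_hook(i):
        def hook(_module, _inputs, output):
            active = (output > 0).float()
            if active.dim() > 2:          # conv/sequence maps: [B, C, ...] -> [C, B*...]
                active = active.transpose(0, 1).flatten(start_dim=1)
            else:                         # fully connected: [B, C] -> [C, B]
                active = active.t()
            s = active.sum(dim=1)
            fired[i] = s if fired[i] is None else fired[i] + s
            seen[i] += active.shape[1]
        return hook

    handles = [m.register_forward_hook(make_hook(i)) for i, m in enumerate(relus)]
    for inputs, _ in loader:
        model(inputs.to(device))
    for h in handles:
        h.remove()

    entropies = []
    for s, n in zip(fired, seen):
        p = (s / n).clamp(1e-6, 1 - 1e-6)
        h = -(p * p.log2() + (1 - p) * (1 - p).log2())
        entropies.append(h.mean().item())
    return relus, entropies
```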
The method works as follows:
1. Train the neural network on the training set.
2. Evaluate the performance on the validation set.
3. Calculate the entropy of the rectifier activations for each layer on the training set.
4. Replace the activation function of the layer with the lowest entropy by an identity function (linearization).
5. Finetune the model on the training set.
6. Evaluate the performance on the validation set.
7. Repeat steps 3-6 until the performance drops below a specified threshold (a rough outline is sketched after this list).
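The outline below strings these steps together, reusing the `relu_state_entropies` helper from the earlier sketch. Here `finetune_fn`, `evaluate_fn`, and `min_accuracy` are placeholders for whatever training routine, validation routine, and stopping threshold one chooses, and the loop omits the checkpoint/rollback bookkeeping a real implementation would need.

```python
import torch.nn as nn

def reduce_depth(model, train_loader, val_loader, min_accuracy,
                 finetune_fn, evaluate_fn):
    """Illustrative outline of the iterative procedure summarized above."""
    while True:
        relus, entropies = relu_state_entropies(model, train_loader)
        if not relus:
            break  # no rectifiers left to linearize
        # Step 4: linearize the layer whose activation states carry the least information.
        lowest = min(range(len(relus)), key=entropies.__getitem__)
        _replace_module(model, relus[lowest], nn.Identity())
        # Steps 5-6: finetune, then check validation performance.
        finetune_fn(model, train_loader)
        if evaluate_fn(model, val_loader) < min_accuracy:
            break  # stop once performance degrades beyond the threshold
    return model

def _replace_module(model, old, new):
    """Swap `old` for `new` wherever it appears as a direct child of a submodule."""
    for parent in model.modules():
        for name, child in parent.named_children():
            if child is old:
                setattr(parent, name, new)
```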
The authors evaluate EASIER on four popular models (ResNet-18, MobileNetv2, Swin-T, and VGG-16) across seven datasets for image classification. They compare the results to two existing methods: Layer Folding and EGP (an entropy-guided pruning technique).
The results show that, for the same number of layers removed, EASIER consistently yields models with better performance than the competing methods, and that it can remove more layers while keeping performance close to the original model's. The authors also include an ablation study on the choice of rectifier activation and on the feasibility of a one-shot approach.
Stats
The training of large pre-trained models can emit around 200 tCO2eq and have an operational carbon footprint of around 550 tCO2eq.
GPT-3, a model with 175B parameters, requires enormous resources in terms of hardware capacity and energy consumption.
Citations
"While deep neural networks are highly effective at solving complex tasks, large pre-trained models are commonly employed even to solve consistently simpler downstream tasks, which do not necessarily require a large model's complexity."
"Motivated by the awareness of the ever-growing AI environmental impact, we propose an efficiency strategy that leverages prior knowledge transferred by large models."