Rostami, P., & Dousti, M. J. (2024). CULL-MT: Compression Using Language and Layer pruning for Machine Translation. arXiv preprint arXiv:2411.06506.
This paper introduces CULL-MT, a novel approach to compressing large multilingual neural machine translation (NMT) models, aiming to reduce computational cost while preserving translation quality for specific language pairs.
CULL-MT employs a greedy structural pruning technique to identify and remove unimportant layers in the model. It iteratively evaluates the impact of removing each layer on translation performance (measured by spBLEU) for the selected language directions and discards the layers whose removal costs the least. After pruning, the model undergoes a healing process that uses sequence-level knowledge distillation from the original model together with LoRA fine-tuning to recover the performance lost during pruning.
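As a rough illustration of this greedy loop, the sketch below assumes the model's transformer layers are exposed as a `torch.nn.ModuleList` and that the caller supplies `score_fn`, a routine that plugs a candidate layer stack into the model, translates a development set for the selected directions, and returns its spBLEU (e.g., computed with sacrebleu). The helper names and the `max_drop` stopping threshold are illustrative assumptions, not the authors' implementation.

```python
from typing import Callable

from torch import nn


def drop_layer(layers: nn.ModuleList, index: int) -> nn.ModuleList:
    """Return a new layer stack with the layer at `index` removed (modules are shared, not copied)."""
    return nn.ModuleList(m for i, m in enumerate(layers) if i != index)


def greedy_layer_prune(
    layers: nn.ModuleList,
    score_fn: Callable[[nn.ModuleList], float],  # evaluates spBLEU on the selected directions
    max_drop: float = 1.0,                       # tolerated spBLEU loss vs. the unpruned model
) -> nn.ModuleList:
    """Iteratively remove the layer whose removal hurts spBLEU the least,
    stopping once any further removal would exceed `max_drop`."""
    baseline = score_fn(layers)
    pruned = layers
    while len(pruned) > 1:
        # Score every remaining layer as a removal candidate.
        scored = [(score_fn(drop_layer(pruned, i)), i) for i in range(len(pruned))]
        best_score, best_index = max(scored)
        if baseline - best_score > max_drop:
            break  # removing anything else would degrade translation too much
        pruned = drop_layer(pruned, best_index)  # accept the cheapest removal
    return pruned
```

In practice, `score_fn` would swap the candidate stack into the full NMT model before decoding, and the pruned model returned by this loop would then be healed by distilling translations from the original model into it via LoRA fine-tuning, as described above.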
CULL-MT effectively compresses large multilingual NMT models while maintaining performance for specific translation directions. The method's success varies depending on the model's architecture and the resource availability of the target language pairs.
This research contributes a practical solution to the growing concern over the computational cost of large NMT models. By enabling efficient deployment on limited hardware, CULL-MT makes these models more widely accessible and applicable.
The study primarily focuses on models with fewer than 10 billion parameters due to hardware limitations. Further research could explore the effectiveness of CULL-MT on larger models using techniques like quantization. Additionally, investigating the impact of different pruning strategies and fine-tuning methods could further optimize the compression process.