Core Concepts
Introducing the OPTIN framework for efficient transformer compression without re-training.
Abstract
Addressing the need for a generalizable model compression framework for transformers.
Demonstrating competitive performance in natural language, image classification, transfer learning, and semantic segmentation tasks.
Comparing against state-of-the-art transformer compression methods.
Exploring the downstream tasks and architectures that benefit from the OPTIN framework.
Stats
"Particularly, we show a ≤2% accuracy degradation from NLP baselines and a 0.5% improvement from state-of-the-art methods on image classification at competitive FLOPs reductions."
Quotes
"We introduce the One-shot Pruning Technique for Interchangeable Networks (OPTIN) framework as a tool to increase the efficiency of pre-trained transformer architectures."
"Our motivation stems from the need for a generalizable model compression framework that scales well across different transformer architectures and applications."
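To make the "one-shot, no re-training" idea concrete, here is a minimal sketch of one-shot pruning using plain magnitude ranking. This is an illustrative stand-in only: OPTIN ranks parameters with its own saliency criterion rather than raw weight magnitude, and the function name and threshold logic below are assumptions for the example, not the paper's method. What carries over is the workflow: score once, prune once, and keep the resulting network without any fine-tuning pass.

```python
import numpy as np

def one_shot_magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude fraction of weights in a single pass.

    Illustrative stand-in: OPTIN uses its own saliency measure, not raw
    magnitude, but the one-shot (no re-training) workflow is the same.
    """
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    # k-th smallest magnitude becomes the pruning threshold.
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask

# Example: prune 50% of a toy weight matrix in one shot, no fine-tuning.
rng = np.random.default_rng(0)
w = rng.standard_normal((4, 4))
pruned = one_shot_magnitude_prune(w, 0.5)
print(f"sparsity achieved: {np.mean(pruned == 0):.2f}")
```

In a real compression pipeline the same score-then-mask step would be applied per layer (or per attention head) of the pre-trained transformer, with the saliency criterion doing the work of deciding which structures are safe to remove.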