Efficient Multi-Level Training Framework for Accelerating Transformer Models
A multi-level training framework that exploits the fast convergence of smaller models and the high expressiveness of larger models to substantially reduce the computational cost of training transformer-based models such as BERT, GPT, and DeiT.
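The core idea, training a small model first and then mapping its parameters into a larger model before continuing to train, can be sketched on a toy problem. Note that everything below is an illustrative assumption: the toy regression task, the zero-padded "prolongation" of small weights into the large model, and all names are hypothetical, not this framework's actual API or its mapping between model levels.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression task: targets are a linear function of 4 input features.
X = rng.normal(size=(256, 4))
w_true = rng.normal(size=(4, 1))
y = X @ w_true

def train(X, w, y, steps, lr=0.05):
    """Plain gradient descent on mean-squared error."""
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(X)
        w = w - lr * grad
    return w

def mse(w):
    return float(np.mean((X @ w - y) ** 2))

# Level 1: a "small" model that only sees the first 2 features
# converges quickly on its reduced problem.
w_small = train(X[:, :2], np.zeros((2, 1)), y, steps=200)

# Prolongation (hypothetical scheme): embed the small model's weights
# into the full-width model, zero-initializing the unseen coordinates.
w_init = np.zeros((4, 1))
w_init[:2] = w_small

# Level 2: continue training the large model from the warm start.
w_large = train(X, w_init, y, steps=200)

print("warm-start loss:", mse(w_init), "-> final loss:", mse(w_large))
```

Because the large model starts from weights the small model already fit, its initial loss is below that of a from-scratch initialization, which is the source of the claimed savings: fewer expensive large-model steps are needed to reach a given loss.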