DiJiang: Efficient Large Language Models through Compact Frequency Domain Kernelization
The core message of this paper is that Frequency Domain Kernelization (DiJiang), which combines frequency domain transformations with weighted Quasi-Monte Carlo sampling, can efficiently approximate the attention mechanism in Transformer models, substantially reducing training cost and inference time while maintaining comparable performance.
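To make the idea concrete, below is a minimal, illustrative sketch of kernelized (linear) attention in which queries and keys are mapped into a frequency domain via a DCT-style cosine transform before computing attention. The specific feature map, scaling, and the absence of weighted Quasi-Monte Carlo sampling here are simplifying assumptions for exposition, not the paper's exact DiJiang formulation.

```python
import numpy as np


def dct_feature_map(x, scale=None):
    """Project inputs into a frequency domain with an orthonormal DCT-II
    basis and apply a positive (exponential) nonlinearity.

    Illustrative stand-in for a frequency-domain kernel feature map; the
    exact map and its sampling weights in DiJiang differ.
    """
    d = x.shape[-1]
    if scale is None:
        scale = 1.0 / np.sqrt(np.sqrt(d))
    # Orthonormal DCT-II basis matrix: basis[k, n] = sqrt(2/d) * cos(...)
    n = np.arange(d)
    k = n[:, None]
    basis = np.sqrt(2.0 / d) * np.cos(np.pi * (2 * n + 1) * k / (2 * d))
    basis[0] /= np.sqrt(2.0)
    projected = (x @ basis.T) * scale
    # Positive features keep the implied attention weights non-negative.
    return np.exp(projected)


def linear_attention(q, k, v, eps=1e-6):
    """Kernelized attention: softmax(QK^T)V is approximated by
    phi(Q) (phi(K)^T V), dropping the n x n attention matrix and
    reducing cost from O(n^2 d) to O(n d^2)."""
    q_f = dct_feature_map(q)                    # (n, d)
    k_f = dct_feature_map(k)                    # (n, d)
    kv = k_f.T @ v                              # (d, d_v) key-value summary
    normalizer = q_f @ k_f.sum(axis=0)[:, None] # (n, 1) row normalization
    return (q_f @ kv) / (normalizer + eps)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 128, 64
    q, k, v = (rng.standard_normal((n, d)) for _ in range(3))
    out = linear_attention(q, k, v)
    print(out.shape)  # (128, 64)
```

The key design point this sketch illustrates is that once attention is expressed through a kernel feature map, keys and values can be aggregated once into a small summary matrix, so each query costs only a matrix-vector product instead of attending over the full sequence.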