Influence of Architecture on Pre-trained Language Models
The author examines how the architecture of FFN-Wider Transformers affects the base capabilities of pre-trained language models, focusing on the contribution ratio of the combination function. The proposed Combination Enhanced Architecture (CEA) aims to reverse the decline in base capabilities observed in FFN-Wider Transformers.