Key Concepts
Architecture significantly shapes the base capabilities of pre-trained language models: the FFN-Wider Transformer, which widens the FFN layer relative to the vanilla design, alters the relative contribution of the combination function (the multi-head attention layer) and the transformation function (the FFN layer).
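A minimal PyTorch sketch of the two functions in a single Transformer layer, under standard BERT-style (post-LayerNorm) assumptions; `ffn_width_ratio` is an illustrative parameter name, not the paper's API:

```python
import torch
import torch.nn as nn

class TransformerLayer(nn.Module):
    def __init__(self, d_model=768, n_heads=12, ffn_width_ratio=4):
        super().__init__()
        # MHA: the "combination" function -- mixes information across positions.
        self.mha = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # FFN: the "transformation" function -- maps each position independently.
        # Vanilla Transformers use a width ratio of 4; FFN-Wider variants raise it.
        d_ffn = ffn_width_ratio * d_model
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ffn), nn.GELU(), nn.Linear(d_ffn, d_model)
        )
        self.ln1 = nn.LayerNorm(d_model)
        self.ln2 = nn.LayerNorm(d_model)

    def forward(self, x):
        attn_out, _ = self.mha(x, x, x)    # combination update
        x = self.ln1(x + attn_out)
        return self.ln2(x + self.ffn(x))   # transformation update
```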
Abstract
Pre-trained language models possess strong base capabilities that extend beyond in-distribution language modeling to out-of-distribution language modeling, transfer learning, and few-shot learning.
Widening the FFN in FFN-Wider Transformers reduces the contribution ratio of the combination function provided by the multi-head attention (MHA) layer, which in turn degrades base capabilities (see the worked example after this abstract).
The proposed Combination Enhanced Architecture (CEA) reverses this decline by readjusting the FFN width ratio to restore the combination function's contribution.
The impact of architecture on base capabilities is crucial and requires further exploration.
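To see why widening the FFN alone shifts this balance, consider the per-layer parameter budget. Parameter share is only a rough proxy for the paper's "actual contribution ratio", but it illustrates the direction of the effect:

```python
# Worked example: share of a layer's parameters held by MHA ("combination")
# versus FFN ("transformation") as the FFN width ratio grows.
def mha_param_share(d_model, ffn_width_ratio):
    mha = 4 * d_model ** 2                      # Q, K, V, O projections (biases ignored)
    ffn = 2 * ffn_width_ratio * d_model ** 2    # up- and down-projections
    return mha / (mha + ffn)

for r in (4, 8, 16):
    print(f"FFN width ratio {r:2d}: MHA parameter share = {mha_param_share(768, r):.2f}")
# ratio 4 -> 0.33, ratio 8 -> 0.20, ratio 16 -> 0.11:
# widening the FFN shrinks the combination side's share of the layer.
```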
Statistics
"The FFN-Wider BERT models demonstrate a noticeable decline in base capabilities compared to the vanilla BERT models."
"The actual contribution ratio of the MHA layer is a key factor affecting the model’s base capabilities."
Quotes
"The FFN-Wider BERT models with our Combination Enhanced Architecture (CEA) successfully reverse the decline in base capabilities."
"As the actual contribution ratio of the MHA layer increases, there is a general synchronous improvement in the model’s base capabilities."