The base capabilities of pre-trained language models are shaped by their architecture: in FFN-Wider Transformers, widening the FFN layer reduces the contribution ratio of the combination function relative to the transformation function, and this shift leads to a decline in base capabilities.