Analyzing the Impact of Feed-Forward Blocks on Input Contextualization in Transformers
Feed-forward (FF) blocks in Transformer models substantially modify input contextualization, amplifying specific types of linguistic composition such as subword-to-word and word-to-multi-word-expression constructions. However, these effects are often canceled out by the surrounding residual connections and normalization layers, suggesting potential redundancy in the Transformer's internal processing.
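To make the components concrete, the sketch below shows a minimal feed-forward sub-block as it appears in a Post-LN Transformer layer (the ordering used in BERT-style models). This is an illustrative assumption, not the paper's implementation: the class name, dimensions, and activation are hypothetical placeholders, but the residual connection and layer normalization that "surround" the FF block are the components whose interaction the abstract refers to.

```python
import torch
import torch.nn as nn

class PostLNFeedForwardBlock(nn.Module):
    """Hypothetical minimal sketch of the FF sub-block in a Post-LN
    Transformer layer; dimensions follow common BERT-base defaults."""

    def __init__(self, d_model: int = 768, d_ff: int = 3072):
        super().__init__()
        # Two-layer position-wise feed-forward network
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.GELU(),
            nn.Linear(d_ff, d_model),
        )
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The FF block's output is added back through the residual
        # connection and then rescaled by layer normalization -- the two
        # surrounding operations that can attenuate (cancel out) the
        # contextualization changes the FF block introduces.
        return self.norm(x + self.ff(x))
```

In this layout, the FF block never acts on the representation in isolation: whatever it adds is merged with the unchanged residual stream and renormalized, which is why its effect on contextualization can be dampened by the adjacent components.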