The paper investigates the relationship between isotropy in language model representations and performance on downstream tasks. Previous work in NLP has argued that anisotropy (a lack of isotropy) in contextualized embeddings is detrimental, forcing representations to occupy a "narrow cone" in vector space and obscuring linguistic information. In contrast to these claims, the authors find that decreasing isotropy (making representations more anisotropic) tends to improve performance across three language models and nine fine-tuning tasks.
The authors propose I-STAR, a novel regularization method that shapes the geometry of network activations in a stable manner. I-STAR uses IsoScore*, a differentiable and mini-batch-stable measure of isotropy, to either increase or decrease isotropy during training.
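This summary does not spell out how IsoScore* is computed or how I-STAR attaches it to the training objective. Below is a minimal PyTorch sketch, assuming the eigenvalue-based IsoScore recipe plus a shrinkage-regularized covariance for mini-batch stability; the shrinkage form, the `zeta` parameter, and the loss combination at the end are illustrative assumptions, not the paper's exact formulation.

```python
import torch

def isoscore_star(H: torch.Tensor, zeta: float = 0.1) -> torch.Tensor:
    """Isotropy of a mini-batch of activations H with shape (batch, dim).

    Returns a differentiable score in [0, 1]: 1 means variance is spread
    uniformly over all dimensions, 0 means it collapses onto one direction.
    """
    n = H.shape[1]
    Hc = H - H.mean(dim=0, keepdim=True)               # center the batch
    cov = Hc.T @ Hc / (H.shape[0] - 1)                 # sample covariance
    # Assumed shrinkage step: pull the covariance toward a scaled identity
    # so small batches give stable eigenvalue estimates.
    cov = (1 - zeta) * cov + zeta * cov.diagonal().mean() * torch.eye(
        n, device=H.device, dtype=H.dtype
    )
    lam = torch.linalg.eigvalsh(cov)                   # PCA variances
    lam_hat = (n ** 0.5) * lam / lam.norm()            # normalize to length sqrt(n)
    # Isotropy defect: distance of normalized variances from the uniform vector.
    delta = (lam_hat - 1).norm() / (2 * (n - n ** 0.5)) ** 0.5
    k = (n - delta ** 2 * (n - n ** 0.5)) ** 2 / n     # dims isotropically used
    return (k - 1) / (n - 1)                           # rescale to [0, 1]

# Hypothetical training step: a positive weight pushes isotropy up, while a
# negative weight (the direction the paper finds helpful) pushes it down.
# loss = task_loss - lambda_iso * isoscore_star(hidden_states)
```

Because the score is built from differentiable operations (covariance, symmetric eigendecomposition, norms), gradients flow back into the activations, which is what lets a single scalar weight steer the geometry during fine-tuning.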
The paper also shows that encouraging isotropy in representations increases the intrinsic dimensionality of the data, which is detrimental to performance. This aligns with literature outside NLP arguing that anisotropy is a natural outcome of stochastic gradient descent and that compressing representations onto a lower-dimensional manifold is crucial for strong downstream performance.
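The summary does not name the intrinsic-dimensionality estimator used. One common linear proxy, sketched below, is the participation ratio of the covariance eigenvalues: a perfectly isotropic cloud in n dimensions scores n, while variance concentrated in a few directions scores far lower, which is why raising isotropy raises the estimated dimensionality.

```python
import torch

def participation_ratio(H: torch.Tensor) -> torch.Tensor:
    """Linear intrinsic-dimensionality proxy: (sum eig)^2 / sum(eig^2)
    over covariance eigenvalues of activations H with shape (batch, dim)."""
    Hc = H - H.mean(dim=0, keepdim=True)           # center the batch
    eig = torch.linalg.eigvalsh(Hc.T @ Hc / (H.shape[0] - 1))
    return eig.sum() ** 2 / (eig ** 2).sum()       # value lies in [1, dim]
```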
Source: https://arxiv.org/pdf/2305.19358.pdf