Core Concept
Transformer-based models can be affected by sequence length learning, in which they rely on sequence length, a non-textual feature, rather than textual content for classification.
Summary
Abstract:
Transformer models may use sequence length as a predictive feature instead of textual information.
Privately owned datasets in fields like medicine and insurance may exhibit this bias.
Introduction:
Transformer models excel at NLP tasks but may rely on unintended correlations hidden in their training data.
Such biases act as shortcuts that can hurt performance when the correlation does not hold at test time.
Related Work:
Studies focus on fairness, bias, and spurious features in NLP tasks.
Assessing the Impact of Sequence Length Learning:
Experiments on various datasets, altered so that sequence length correlates with the label, show how models are affected by sequence length learning.
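A minimal sketch of how such an imbalance can be induced for a binary task: keep only short examples for one class and long examples for the other, so sequence length alone separates the labels. The threshold and the whitespace token count are illustrative assumptions, not the paper's exact protocol.

```python
# Illustrative sketch (not the paper's protocol): build a training set in
# which sequence length alone separates the two classes.
def length_imbalanced_split(examples, threshold=128):
    """`examples` is a hypothetical list of (text, label) pairs, labels 0/1."""
    biased = []
    for text, label in examples:
        n_tokens = len(text.split())  # crude whitespace token count
        if label == 0 and n_tokens <= threshold:
            biased.append((text, label))  # class 0 keeps only short texts
        elif label == 1 and n_tokens > threshold:
            biased.append((text, label))  # class 1 keeps only long texts
    return biased
```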
Evaluation of the Impact of the Sequence Length Feature:
Models perform well when trained on the original data but poorly when trained on altered, length-imbalanced versions of the same datasets.
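One hedged way to gauge how much accuracy is attributable to length alone: fit a classifier whose only input feature is the token count. If it approaches the transformer's accuracy on the imbalanced split, the textual content is not needed to score well. `train` and `test` are hypothetical lists of (text, label) pairs.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def length_only_accuracy(train, test):
    """Accuracy of a baseline that sees nothing but the token count."""
    def to_xy(data):
        X = np.array([[len(text.split())] for text, _ in data])  # one feature
        y = np.array([label for _, label in data])
        return X, y
    X_tr, y_tr = to_xy(train)
    X_te, y_te = to_xy(test)
    return LogisticRegression().fit(X_tr, y_tr).score(X_te, y_te)
```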
Evaluation of Sequence Length Learning for Partial Class Overlap:
Models rely heavily on sequence length when the per-class length distributions do not overlap; the more the distributions overlap, the weaker the effect.
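A small sketch of one way to quantify that overlap, using histogram intersection between the two classes' length samples (1.0 means identical distributions, 0.0 means fully separable by length). The binning is an illustrative choice, not taken from the paper.

```python
import numpy as np

def length_overlap(lengths_a, lengths_b, bins=50):
    """Histogram intersection of two per-class sequence-length samples."""
    lo = min(min(lengths_a), min(lengths_b))
    hi = max(max(lengths_a), max(lengths_b))
    h_a, edges = np.histogram(lengths_a, bins=bins, range=(lo, hi))
    h_b, _ = np.histogram(lengths_b, bins=edges)
    h_a = h_a / h_a.sum()  # normalize counts to probabilities
    h_b = h_b / h_b.sum()
    return float(np.minimum(h_a, h_b).sum())
```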
Source of Sequence Length Learning in Transformer Layers:
Transformer encoder layers are significantly affected by sequence length learning.
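A hedged diagnostic sketch, not the paper's method: probe each encoder layer's [CLS] representation with a linear classifier that predicts a coarse length bucket (short vs. long). High probe accuracy at a layer suggests length information is encoded there. The checkpoint name and the median-split bucketing are illustrative assumptions.

```python
import numpy as np
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
model.eval()

def layer_length_probes(texts):
    """Probe accuracy per layer for predicting a short/long length bucket."""
    lengths = np.array([len(tokenizer.tokenize(t)) for t in texts])
    buckets = (lengths > np.median(lengths)).astype(int)  # 0 = short, 1 = long
    enc = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).hidden_states  # embeddings + one tensor per layer
    scores = []
    for h in hidden:
        cls = h[:, 0, :].numpy()  # [CLS] vector at this layer
        probe = LogisticRegression(max_iter=1000).fit(cls, buckets)
        scores.append(probe.score(cls, buckets))  # fit accuracy, for brevity
    return scores
```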
Sequence Length Learning for Different Transformer Encoder Architectures:
Various transformer encoder architectures exhibit the same reliance on sequence length when the training data are length-imbalanced.
Alleviating the Impact of Sequence Length Learning:
Removing problematic observations or augmenting the training data can reduce the impact of sequence length learning; both ideas are sketched below.
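A hedged sketch of both mitigation ideas under illustrative assumptions (whitespace token counts, an assumed overlap region); neither is claimed to be the paper's exact procedure. `remove_extreme_lengths` drops observations whose length falls outside the region where the class length distributions overlap, and `augment_by_truncation` adds shortened copies of long examples so both classes contain short and long sequences.

```python
import random

def remove_extreme_lengths(examples, lo=64, hi=256):
    """Keep only observations inside the assumed length-overlap region."""
    return [(t, y) for t, y in examples if lo <= len(t.split()) <= hi]

def augment_by_truncation(examples, target_len=128, copies=1):
    """Add randomly truncated copies of long examples to rebalance lengths."""
    augmented = list(examples)
    for text, label in examples:
        tokens = text.split()
        if len(tokens) > target_len:
            for _ in range(copies):
                start = random.randrange(len(tokens) - target_len + 1)
                chunk = " ".join(tokens[start:start + target_len])
                augmented.append((chunk, label))
    return augmented
```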
Statistics
Models achieve high accuracy on the original training set.
Models show low accuracy when trained on length-imbalanced training sets.
Quotes
"Models seem to capture sequence length as a classification spurious feature."
"The more the distributions overlap, the lesser the problem."