Key Concepts
Transformer models exhibit notable resilience against diverse types of label noise during in-context learning, and introducing similar noise into the training set can further enhance such robustness.
Summary
This paper presents a comprehensive study of the robustness of Transformer models' in-context learning (ICL) ability against label noise. The key findings are:
Transformer models exhibit notable resilience against diverse types of label noise, including Gaussian, uniform, exponential, Poisson, multiplicative, and salt-and-pepper noise, during ICL. They outperform simple baseline methods such as least squares and k-nearest neighbors, especially when the number of in-context examples is sufficient.
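As a rough illustration of this setup, the sketch below corrupts regression labels with each of the noise families named above and implements the two baselines. The helper names and the exact parameterization of each noise family are assumptions for illustration, not the paper's code.

```python
import numpy as np

rng = np.random.default_rng(0)

def add_label_noise(y, kind="gaussian", level=0.5):
    # Corrupt regression labels with one of the noise families studied
    # in the paper. `level` is a generic scale knob; the paper's exact
    # parameterization of each family is not reproduced here.
    n = y.shape[0]
    if kind == "gaussian":
        return y + rng.normal(0.0, level, n)
    if kind == "uniform":
        return y + rng.uniform(-level, level, n)
    if kind == "exponential":
        return y + rng.exponential(level, n)
    if kind == "poisson":
        return y + rng.poisson(level, n)
    if kind == "multiplicative":
        return y * (1.0 + rng.normal(0.0, level, n))
    if kind == "salt_pepper":
        mask = rng.random(n) < level              # fraction of labels flipped
        extremes = rng.choice([y.min(), y.max()], size=n)
        return np.where(mask, extremes, y)
    raise ValueError(f"unknown noise kind: {kind}")

def least_squares_predict(X, y, x_query, ridge=1e-6):
    # Ordinary least squares fit on the in-context examples
    # (small ridge term added for numerical stability).
    w = np.linalg.solve(X.T @ X + ridge * np.eye(X.shape[1]), X.T @ y)
    return x_query @ w

def knn_predict(X, y, x_query, k=3):
    # k-nearest-neighbors regression over the in-context examples.
    d = np.linalg.norm(X - x_query, axis=1)
    return y[np.argsort(d)[:k]].mean()
```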
There exists a distinct noise-level threshold for each noise type, beyond which the Transformer model no longer outperforms the baselines. The estimated thresholds are reported in the paper.
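One plausible way to locate such a threshold empirically is to sweep the noise level and find the first crossover between the model's error and the baseline's. The callables below are hypothetical placeholders standing in for the paper's evaluation code.

```python
import numpy as np

def crossover_threshold(model_mse, baseline_mse, levels):
    # Smallest noise level at which the Transformer's error first
    # exceeds the baseline's; `model_mse` and `baseline_mse` are
    # hypothetical callables mapping a noise level to a test MSE.
    for level in levels:
        if model_mse(level) > baseline_mse(level):
            return level
    return None  # model stays ahead over the entire sweep

# Example sweep (all names are placeholders, not the paper's code):
# thresholds = {kind: crossover_threshold(lambda s: eval_transformer(kind, s),
#                                         lambda s: eval_least_squares(kind, s),
#                                         np.linspace(0.0, 3.0, 31))
#               for kind in NOISE_KINDS}
```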
Introducing similar noise into the training set can enhance the robustness of Transformer models during ICL inference. This holds across model sizes, with larger models benefiting more from noisy training.
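A minimal sketch of generating such noisy training data follows, assuming the linear-regression ICL setup common in this line of work and Gaussian label noise; the function name and noise scale are illustrative assumptions, not the paper's pipeline.

```python
import torch

def sample_noisy_icl_batch(batch_size, n_points, dim, sigma=0.5):
    # Linear-regression ICL prompts with Gaussian noise added to the
    # context labels, so the training distribution matches the noise
    # the model will face at inference time.
    w = torch.randn(batch_size, dim, 1)           # per-sequence task vector
    x = torch.randn(batch_size, n_points, dim)    # in-context inputs
    y_clean = (x @ w).squeeze(-1)                 # noiseless labels
    y_noisy = y_clean + sigma * torch.randn_like(y_clean)
    return x, y_noisy, y_clean  # train on y_noisy; keep y_clean for eval
```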
The paper offers a thorough analysis of the resilience of Transformer models against label noise during ICL, and provides valuable insights for research on Transformers in natural language processing.
Quotes
"Transformer models exhibit notable resilience against diverse types of label noise during in-context learning, and introducing similar noise into the training set can further enhance such robustness."
"There exists a distinct noise level threshold for each noise type, beyond which the Transformer model's performance cannot outperform the baselines."