Core Concepts
Noisy pre-training data benefits in-domain (ID) downstream tasks but harms out-of-domain (OOD) downstream tasks.
Abstract
Foundation models are pre-trained on large-scale datasets and then fine-tuned for downstream tasks.
Label noise in pre-training datasets can affect how well models generalize to downstream tasks, posing risks distinct from noise encountered during fine-tuning.
Pre-training noise shapes the learned feature space: moderate noise can improve in-domain (ID) transfer, while it consistently degrades out-of-domain (OOD) transfer.
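A minimal sketch of one way to probe this feature-space effect, assuming access to features extracted from a frozen pre-trained encoder; the function name `singular_value_spectrum` and the random placeholder features are illustrative assumptions, not the paper's exact analysis:

```python
import numpy as np

def singular_value_spectrum(features: np.ndarray) -> np.ndarray:
    """Return the normalized singular value spectrum of a feature matrix.

    features: (N, D) array of downstream features from a frozen
    pre-trained encoder. How quickly the spectrum decays is one lens
    on how pre-training noise has reshaped the feature space.
    """
    # Center the features so the spectrum reflects variance structure.
    centered = features - features.mean(axis=0, keepdims=True)
    s = np.linalg.svd(centered, compute_uv=False)
    return s / s.sum()  # normalize so spectra are comparable across models

# Usage: compare spectra from encoders pre-trained at different noise levels.
feats = np.random.randn(1000, 256)  # placeholder for real extracted features
spectrum = singular_value_spectrum(feats)
print(spectrum[:5])  # a few dominant components
```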
Lightweight tuning methods applied on top of the pre-trained model can mitigate the impact of pre-training noise and improve generalization.
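A minimal sketch of one such black-box tuning setup, assuming only frozen features from the (possibly noisily pre-trained) encoder are available: a small trainable transform plus a linear head, trained with cross-entropy and a feature-spectrum regularizer. The specific flatness penalty and the names `TunedProbe` and `flatness_penalty` are assumptions for illustration, not the paper's exact objective:

```python
import torch
import torch.nn as nn

class TunedProbe(nn.Module):
    """Small MLP on top of frozen features, followed by a linear head.

    The MLP gives the regularizer trainable parameters to shape: we
    penalize the spectrum of the *transformed* features so the loss
    actually flows back into the model.
    """
    def __init__(self, feat_dim: int, num_classes: int):
        super().__init__()
        self.transform = nn.Sequential(
            nn.Linear(feat_dim, feat_dim), nn.ReLU(),
            nn.Linear(feat_dim, feat_dim),
        )
        self.head = nn.Linear(feat_dim, num_classes)

    def forward(self, feats):
        z = self.transform(feats)
        return self.head(z), z

def flatness_penalty(z: torch.Tensor) -> torch.Tensor:
    # Negative entropy of the normalized singular value spectrum:
    # minimizing it pushes toward a flatter (higher-entropy) spectrum.
    centered = z - z.mean(dim=0, keepdim=True)
    s = torch.linalg.svdvals(centered)
    p = s / s.sum()
    return (p * torch.log(p + 1e-8)).sum()

model = TunedProbe(feat_dim=256, num_classes=10)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
ce = nn.CrossEntropyLoss()

# One training step on placeholder frozen-encoder features.
feats = torch.randn(64, 256)
labels = torch.randint(0, 10, (64,))
logits, z = model(feats)
loss = ce(logits, labels) + 0.1 * flatness_penalty(z)
loss.backward()
opt.step()
```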
Noisy Model Learning is a novel research direction focused on understanding and mitigating the effects of label noise in pre-training data on downstream tasks.
Quotes
"Proper noisy supervision in pre-training can benefit the performance on ID downstream tasks, while more noise results in inferior results."
"The robustness of transferability on OOD downstream tasks constantly deteriorates as the noise increases."