Transformers for Supervised Online Continual Learning: Exploring In-Context Learning Abilities
Core Concepts
Transformers can be leveraged for online continual learning by combining in-context learning and parametric learning, leading to significant improvements in predictive performance.
Abstract
Transformers are widely used for sequence modeling tasks such as natural language processing and audio processing. This study explores their potential for supervised online continual learning, where the model is evaluated by next-step prediction on a non-stationary data stream. The proposed method conditions a transformer on recent observations and adds replay so that some of the benefits of multi-epoch training are retained while adhering to the sequential protocol. By combining in-context learning with parametric learning, the approach enables both rapid adaptation and sustained improvement over long sequences. Empirical results show strong prediction performance on challenging benchmarks such as CLOC.
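The following is a minimal, illustrative sketch of the protocol described above: a small transformer attends in-context over the most recent chunk of (feature, label) pairs, its next-step predictions are scored before any parameter update, and a parametric AdamW update with replayed chunks follows. All class names, dimensions, and hyperparameter values here are assumptions for illustration, not the authors' implementation.

```python
import random
import torch
import torch.nn as nn
import torch.nn.functional as F


class InContextClassifier(nn.Module):
    """Small transformer that attends causally over a chunk of (feature, label) pairs."""

    def __init__(self, feat_dim=64, num_classes=10, d_model=128):
        super().__init__()
        self.num_classes = num_classes
        self.embed_x = nn.Linear(feat_dim, d_model)
        # num_classes + 1 embeddings: index `num_classes` marks "label not yet seen".
        self.embed_y = nn.Embedding(num_classes + 1, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, num_classes)

    def forward(self, feats, labels):
        # Token t carries feature x_t and the previous example's label y_{t-1},
        # so the prediction at position t is conditioned in-context on examples < t.
        B, T, _ = feats.shape
        unknown = torch.full((B, 1), self.num_classes, dtype=labels.dtype)
        prev_labels = torch.cat([unknown, labels[:, :-1]], dim=1)
        tokens = self.embed_x(feats) + self.embed_y(prev_labels)
        mask = nn.Transformer.generate_square_subsequent_mask(T)
        return self.head(self.encoder(tokens, mask=mask))


def online_step(model, opt, chunk, replay_buffer, n_replay=1):
    """Evaluate-then-train on one incoming chunk of the stream."""
    feats, labels = chunk
    # 1) Score next-step predictions on the new chunk *before* any parameter update.
    model.eval()
    with torch.no_grad():
        logits = model(feats, labels)
        online_loss = F.cross_entropy(logits.flatten(0, 1), labels.flatten())
    # 2) Parametric update on the new chunk plus a few replayed chunks.
    model.train()
    batches = [chunk] + random.sample(replay_buffer, min(n_replay, len(replay_buffer)))
    for f, y in batches:
        opt.zero_grad()
        F.cross_entropy(model(f, y).flatten(0, 1), y.flatten()).backward()
        opt.step()
    replay_buffer.append(chunk)
    return online_loss.item()


model = InContextClassifier()
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)  # constant learning rate
buffer = []
for _ in range(5):  # toy stream; chunk length 8 here (the paper uses S = 256)
    chunk = (torch.randn(1, 8, 64), torch.randint(0, 10, (1, 8)))
    print(online_step(model, opt, chunk, buffer))
```

The evaluate-then-train ordering in online_step is what lets in-context adaptation (within a chunk) and parametric learning (across chunks) jointly contribute to the online prediction performance described above.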
Statistics
Running the method on CLOC requires between 10^14 and 10^17 FLOPs.
Training uses a chunk size S of 256 examples and a constant learning rate with the AdamW optimizer (a configuration sketch follows below).
With pre-extracted features, the approach reaches predictive performance on CLOC well beyond previously reported results.
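As a concrete reading of the statistics above, a minimal configuration block might look as follows. Only the chunk size, the AdamW optimizer, and the constant schedule come from the text; the numeric learning rate and the key names are illustrative assumptions.

```python
# Hypothetical training configuration; values not stated above are assumptions.
CONFIG = {
    "chunk_size": 256,        # S: examples revealed per sequential chunk
    "optimizer": "AdamW",
    "learning_rate": 1e-4,    # assumed value; only the constant schedule is stated
    "lr_schedule": "constant",
}
```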
Quotes
"Learning and comprehension correspond to compression." - Chaitin (2002)
"Models with shorter description lengths have a better chance of generalizing to future data." - Wallace (2005)