How Many Pretraining Tasks Are Needed for In-Context Learning of Linear Regression?
Statistics
Transformers pretrained on diverse tasks exhibit remarkable in-context learning capabilities.
Effective pretraining requires only a small number of independent tasks.
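To make the setting concrete, the following is a minimal sketch of what in-context learning of linear regression asks of a predictor: given a prompt of labelled examples from a previously unseen task, predict the label of a query point without updating any parameters. Everything specific here is an assumption for illustration and is not taken from the text above: the standard Gaussian distribution over task weights and inputs, the noise level, and the use of a ridge estimator as a stand-in for whatever a pretrained model computes. All function names are hypothetical.

```python
import numpy as np


def sample_task(rng, dim):
    """Draw one regression task: a weight vector from an ASSUMED standard Gaussian."""
    return rng.standard_normal(dim)


def sample_prompt(rng, w, n_context, noise_std):
    """Draw n_context labelled examples and one query point for task w."""
    X = rng.standard_normal((n_context, w.shape[0]))
    y = X @ w + noise_std * rng.standard_normal(n_context)
    x_query = rng.standard_normal(w.shape[0])
    # The target is the noiseless label of the query point.
    return X, y, x_query, x_query @ w


def ridge_in_context(X, y, x_query, lam):
    """Predict the query label from the context alone (reference predictor, not a transformer)."""
    d = X.shape[1]
    w_hat = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
    return x_query @ w_hat


def mean_squared_error(n_tasks=200, dim=5, n_context=40, noise_std=0.1, seed=0):
    """Average squared query error over freshly sampled tasks."""
    rng = np.random.default_rng(seed)
    errs = []
    for _ in range(n_tasks):
        w = sample_task(rng, dim)
        X, y, x_query, target = sample_prompt(rng, w, n_context, noise_std)
        pred = ridge_in_context(X, y, x_query, lam=noise_std**2)
        errs.append((pred - target) ** 2)
    return float(np.mean(errs))


if __name__ == "__main__":
    print("mean squared query error:", mean_squared_error())
```

In this sketch every evaluation task is new, so a low error reflects the predictor's ability to use the context rather than memorization of particular tasks. The question in the title concerns how many such tasks a model must see during pretraining before it behaves this way.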