Analyzing the Equivalence of In-Context Learning and Gradient Descent in Transformers
The author examines the hypothesis that In-Context Learning (ICL) in Transformers is equivalent to Gradient Descent (GD), highlighting key limitations of this hypothesis and discrepancies observed in real-world models.
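
To make the hypothesis concrete, the sketch below shows the idealized setting in which such an equivalence is usually stated. This setting is an assumption for illustration, not something taken from the text above: linear regression, a single GD step from zero weights, and attention with no softmax. Under those assumptions, the prediction after one GD step and the output of a linear attention read-out are the same quantity. The function names, the toy task, and the learning rate are all hypothetical. The sketch covers only the idealized form of the hypothesis and says nothing about the limitations raised for real-world models.

```python
import random


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def gd_one_step_prediction(xs, ys, x_query, lr):
    """Predict for x_query after ONE gradient step from w = 0.

    Loss (assumed): L(w) = 1/2 * sum_i (w . x_i - y_i)^2.
    At w = 0 the gradient is -sum_i y_i * x_i, so w_1 = lr * sum_i y_i * x_i.
    """
    dim = len(x_query)
    w = [lr * sum(y * x[j] for x, y in zip(xs, ys)) for j in range(dim)]
    return dot(w, x_query)


def linear_attention_prediction(xs, ys, x_query, lr):
    """Softmax-free attention read-out (an assumed, simplified layer).

    Keys are the in-context inputs x_i, values are the labels y_i,
    and the query is x_query; lr plays the role of a fixed output scale.
    """
    return lr * sum(y * dot(x, x_query) for x, y in zip(xs, ys))


def make_toy_task(n_examples=8, dim=3, seed=0):
    """Hypothetical noiseless linear-regression prompt."""
    rng = random.Random(seed)
    w_true = [rng.gauss(0, 1) for _ in range(dim)]
    xs = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_examples)]
    ys = [dot(w_true, x) for x in xs]
    x_query = [rng.gauss(0, 1) for _ in range(dim)]
    return xs, ys, x_query


xs, ys, x_query = make_toy_task()
gd_pred = gd_one_step_prediction(xs, ys, x_query, lr=0.1)
attn_pred = linear_attention_prediction(xs, ys, x_query, lr=0.1)
# The two predictions agree up to floating-point rounding.
print(abs(gd_pred - attn_pred) < 1e-9)
```

The match here depends on every assumption listed above. Adding a softmax, starting from non-zero weights, or taking more than one step breaks the exact identity in this toy setting.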