Key Concepts
This paper shows that specially constructed causal Transformers can learn to predict the next token in sequences generated by certain autoregressive functions, in particular linear functions and periodic sequences.
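To make the prediction task concrete, here is a minimal sketch of the two sequence families and of what "predicting the next token" means for each. It does not reproduce the paper's Transformer construction: the least-squares estimator and the look-back rule are simple stand-ins, and the function names, the recurrence form x_{t+1} = W x_t, and the example values are all assumptions made for illustration.

```python
import numpy as np

def linear_ar_sequence(W, x0, length):
    """Generate x_1, ..., x_T with x_{t+1} = W @ x_t (assumed form of the linear case)."""
    xs = [x0]
    for _ in range(length - 1):
        xs.append(W @ xs[-1])
    return np.stack(xs)

def predict_next_least_squares(xs):
    """Stand-in predictor: fit W on the observed pairs (x_t, x_{t+1}), then apply it to the last token."""
    X, Y = xs[:-1], xs[1:]
    # Solve X @ W_hat.T ~= Y in the least-squares sense.
    W_hat_T, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return xs[-1] @ W_hat_T

def predict_next_periodic(xs, period):
    """Stand-in predictor: with a known period p, the next token equals the token p steps back."""
    return xs[len(xs) - period]

# Linear case: W is a 2D rotation (hypothetical choice), so consecutive tokens span the plane.
theta = 0.3
W = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
xs_linear = linear_ar_sequence(W, np.array([1.0, 0.0]), length=6)
linear_prediction = predict_next_least_squares(xs_linear)
linear_truth = W @ xs_linear[-1]

# Periodic case: the pattern 1, 2, 3 repeated three times.
xs_periodic = np.tile(np.array([1, 2, 3]), 3)
periodic_prediction = predict_next_periodic(xs_periodic, period=3)
```

In the linear case the prediction matches `W @ xs_linear[-1]` up to floating-point error, because the observed context determines `W` exactly. In the periodic case the predictor simply copies the token one period back.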