Transformers can express surprisingly large classes of string-to-string transductions, including the first-order rational, first-order regular, and first-order polyregular functions, each of which can be simulated by a variant of the RASP programming language.
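For intuition, here is a minimal Python sketch of the RASP-style select/aggregate primitives, applied to string reversal (a regular but not rational transduction and a standard RASP example). The helper names and the direct use of the sequence length are illustrative simplifications, not taken from any particular RASP implementation.

```python
def select(keys, queries, predicate):
    # Attention pattern: sel[q][k] is True iff query position q attends to key position k.
    return [[predicate(k, q) for k in keys] for q in queries]

def aggregate(sel, values):
    # For each query position, return the (here, unique) selected value.
    out = []
    for row in sel:
        chosen = [v for v, s in zip(values, row) if s]
        out.append(chosen[0] if chosen else None)
    return out

def reverse_tokens(tokens):
    # Reverse a string: position i attends to position n - 1 - i.
    # (A full RASP program would compute n with selector_width; we use len() for brevity.)
    n = len(tokens)
    positions = list(range(n))
    sel = select(positions, positions, lambda k, q: k == n - 1 - q)
    return aggregate(sel, tokens)

print("".join(reverse_tokens(list("hello"))))  # -> "olleh"
```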
The temporal counting logic K_t[#] and its equivalent RASP variant, C-RASP, currently give the best-known lower bound on the expressivity of future-masked softmax transformer encoders.
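A rough sketch of the counting primitive underlying K_t[#] and C-RASP: each position computes a count over its prefix of the positions satisfying a predicate, and Boolean outputs are comparisons between such counts. The concrete example below (does the number of a's exceed the number of b's in every prefix?) and the function names are illustrative assumptions, not the construction from the cited work.

```python
def prefix_count(tokens, predicate):
    # Counting operator: counts[i] = number of positions j <= i with predicate(tokens[j]) true.
    counts, c = [], 0
    for t in tokens:
        c += predicate(t)
        counts.append(c)
    return counts

def majority_a(tokens):
    # Position i is True iff #a's > #b's in the prefix ending at i,
    # a comparison of two prefix counts in the style of C-RASP.
    ca = prefix_count(tokens, lambda t: t == "a")
    cb = prefix_count(tokens, lambda t: t == "b")
    return [x > y for x, y in zip(ca, cb)]

print(majority_a(list("aababb")))  # -> [True, True, True, True, True, False]
```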
This research paper demonstrates that causal Transformers can learn to predict the next token in sequences generated by certain autoregressive functions, in particular linear functions and periodic sequences.
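As a hedged sketch of what such training data might look like (the generators, parameter values, and names below are illustrative assumptions, not the paper's setup): sequences drawn from a scalar linear autoregressive map and from a fixed repeating pattern, arranged into next-token (input, target) pairs.

```python
import numpy as np

def linear_ar_sequence(w, b, x0, length):
    # Linear autoregressive process: x_{t+1} = w * x_t + b (scalar case for illustration).
    xs = [x0]
    for _ in range(length - 1):
        xs.append(w * xs[-1] + b)
    return np.array(xs)

def periodic_sequence(pattern, length):
    # Sequence that repeats a fixed pattern, e.g. a period-3 token stream.
    return np.array([pattern[t % len(pattern)] for t in range(length)])

# Next-token prediction pairs: inputs x_0..x_{T-1}, targets x_1..x_T.
seq = linear_ar_sequence(w=0.9, b=0.1, x0=1.0, length=16)
inputs, targets = seq[:-1], seq[1:]
print(inputs[:4], targets[:4])

print(periodic_sequence([0, 1, 2], 9))  # -> [0 1 2 0 1 2 0 1 2]
```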