Core Concepts
Transformers can express surprisingly large classes of string-to-string transductions, including the first-order rational, regular, and polyregular functions; these classes can be characterized via variants of the RASP programming language, which transformers can simulate.
Abstract
The paper studies the sequence-to-sequence mapping capacity of transformers by relating them to finite transducers. It introduces three variants of the RASP programming language as an intermediate representation:
B-RASP, which operates on Boolean values, can express exactly the first-order rational transductions.
B-RASP[pos], which adds positional information, can express all first-order regular transductions and more.
S-RASP, which adds a prefix-sum operation, can express all first-order polyregular transductions and more.
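The distinguishing primitive of S-RASP is prefix sum, which computes a running total over positions. A minimal sketch of that operation in plain Python (the function name and list-based interface are illustrative, not from the paper):

```python
def prefix_sum(values):
    """Running total: out[i] = values[0] + ... + values[i]."""
    total, out = 0, []
    for v in values:
        total += v
        out.append(total)
    return out

# Counting occurrences of a symbol up to each position, a typical use
# of prefix sum in position arithmetic:
flags = [1 if c == "a" else 0 for c in "abaa"]
counts = prefix_sum(flags)  # [1, 1, 2, 3]
```

Access to such running counts is what lets S-RASP compute output positions for length-changing transductions like marked-square.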
The paper shows that average-hard attention transformers can simulate S-RASP, and as a corollary, that transformer decoders are Turing-complete.
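Average-hard attention attends to the positions with the maximum attention score and averages their values uniformly, rather than taking a softmax. A toy sketch of one attention step under that rule (names and the scalar-value interface are assumptions for illustration):

```python
def average_hard_attention(scores, values):
    """Average the values at all positions achieving the maximum score."""
    m = max(scores)
    selected = [v for s, v in zip(scores, values) if s == m]
    return sum(selected) / len(selected)

# Two positions tie for the maximum score, so their values are averaged:
average_hard_attention([0.0, 1.0, 1.0, 0.5], [10.0, 20.0, 30.0, 40.0])  # 25.0
```

Ties being resolved by averaging (rather than picking one position) is what gives this attention variant enough expressive power to simulate S-RASP's operations.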
The paper also provides examples of transductions expressible in these RASP variants, including rotate-right, map-reverse, map-duplicate, copy-first-half, marked-square, and majority-rules. It demonstrates that some of these transductions, like majority-rules, are not polyregular.
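To make the named transductions concrete, here are plain-Python reference implementations of three of them. The definitions are inferred from the names (rotating by one symbol, reversing marker-delimited blocks, and replacing every symbol with the majority symbol) and may differ in detail from the paper's formal definitions:

```python
def rotate_right(w):
    """Move the last symbol to the front (identity on the empty string)."""
    return w[-1] + w[:-1] if w else w

def map_reverse(w, sep="#"):
    """Reverse each separator-delimited block, keeping separators in place."""
    return sep.join(block[::-1] for block in w.split(sep))

def majority_rules(w):
    """Replace every symbol with the string's majority symbol."""
    maj = max(set(w), key=w.count) if w else ""
    return maj * len(w)

rotate_right("abcd")      # "dabc"
map_reverse("abc#de")     # "cba#ed"
majority_rules("aabab")   # "aaaaa"
```

Note that majority-rules is length-preserving yet depends on a global count, which is the intuition behind it falling outside the polyregular class.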
Stats
As a theoretical paper, it reports no empirical metrics or experimental figures; its claims are supported by constructions and proofs rather than measurements.
Quotes
"We study the sequence-to-sequence mapping capacity of transformers by relating them to finite transducers, and find that they can express surprisingly large classes of transductions."
"Finally, we show that masked average-hard attention transformers can simulate S-RASP."
"A corollary of our results is a new proof that transformer decoders are Turing-complete."