The paper studies the sequence-to-sequence mapping capacity of transformers by relating them to finite transducers. It introduces three variants of the RASP programming language as an intermediate representation:
- B-RASP, which operates on Boolean values, can express exactly the first-order rational transductions.
- B-RASP[pos], which adds positional information, can express all first-order regular transductions, as well as some transductions outside that class.
- S-RASP, which adds a prefix-sum operation, can express all first-order polyregular transductions, as well as some transductions outside that class.
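The following minimal Python sketch makes the primitives concrete. It rests on simplifying assumptions that are not the paper's formal definitions:

- Sequences are Python lists indexed by position.
- Symbols are passed directly rather than encoded as one Boolean vector per symbol.
- B-RASP-style attention returns the value at the leftmost or rightmost position j where a Boolean mask and a Boolean score both hold. When no such j exists, it returns a per-position default.
- S-RASP's prefix sum is an ordinary running sum.

The helpers `attend`, `prefix_sum`, `always`, `rotate_right_rasp`, and `count_a_so_far` are hypothetical illustrations, not programs taken from the paper.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")
Pred = Callable[[int, int], bool]

def attend(values: List[T], mask: Pred, score: Pred, default: List[T],
           rightmost: bool = False) -> List[T]:
    """Simplified B-RASP-style hard attention (assumed semantics): for each
    query i, take values[j] at the leftmost (or rightmost) j where mask(i, j)
    and score(i, j) both hold; fall back to default[i] if no j qualifies."""
    n = len(values)
    out: List[T] = []
    for i in range(n):
        hits = [j for j in range(n) if mask(i, j) and score(i, j)]
        if hits:
            out.append(values[hits[-1] if rightmost else hits[0]])
        else:
            out.append(default[i])
    return out

def prefix_sum(xs: List[int]) -> List[int]:
    """S-RASP-style prefix sum (assumed): out[i] = xs[0] + ... + xs[i]."""
    out: List[int] = []
    total = 0
    for x in xs:
        total += x
        out.append(total)
    return out

def always(i: int, j: int) -> bool:
    # Trivial predicate: no masking, every key position qualifies.
    return True

def rotate_right_rasp(w: str) -> str:
    """Hypothetical program: rotate-right via two hard-attention steps."""
    xs = list(w)
    # Step 1: every position fetches the last symbol (rightmost, unmasked).
    last = attend(xs, always, always, default=xs, rightmost=True)
    # Step 2: position i copies its left neighbour (rightmost j < i);
    # position 0 has no such j and falls back to the last symbol.
    out = attend(xs, lambda i, j: j < i, always, default=last, rightmost=True)
    return "".join(out)

def count_a_so_far(w: str) -> List[int]:
    """Hypothetical program: number of 'a' symbols in w[0..i]."""
    return prefix_sum([1 if c == "a" else 0 for c in w])

print(rotate_right_rasp("abcd"))   # dabc
print(count_a_so_far("abaab"))     # [1, 1, 2, 3, 3]
```

`count_a_so_far` produces integers, which a Boolean-valued language like B-RASP cannot represent directly. In this sketch, the prefix-sum operation is what introduces such counts.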
The paper shows that average-hard attention transformers can simulate S-RASP. As a corollary, it gives a new proof that transformer decoders are Turing-complete.
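Average-hard attention is commonly defined as attention that averages uniformly over all positions attaining the maximum score. The sketch below assumes that definition. It also illustrates one intuition for how averaging relates to S-RASP's prefix sum. With a causal mask and equal scores, position i receives the mean of the first i + 1 values, and rescaling by i + 1 recovers the prefix sum.

The rescaling is done directly in Python here, so this is only an illustration, not the paper's simulation construction. The function names are hypothetical, and `Fraction` is used to keep the arithmetic exact.

```python
from fractions import Fraction
from typing import List

def average_hard_attention(scores: List[int], values: List[Fraction]) -> Fraction:
    """One query under average-hard attention (assumed definition): return the
    uniform average of the values at all key positions with the maximal score."""
    best = max(scores)
    tied = [v for s, v in zip(scores, values) if s == best]
    return sum(tied, Fraction(0)) / len(tied)

def causal_averages(xs: List[int]) -> List[Fraction]:
    """Query i attends to keys 0..i with equal scores, so all of them tie and
    the result is the mean of xs[0..i]."""
    return [
        average_hard_attention([0] * (i + 1), [Fraction(x) for x in xs[: i + 1]])
        for i in range(len(xs))
    ]

def prefix_sums_via_averages(xs: List[int]) -> List[Fraction]:
    # Rescaling the mean at position i by the number of visible keys (i + 1)
    # recovers the prefix sum xs[0] + ... + xs[i].
    return [avg * (i + 1) for i, avg in enumerate(causal_averages(xs))]

print(average_hard_attention([1, 3, 3], [Fraction(10), Fraction(20), Fraction(40)]))  # 30
print([int(s) for s in prefix_sums_via_averages([1, 0, 1, 1, 0])])  # [1, 1, 2, 3, 3]
```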
The paper also provides examples of transductions expressible in these RASP variants, including rotate-right, map-reverse, map-duplicate, copy-first-half, marked-square, and majority-rules. It demonstrates that some of these transductions, like majority-rules, are not polyregular.
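The example transductions can be stated as ordinary string functions. The definitions below are a hedged reading of the names, not the paper's exact formulations. The following conventions are all assumptions:

- the block separator `#`;
- rounding down in copy-first-half;
- uppercase letters as marks in marked-square;
- the tie-breaking rule in majority-rules.

```python
from typing import List

# Hypothetical reference definitions; separator, rounding, marking and
# tie-breaking conventions are assumptions, not the paper's formulations.
SEP = "#"  # assumed block separator for map-reverse / map-duplicate

def rotate_right(w: str) -> str:
    # Last symbol moves to the front: a1 ... an -> an a1 ... a(n-1).
    return w[-1:] + w[:-1]

def map_reverse(w: str) -> str:
    # Reverse each SEP-delimited block independently.
    return SEP.join(block[::-1] for block in w.split(SEP))

def map_duplicate(w: str) -> str:
    # Write each SEP-delimited block twice.
    return SEP.join(block * 2 for block in w.split(SEP))

def copy_first_half(w: str) -> str:
    # Keep the first floor(n/2) symbols (rounding down is assumed).
    return w[: len(w) // 2]

def marked_square(w: str) -> str:
    # n copies of w; copy i has its i-th symbol marked (uppercase here).
    copies: List[str] = []
    for i in range(len(w)):
        copies.append(w[:i] + w[i].upper() + w[i + 1:])
    return "".join(copies)

def majority_rules(w: str) -> str:
    # Over {a, b}: every symbol becomes 'a' if a's strictly outnumber b's,
    # otherwise 'b' (tie-breaking assumed).
    winner = "a" if w.count("a") > w.count("b") else "b"
    return winner * len(w)

print(rotate_right("abcd"))       # dabc
print(map_reverse("abc#de"))      # cba#ed
print(map_duplicate("ab#c"))      # abab#cc
print(copy_first_half("abcdef"))  # abc
print(marked_square("abc"))       # AbcaBcabC
print(majority_rules("abaab"))    # aaaaa
```

Under this reading, there is one standard way to see that majority-rules is not polyregular, though it is not necessarily the paper's argument. Polyregular functions pull back regular languages to regular languages. However, the preimage of the regular language a+ under `majority_rules` is the set of words with strictly more a's than b's, which is not regular.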
Key insights distilled from:
by Lena Strobl,... at arxiv.org 04-03-2024
https://arxiv.org/pdf/2404.02040.pdf