Theoretical Analysis of the Attention Mechanism via Exchangeability and Latent Variable Models
The attention mechanism can be derived from a latent variable model induced by the exchangeability of the input tokens. This derivation enables a rigorous characterization of three aspects of attention: representation, inference, and learning.
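To make the link between attention and latent variable inference concrete, here is a minimal textbook-style sketch. It is an illustration under stated assumptions, not the construction used in this work. Assume a hypothetical model in which a latent index z is drawn uniformly over the n tokens, so that the prior carries no information about token order, and the query is generated as `query | z = i ~ N(key_i, sigma2 * I)`. If all keys have equal norm, the posterior over z is exactly `softmax(q · k_i / sigma2)`, so the posterior mean of the value coincides with softmax attention at temperature `sigma2`. The function names, the Gaussian likelihood, and the equal-norm keys are all assumptions introduced for this example.

```python
import math
import random

def softmax_attention(query, keys, values, temperature):
    """Standard softmax attention: weights = softmax(q . k_i / temperature)."""
    scores = [sum(q * k for q, k in zip(query, key)) / temperature for key in keys]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]

def posterior_mean(query, keys, values, sigma2):
    """Posterior mean of the value under a hypothetical model:
    z ~ Uniform{1..n}, query | z = i ~ N(key_i, sigma2 * I)."""
    log_lik = [-sum((q - k) ** 2 for q, k in zip(query, key)) / (2 * sigma2)
               for key in keys]
    m = max(log_lik)
    post = [math.exp(l - m) for l in log_lik]  # uniform prior cancels out
    total = sum(post)
    post = [p / total for p in post]
    dim = len(values[0])
    return [sum(p * v[j] for p, v in zip(post, values)) for j in range(dim)]

def unit(v):
    """Rescale a vector to unit norm (assumption: equal-norm keys)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

random.seed(0)
d, n, sigma2 = 4, 5, 0.5
keys = [unit([random.gauss(0, 1) for _ in range(d)]) for _ in range(n)]
values = [[random.gauss(0, 1) for _ in range(3)] for _ in range(n)]
query = [random.gauss(0, 1) for _ in range(d)]

a = softmax_attention(query, keys, values, temperature=sigma2)
b = posterior_mean(query, keys, values, sigma2)
# The two outputs agree up to floating-point rounding.
print(max(abs(x - y) for x, y in zip(a, b)))
```

Because the uniform prior treats every position alike, permuting the (key, value) pairs leaves both outputs unchanged, which is one simple way the symmetry of exchangeable tokens shows up in the computation. Without the equal-norm assumption, the `-||k_i||^2 / (2 * sigma2)` term no longer cancels and acts as a per-key bias on the attention scores.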