Sparo: Selective Attention for Robust and Compositional Transformer Encodings in Vision
Sparo is a read-out mechanism that partitions transformer encodings into separately attended slots. It imparts an inductive bias for representing a shared compositional world with corresponding concepts across modalities, leading to improved generalization, robustness, and compositionality.
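
To make the idea of a slot-partitioned read-out concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes that each slot is produced by its own single-head attention over the encoder's output tokens, followed by a per-slot projection to a small dimension, and that the final encoding is the concatenation of all slots. The summary above does not spell out the parameterization, so the function name `slot_readout`, the parameters `key_w` and `value_w`, and all shapes are hypothetical choices made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def slot_readout(tokens, key_w, value_w):
    """Hypothetical slot-partitioned read-out (illustrative assumption).

    tokens:  (n_tokens, d_model)        transformer output tokens
    key_w:   (n_slots, d_model)         one scoring vector per slot
    value_w: (n_slots, d_model, d_slot) one value projection per slot
    Returns the concatenated encoding (n_slots * d_slot,) and the
    per-slot attention weights (n_slots, n_tokens).
    """
    d_model = tokens.shape[1]
    # Each slot scores every token with its own vector, so each slot
    # attends to the tokens separately from the others.
    scores = key_w @ tokens.T / np.sqrt(d_model)      # (n_slots, n_tokens)
    attn = softmax(scores, axis=-1)
    # Per-slot weighted average of the tokens.
    pooled = attn @ tokens                            # (n_slots, d_model)
    # Per-slot projection into a low-dimensional slot.
    slots = np.einsum("sd,sdk->sk", pooled, value_w)  # (n_slots, d_slot)
    # The encoding is the concatenation of the slots.
    return slots.reshape(-1), attn


# Toy usage with random inputs (all sizes are arbitrary assumptions).
rng = np.random.default_rng(0)
n_tokens, d_model, n_slots, d_slot = 5, 8, 4, 3
tokens = rng.normal(size=(n_tokens, d_model))
key_w = rng.normal(size=(n_slots, d_model))
value_w = rng.normal(size=(n_slots, d_model, d_slot))

encoding, attn = slot_readout(tokens, key_w, value_w)
```

In this sketch, coordinates `encoding[i * d_slot:(i + 1) * d_slot]` depend only on the attention weights of slot `i`, which is one way to realize the partitioning into separately attended slots described above.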