Masked Autoencoding with Structured Diffusion and Interpretable Transformer-like Architectures
The core message of this work is that there is a quantitative connection between denoising and compression, and that this connection can be used to design a conceptual framework for building white-box (mathematically interpretable) transformer-like deep neural networks that can learn through unsupervised pretext tasks such as masked autoencoding.