Transformers learn feature-position correlations in masked image modeling for self-supervised vision pretraining.