DeiT-LT: Efficient Training of Vision Transformers on Long-Tailed Datasets
DeiT-LT introduces an efficient distillation scheme for training Vision Transformers (ViTs) from scratch on long-tailed datasets. It uses out-of-distribution distillation and low-rank feature learning to create specialized experts for the majority and minority classes within a single ViT architecture.
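
As a rough illustration of how one ViT could host two experts, the sketch below assumes a DeiT-style student with two output heads: a classification head trained on the ground-truth label and a distillation head trained on a teacher's prediction. Every name here (`two_head_distillation_loss`, `class_weights`, the 0.5 mixing factor) is hypothetical and not taken from the DeiT-LT code. The hard-label teacher target and the per-class reweighting of the distillation term are assumptions made for the example, and the out-of-distribution and low-rank components are left out entirely.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target, weight=1.0):
    """Weighted cross-entropy for a single example."""
    return -weight * math.log(softmax(logits)[target])


def two_head_distillation_loss(cls_logits, dist_logits, label,
                               teacher_logits, class_weights):
    """Hypothetical two-head loss (an assumption, not the authors' code).

    cls_logits     : student classification-head logits
    dist_logits    : student distillation-head logits
    label          : ground-truth class index
    teacher_logits : teacher logits for the same input
    class_weights  : per-class weights, larger for rarer classes (assumed)
    """
    # Hard distillation target: the teacher's top-scoring class.
    teacher_label = max(range(len(teacher_logits)),
                        key=teacher_logits.__getitem__)

    # Classification head follows the ground truth, unweighted.
    loss_cls = cross_entropy(cls_logits, label)

    # Distillation head follows the teacher, reweighted toward rare classes.
    loss_dist = cross_entropy(dist_logits, teacher_label,
                              class_weights[teacher_label])

    # Equal mixing of the two terms is an assumption of this sketch.
    return 0.5 * loss_cls + 0.5 * loss_dist


if __name__ == "__main__":
    loss = two_head_distillation_loss(
        cls_logits=[2.0, 0.5, -1.0],
        dist_logits=[0.1, 1.5, 0.3],
        label=0,
        teacher_logits=[0.2, 0.1, 3.0],
        class_weights=[1.0, 2.0, 4.0],
    )
    print(f"combined loss: {loss:.4f}")
```

Because the unweighted term pulls the classification head toward the frequent classes that dominate the data while the reweighted term pulls the distillation head toward rare ones, the two heads can specialize differently even though they share one backbone.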