Enhancing Unsupervised Object-Centric Learning with Self-Training and Sequence Permutations
This work introduces SPOT, a novel framework that enhances unsupervised object-centric learning in slot-based autoencoders through two strategies: self-training, and sequence permutations in the transformer decoder. Together, these strategies significantly improve the generation of object-specific slots and the decoder's ability to use those slots during reconstruction, leading to state-of-the-art performance in unsupervised object segmentation, especially on complex real-world images.
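The summary above does not say which permutations are used or how they are applied, so the following is only an illustrative sketch of the general idea. It assumes the decoder consumes a flattened H x W grid of patch tokens in some fixed order, and that a sequence permutation reorders those tokens before decoding and restores the original order afterwards. All function names and the four orderings here are hypothetical and are not taken from the SPOT implementation.

```python
# Illustrative sketch only: the orderings and helper names are assumptions,
# not the permutations or API used by SPOT.

def raster_order(height, width):
    """Row-major order: left to right, then top to bottom."""
    return list(range(height * width))


def column_order(height, width):
    """Column-major order: top to bottom, then left to right."""
    return [r * width + c for c in range(width) for r in range(height)]


def candidate_orders(height, width):
    """A small, hypothetical set of patch orderings to sample from."""
    rows = raster_order(height, width)
    cols = column_order(height, width)
    return [rows, rows[::-1], cols, cols[::-1]]


def invert(order):
    """Return the permutation that undoes `order`."""
    inverse = [0] * len(order)
    for position, source_index in enumerate(order):
        inverse[source_index] = position
    return inverse


def permute(tokens, order):
    """Reorder `tokens` so that output[j] == tokens[order[j]]."""
    return [tokens[i] for i in order]


if __name__ == "__main__":
    height, width = 2, 3
    tokens = [f"p{i}" for i in range(height * width)]
    for order in candidate_orders(height, width):
        shuffled = permute(tokens, order)            # order fed to the decoder
        restored = permute(shuffled, invert(order))  # back to raster order
        assert restored == tokens
        print(shuffled)
```

In a training loop, one such ordering would presumably be picked per step, the target sequence permuted before the decoder's forward pass, and the predictions mapped back with the inverse permutation before computing the reconstruction loss; that wiring is likewise an assumption rather than something the summary states.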