The author argues that by using prediction as the main learning objective, a novel network architecture called OPPLE can simultaneously learn object segmentation, depth perception, and 3D object localization without supervision. This approach is inspired by how infants develop perceptual abilities.
Object representations are learned through prediction, mirroring how human infants develop perceptual abilities.
SAMP (Simplified Slot Attention with Max Pool Priors) is a simple, scalable, and non-iterative method that learns object-centric representations from images by inducing competition and specialization among slots.
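
As a rough illustration of what competition among slots can mean, the sketch below normalizes attention over the slot axis, so that slots compete for each input feature, and does so in a single pass with no iterative refinement. It is not SAMP's actual architecture: the function names, the contiguous grouping used for max pooling, and the weighted-mean slot update are all assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def max_pool_slot_priors(features, num_slots):
    # Hypothetical prior: split the N feature vectors into num_slots
    # contiguous groups and max-pool each group into one slot vector.
    # Assumes N >= num_slots so that no group is empty.
    groups = np.array_split(features, num_slots, axis=0)
    return np.stack([g.max(axis=0) for g in groups])        # (K, D)

def single_pass_slot_competition(features, num_slots):
    # features: (N, D) array of flattened image features.
    slots = max_pool_slot_priors(features, num_slots)        # (K, D)
    logits = features @ slots.T / np.sqrt(features.shape[1]) # (N, K)
    # Softmax over the slot axis: each feature distributes a total
    # weight of 1 across slots, so slots compete for it.
    attn = softmax(logits, axis=1)                           # (N, K)
    # Each slot becomes the attention-weighted mean of the features.
    weights = attn / (attn.sum(axis=0, keepdims=True) + 1e-8)
    updated_slots = weights.T @ features                     # (K, D)
    return attn, updated_slots

rng = np.random.default_rng(0)
features = rng.normal(size=(16, 8))   # 16 feature vectors of dimension 8
attn, slots = single_pass_slot_competition(features, num_slots=4)
print(attn.shape, slots.shape)
```

Because the softmax runs over slots rather than over inputs, a slot that claims a feature strongly reduces every other slot's share of it, which is one simple way specialization can emerge.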