The authors introduce Slot Attention with Image Augmentation (SlotAug) to enable interpretable control over object representations. The approach promotes sustainability in controllable slots, meaning the ability to manipulate slots repeatedly without degrading them, through Auxiliary Identity Manipulation and a Slot Consistency Loss.
The authors introduce the Neural Slot Interpreter (NSI) to bridge the gap between unsupervised slot representations and supervised object semantics, improving both the alignment between slots and object semantics and the model's generative capabilities.
Jointly leveraging high-level semantics and low-level temporal correspondence enhances object-centric perception in videos.
Augmenting object-centric learning with reasoning modules improves both the perception and the prediction abilities of the resulting models.
Inspired by the reverse hierarchy theory of human vision, RHGNet introduces a top-down pathway in which top-level object representations guide bottom-level feature learning. This significantly improves object-centric representation learning, particularly for small objects, and achieves state-of-the-art performance on several datasets.
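SlotAug and NSI both operate on slot representations. For reference, the following is a minimal NumPy sketch of the original Slot Attention update (Locatello et al., 2020) that produces such slots. It is not SlotAug's augmentation-aware variant, NSI, or RHGNet. Assumptions: all weights are random rather than learned, slots are initialized from a fixed standard normal draw instead of a learned Gaussian, and LayerNorm has no affine parameters. The function names and defaults (`slot_attention`, `num_slots=4`, `dim=16`, `iters=3`) are hypothetical.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize the last axis to zero mean and unit variance (no learned scale/shift: an assumption).
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_cell(x, h, W, U):
    # Standard GRU update; W and U hold input and hidden weights for the z, r and candidate gates.
    z = sigmoid(x @ W["z"] + h @ U["z"])
    r = sigmoid(x @ W["r"] + h @ U["r"])
    h_tilde = np.tanh(x @ W["h"] + (r * h) @ U["h"])
    return (1 - z) * h + z * h_tilde

def slot_attention(inputs, num_slots=4, dim=16, iters=3, seed=0, eps=1e-8):
    """Hypothetical sketch: inputs (n, d_in) -> slots (num_slots, dim), attn (n, num_slots)."""
    rng = np.random.default_rng(seed)
    n, d_in = inputs.shape

    def init(*shape):
        # Random (untrained) weights scaled by fan-in; stands in for learned parameters.
        return rng.normal(0.0, shape[0] ** -0.5, size=shape)

    W_k, W_v, W_q = init(d_in, dim), init(d_in, dim), init(dim, dim)
    W = {g: init(dim, dim) for g in "zrh"}
    U = {g: init(dim, dim) for g in "zrh"}
    W_mlp1, W_mlp2 = init(dim, 2 * dim), init(2 * dim, dim)

    x = layer_norm(inputs)
    k, v = x @ W_k, x @ W_v                    # keys and values, shape (n, dim)
    slots = rng.normal(size=(num_slots, dim))  # assumed initialization
    attn = None
    for _ in range(iters):
        prev = slots
        q = layer_norm(slots) @ W_q            # queries, shape (num_slots, dim)
        logits = k @ q.T / np.sqrt(dim)        # shape (n, num_slots)
        # Softmax over the slot axis: slots compete to explain each input feature.
        logits = logits - logits.max(axis=1, keepdims=True)
        attn = np.exp(logits)
        attn = attn / attn.sum(axis=1, keepdims=True)
        # Weighted mean of values per slot (normalize attention over the inputs).
        weights = attn + eps
        weights = weights / weights.sum(axis=0, keepdims=True)
        updates = weights.T @ v                # shape (num_slots, dim)
        # Recurrent GRU update followed by a residual ReLU MLP.
        slots = gru_cell(updates, prev, W, U)
        slots = slots + np.maximum(layer_norm(slots) @ W_mlp1, 0.0) @ W_mlp2
    return slots, attn

inputs = np.random.default_rng(1).normal(size=(64, 32))  # e.g. 64 flattened feature-map positions
slots, attn = slot_attention(inputs)
print(slots.shape, attn.shape)
```

The softmax is taken over slots rather than inputs, so slots compete for each input feature; this competition is what lets individual slots bind to individual objects.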