Core Concept
Organized Grouped Discrete Representation (OGDR) improves the guidance signal for object-centric learning by organizing the intermediate representation channels so that features decompose more cleanly into attributes, which yields better performance and expressivity than previous grouped discrete representation methods.
Summary
The paper proposes Organized Grouped Discrete Representation (OGDR) as a general augmentor for object-centric learning (OCL) methods, including both transformer-based and diffusion-based approaches.
Key highlights:
- OGDR organizes the intermediate representation channels to group channels belonging to the same attributes together, overcoming the information loss and model expressivity issues of the previous naive grouped discrete representation (GDR) method.
- OGDR applies to transformer-based OCL methods such as SLATE and STEVE as well as to diffusion-based state-of-the-art methods such as SlotDiffusion.
- Comprehensive experiments demonstrate that OGDR significantly boosts the performance of these OCL methods on various datasets, outperforming the competitive GDR augmentor.
- Analyses show that OGDR preserves more information and improves object discriminability in the discrete representation, which gives better guidance for object representation learning.
- Ablation studies provide insights on how to configure the OGDR hyperparameters to maximize its effectiveness.
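The channel-organizing idea in the highlights above can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: the function names, the nearest-neighbor codebook lookup, and the use of an invertible linear projection to reorder channels before grouping are all assumptions made for illustration, since this summary does not specify the actual mechanism.

```python
import numpy as np


def grouped_quantize(features, codebooks):
    """Quantize each channel group against its own codebook (GDR-style sketch).

    features:  (n, c) array of intermediate representation vectors.
    codebooks: list of g arrays, each of shape (k, c // g), one per group.
    Returns (indices, quantized) with shapes (n, g) and (n, c).
    """
    # Split channels into g equal, contiguous groups (c must be divisible by g).
    chunks = np.split(features, len(codebooks), axis=1)
    indices, quantized = [], []
    for chunk, book in zip(chunks, codebooks):
        # Squared Euclidean distance from every vector to every code.
        dist = ((chunk[:, None, :] - book[None, :, :]) ** 2).sum(axis=-1)
        idx = dist.argmin(axis=1)
        indices.append(idx)
        quantized.append(book[idx])
    return np.stack(indices, axis=1), np.concatenate(quantized, axis=1)


def organized_grouped_quantize(features, codebooks, projection):
    """Hypothetical organizing step: project channels, quantize, project back.

    projection: (c, c) invertible matrix (an assumption of this sketch) that
    moves channels of the same attribute into the same contiguous group.
    """
    organized = features @ projection
    indices, quantized = grouped_quantize(organized, codebooks)
    # Undo the projection so the output uses the original channel layout.
    restored = quantized @ np.linalg.inv(projection)
    return indices, restored
```

With interleaved attribute channels, plain grouping mixes attributes inside each group, whereas a projection that reorders the channels first lets each codebook specialize in one attribute. In this sketch a permutation matrix is enough to show the effect; a learned projection would play the same role.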
Statistics
OGDR improves the unsupervised segmentation performance of the transformer-based SLATE and STEVE models, as well as the diffusion-based SlotDiffusion model, across multiple datasets including ClevrTex, COCO, VOC, and MOVi.
OGDR also boosts the performance of these OCL models when using the strong DINO foundation model as the primary encoder.
Quotes
"Our organizing technique promotes the VAE model to grasp more diverse template features for better representation discretization."
"Our organizing technique fosters better guiding representation for object representation learning."