Key Concepts
Incorporating an explicit objective to encourage compositionality of object representations significantly improves the quality and robustness of object-centric learning.
Summary
The paper proposes a novel framework for object-centric learning that explicitly encourages the compositionality of the learned representations. The key idea is to incorporate an additional "composition path" that constructs composite representations by mixing slots from two different images and evaluates the validity of the composite image using a generative prior. This composition path is trained jointly with the conventional auto-encoding objective, guiding the encoder to learn representations that are not only effective for reconstructing individual images, but also composable.
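As a rough illustration of the composition path described above, the sketch below mixes slots from two images and combines the two training signals. It is a minimal sketch under stated assumptions, not the paper's implementation: the per-index coin-flip mixing rule, the names `mix_slots` and `joint_loss`, and the `weight` parameter are all hypothetical, and the generative prior is represented only by a scalar loss value supplied by the caller.

```python
import random
from typing import List, Sequence

Slot = List[float]  # one slot vector; real slots would be tensors


def mix_slots(slots_a: Sequence[Slot], slots_b: Sequence[Slot],
              rng: random.Random) -> List[Slot]:
    """Build a composite slot set from two images.

    Assumption: for each slot index we pick the slot from image A or
    image B with equal probability. The paper may use a different rule.
    """
    if len(slots_a) != len(slots_b):
        raise ValueError("both images must have the same number of slots")
    return [list(a) if rng.random() < 0.5 else list(b)
            for a, b in zip(slots_a, slots_b)]


def joint_loss(recon_loss: float, prior_loss: float,
               weight: float = 1.0) -> float:
    """Combine the auto-encoding loss with the composition-path loss.

    `weight` is a hypothetical balancing coefficient; `prior_loss` stands
    in for the generative prior's score of the composite image.
    """
    return recon_loss + weight * prior_loss


if __name__ == "__main__":
    rng = random.Random(0)
    slots_a = [[1.0, 0.0], [2.0, 0.0]]
    slots_b = [[0.0, 3.0], [0.0, 4.0]]
    print(mix_slots(slots_a, slots_b, rng))
    print(joint_loss(recon_loss=1.0, prior_loss=2.0, weight=0.5))
```

In a real training loop the composite slots would be decoded into an image, and the prior's loss on that image would be backpropagated into the encoder together with the reconstruction loss.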
The paper makes the following key contributions:
- It introduces a novel objective that directly optimizes the compositionality of object representations, in contrast to previous approaches that relied on architectural or algorithmic biases.
- Extensive experiments on four datasets show that the proposed method consistently outperforms strong auto-encoding-based baselines in unsupervised object segmentation tasks.
- The method is also more robust than the baselines to architectural choices such as the number of slots, the encoder architecture, and the decoder capacity.
The internal analysis further reveals that the proposed composition path effectively encourages the model to learn more holistic and composable object representations, enabling meaningful object-level manipulations in the generated images.
Statistics
No specific numerical values are extracted here. The paper reports its key results through quantitative segmentation metrics (FG-ARI, mIoU, mBO) and qualitative visualizations.
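For readers unfamiliar with these metrics, the sketch below computes two of them on flattened per-pixel label lists. It assumes that label `0` marks background, that FG-ARI is the adjusted Rand index restricted to ground-truth foreground pixels, and that mBO averages, over ground-truth objects, the best IoU achieved by any predicted segment. The function names are hypothetical, and the paper's evaluation code may differ in details such as background handling. mIoU is omitted because it is usually computed after a one-to-one matching between predicted and ground-truth segments.

```python
from collections import Counter
from math import comb
from typing import Sequence


def fg_ari(true_labels: Sequence[int], pred_labels: Sequence[int],
           background: int = 0) -> float:
    """Adjusted Rand index over pixels whose ground truth is not background."""
    pairs = [(t, p) for t, p in zip(true_labels, pred_labels) if t != background]
    total_pairs = comb(len(pairs), 2)
    if total_pairs == 0:
        return 1.0  # zero or one foreground pixel: trivially consistent
    sum_cells = sum(comb(c, 2) for c in Counter(pairs).values())
    sum_true = sum(comb(c, 2) for c in Counter(t for t, _ in pairs).values())
    sum_pred = sum(comb(c, 2) for c in Counter(p for _, p in pairs).values())
    expected = sum_true * sum_pred / total_pairs
    max_index = 0.5 * (sum_true + sum_pred)
    if max_index == expected:
        return 1.0  # degenerate case: both partitions are trivial
    return (sum_cells - expected) / (max_index - expected)


def iou(true_labels: Sequence[int], pred_labels: Sequence[int],
        true_id: int, pred_id: int) -> float:
    """Intersection over union between one true and one predicted segment."""
    inter = sum(1 for t, p in zip(true_labels, pred_labels)
                if t == true_id and p == pred_id)
    union = sum(1 for t, p in zip(true_labels, pred_labels)
                if t == true_id or p == pred_id)
    return inter / union if union else 0.0


def mbo(true_labels: Sequence[int], pred_labels: Sequence[int],
        background: int = 0) -> float:
    """Mean best overlap: average over true objects of the best-matching IoU."""
    objects = sorted(set(true_labels) - {background})
    if not objects:
        return 0.0
    segments = set(pred_labels)
    best = [max(iou(true_labels, pred_labels, t, p) for p in segments)
            for t in objects]
    return sum(best) / len(best)


if __name__ == "__main__":
    truth = [0, 1, 1, 2, 2]
    merged = [5, 7, 7, 7, 7]  # prediction that merges both objects
    print(fg_ari(truth, merged), mbo(truth, merged))
```

A prediction that merges two equally sized objects into one segment scores 0 on FG-ARI and 0.5 on mBO in this toy case, which shows how the two metrics penalize under-segmentation differently.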
Quotes
"Incorporating our objective to the existing framework consistently improves the objective-centric learning and enhances the robustness to the architectural choices."
"Our method consistently outperforms the baselines by a substantial margin."
"Our method produces both semantically meaningful and realistic images from composite slot representations, supporting our claim that we can regularize object-centric learning through the proposed compositional path."